VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range

* VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
@ 2026-02-25 13:43 Sasha Levin
  2026-02-25 13:50 ` David Hildenbrand (Arm)
  2026-02-25 20:30 ` VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range David Hildenbrand (Arm)
  0 siblings, 2 replies; 8+ messages in thread
From: Sasha Levin @ 2026-02-25 13:43 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand, Hugh Dickins,
	Zi Yan, Gavin Guo

Hi,

I've been playing around with improvements to syzkaller locally, and hit the
following crash on v7.0-rc1:

   vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm ffff8881048e1780
   prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
   pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
   refcnt 1
   flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
   ------------[ cut here ]------------
   kernel BUG at mm/huge_memory.c:2999!
   Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
   CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G                 N  7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
   RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
   RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
   RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
   RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
   RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
   R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
   R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
   FS:  0000000000000000(0000) GS:ffff88816f701000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
   PKRU: 80000000
   Call Trace:
    <TASK>
    __split_huge_pmd+0x201/0x350
    unmap_page_range+0xa6a/0x3db0
    unmap_single_vma+0x14b/0x230
    unmap_vmas+0x28f/0x580
    exit_mmap+0x203/0xa80
    __mmput+0x11b/0x540
    mmput+0x81/0xa0
    do_exit+0x7b9/0x2c60
    do_group_exit+0xd5/0x2a0
    get_signal+0x1fdc/0x2340
    arch_do_signal_or_restart+0x93/0x790
    exit_to_user_mode_loop+0x84/0x480
    do_syscall_64+0x4df/0x700
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
    </TASK>
   Kernel panic - not syncing: Fatal exception

The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
a 136KB region starting 768KB past the PMD base.
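
The address arithmetic above can be checked with a small user-space sketch.
It assumes x86-64 with 4KB base pages (so a PMD maps 2MB); the helper names
are hypothetical and only mirror the kernel's HPAGE_PMD_MASK rounding:

```c
#include <stdint.h>

/* Assumption: x86-64 with 4 KiB base pages, so one PMD maps 2 MiB. */
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (UINT64_C(1) << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

/* Round an address down to the start of its PMD-sized range (haddr). */
static uint64_t pmd_base(uint64_t addr)
{
	return addr & HPAGE_PMD_MASK;
}

/* Distance in KiB from the PMD base to the given address. */
static uint64_t kib_past_pmd_base(uint64_t addr)
{
	return (addr - pmd_base(addr)) >> 10;
}

/* Length of the half-open range [start, end) in KiB. */
static uint64_t range_kib(uint64_t start, uint64_t end)
{
	return (end - start) >> 10;
}
```

Feeding in vm_start and vm_end from the VMA dump gives the PMD base held in
R14 and the two distances quoted above.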

---

The following analysis was performed with the help of an LLM:

The crash path is:

   exit_mmap -> unmap_vmas -> unmap_page_range -> zap_pmd_range
     -> sees pmd_is_huge(*pmd) is true
     -> range doesn't cover full HPAGE_PMD_SIZE
     -> calls __split_huge_pmd()
       -> haddr = address & HPAGE_PMD_MASK = 0x555580c00000
       -> __split_huge_pmd_locked()
         -> VM_BUG_ON_VMA(vma->vm_start > haddr) fires
            because 0x555580cc0000 > 0x555580c00000

The root cause appears to be remove_migration_pmd() (mm/huge_memory.c:4906).
This function reinstalls a huge PMD via set_pmd_at() after migration
completes, but it never checks whether the VMA still covers the full
PMD-aligned 2MB range.

Every other code path that installs a huge PMD validates VMA boundaries:

   - do_huge_pmd_anonymous_page(): thp_vma_suitable_order()
   - collapse_huge_page():         hugepage_vma_revalidate()
   - MADV_COLLAPSE:                hugepage_vma_revalidate()
   - do_set_pmd() (shmem/tmpfs):   thp_vma_suitable_order()

remove_migration_pmd() checks none of these.

The suspected race window is:

   1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
      entry.

   2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
      vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
      migration entry into 512 PTE migration entries, and releases the PMD
      lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).

   3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
      (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
      lock. If it wins the lock BEFORE step 2's split, it finds the PMD
      migration entry still intact and returns with pvmw->pte == NULL.

   4. remove_migration_pmd() then reinstalls the huge PMD via set_pmd_at()
      without checking that the VMA (whose boundaries may have already been
      modified in step 2) still covers the full PMD range.

   5. Later, exit_mmap -> unmap_page_range -> zap_pmd_range encounters the
      huge PMD, calls __split_huge_pmd(), and the VM_BUG_ON_VMA fires
      because vma->vm_start no longer aligns with the PMD base.

The fix should add a VMA boundary check in remove_migration_pmd(). If
haddr < vma->vm_start or haddr + HPAGE_PMD_SIZE > vma->vm_end, the
function should split the PMD migration entry into PTE-level migration
entries instead of reinstalling the huge PMD, allowing PTE-level removal
to handle each page individually.
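
A minimal user-space sketch of the boundary check described above. This is
not the kernel code: struct vma_range and vma_covers_pmd() are hypothetical
names, and the 2MB PMD size is an assumption that holds for x86-64 with 4KB
pages:

```c
#include <stdbool.h>
#include <stdint.h>

#define HPAGE_PMD_SIZE (UINT64_C(1) << 21)	/* assumption: 2 MiB PMD */
#define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

/* Hypothetical stand-in for the two vm_area_struct fields that matter. */
struct vma_range {
	uint64_t vm_start;
	uint64_t vm_end;
};

/* True if the PMD-sized range containing addr lies fully inside the VMA. */
static bool vma_covers_pmd(const struct vma_range *vma, uint64_t addr)
{
	uint64_t haddr = addr & HPAGE_PMD_MASK;

	return haddr >= vma->vm_start &&
	       haddr + HPAGE_PMD_SIZE <= vma->vm_end;
}
```

With the VMA from the crash dump the check returns false, which is the case
where the proposal would fall back to PTE-level migration entries.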

-- 
Thanks,
Sasha



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
  2026-02-25 13:43 VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range Sasha Levin
@ 2026-02-25 13:50 ` David Hildenbrand (Arm)
  2026-02-25 18:12   ` Sasha Levin
  2026-03-02 10:57   ` Lorenzo Stoakes
  2026-02-25 20:30 ` VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range David Hildenbrand (Arm)
  1 sibling, 2 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-25 13:50 UTC (permalink / raw)
  To: Sasha Levin, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo

On 2/25/26 14:43, Sasha Levin wrote:
> [...]

Do you have a reproducer and would this trigger before v7.0-rc1?

Lorenzo did some changes around anon_vma locking recently, maybe related
to that.

-- 
Cheers,

David



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
  2026-02-25 13:50 ` David Hildenbrand (Arm)
@ 2026-02-25 18:12   ` Sasha Levin
  2026-03-02 10:57   ` Lorenzo Stoakes
  1 sibling, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-02-25 18:12 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
	Hugh Dickins, Zi Yan, Gavin Guo

On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>On 2/25/26 14:43, Sasha Levin wrote:
>> [...]
>
>Do you have a reproducer and would this trigger before v7.0-rc1?

No reproducer. I saw it exactly once yesterday; syzkaller wasn't able to come
up with a reproducer, and pointing the LLM at that task hasn't produced
anything useful either :(

-- 
Thanks,
Sasha



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
  2026-02-25 13:43 VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range Sasha Levin
  2026-02-25 13:50 ` David Hildenbrand (Arm)
@ 2026-02-25 20:30 ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-25 20:30 UTC (permalink / raw)
  To: Sasha Levin, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo

> The suspected race window is:
> 
>   1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
>      entry.
> 
>   2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
>      vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
>      migration entry into 512 PTE migration entries, and releases the PMD
>      lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).

split_huge_pmd_locked() will call __split_huge_pmd_locked() either for
* pmd_trans_huge(): Present PMD
* pmd_is_valid_softleaf(): Migration PMDs (for our purpose)

__split_huge_pmd_locked() will remap either, resulting in a PTE table.


> 
>   3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
>      (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
>      lock. If it wins the lock BEFORE step 2's split, it finds the PMD
>      migration entry still intact and returns with pvmw->pte == NULL.

If 2. runs after 3, 2. would simply split the (present) PMD.
If 2. runs before 3, 2. would simply split the migration PMD.

And both should sync on the PMD table lock.

exit() cannot run before 2. completed.


Wondering if it's instead some missed vma_adjust_trans_huge() case. I
see a call from commit_merge(), so maybe some odd corner cases with VMA
merging.

-- 
Cheers,

David



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
  2026-02-25 13:50 ` David Hildenbrand (Arm)
  2026-02-25 18:12   ` Sasha Levin
@ 2026-03-02 10:57   ` Lorenzo Stoakes
  2026-03-02 15:13     ` Sasha Levin
  2026-03-02 15:15     ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
  1 sibling, 2 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-03-02 10:57 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Sasha Levin, linux-mm, linux-kernel, Andrew Morton, Hugh Dickins,
	Zi Yan, Gavin Guo

On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
> On 2/25/26 14:43, Sasha Levin wrote:
> > [...]
>
> Do you have a reproducer and would this trigger before v7.0-rc1?
>
> Lorenzo did some changes around anon_vma locking recently, maybe related
> to that.

A quick glance doesn't suggest any changes I made should have had an impact
here.

_Should have_ :)

I think without a reproducer this is going to be hard to pinpoint. Hopefully
syzbot proper should figure one out eventually?

>
> --
> Cheers,
>
> David

Thanks, Lorenzo



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
  2026-03-02 10:57   ` Lorenzo Stoakes
@ 2026-03-02 15:13     ` Sasha Levin
  2026-03-02 15:15     ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
  1 sibling, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-03-02 15:13 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand (Arm),
	linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, Zi Yan,
	Gavin Guo

On Mon, Mar 02, 2026 at 10:57:47AM +0000, Lorenzo Stoakes wrote:
>On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/25/26 14:43, Sasha Levin wrote:
>> > [...]
>>
>> Do you have a reproducer and would this trigger before v7.0-rc1?
>>
>> Lorenzo did some changes around anon_vma locking recently, maybe related
>> to that.
>
>A quick glance doesn't suggest any changes I made should have had an impact
>here.
>
>_Should have_ :)
>
>I think without a reproducer this is going to be hard to pinpoint. Hopefully
>syzbot proper should figure one out eventually?

So no luck just yet.

I did hit a different issue, which the LLM was able to triage, and I'm running
with the patch right now to make sure that the issue doesn't reproduce.

I'm not sure if it's related or not, but I'll send the WIP patch as a reply.

-- 
Thanks,
Sasha



* [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()
  2026-03-02 10:57   ` Lorenzo Stoakes
  2026-03-02 15:13     ` Sasha Levin
@ 2026-03-02 15:15     ` Sasha Levin
  2026-03-02 17:48       ` Lorenzo Stoakes
  1 sibling, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2026-03-02 15:15 UTC (permalink / raw)
  To: lorenzo.stoakes
  Cc: akpm, david, gavinguo, hughd, linux-kernel, linux-mm, sashal, ziy

When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
dst->anon_vma is left pointing at src->anon_vma without a corresponding
num_active_vmas increment (which only happens on the success path).

The internal cleanup_partial_anon_vmas() correctly frees partially-
allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
torn down during process exit, unlink_anon_vmas() sees a non-NULL
vma->anon_vma and decrements num_active_vmas without a prior matching
increment, causing an underflow. This eventually triggers:

  WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528

First, fault injection in the mlock2 syscall path:

  FAULT_INJECTION: forcing a failure.
  name failslab, interval 1, probability 0, space 0, times 0
  CPU: 3 PID: 4261 Comm: syz.6.96
  Call Trace:
   should_fail_ex.cold+0xd8/0x15d
   should_failslab+0xd4/0x150
   kmem_cache_alloc_noprof+0x60/0x630
   anon_vma_clone+0x2ed/0xcf0
   dup_anon_vma+0x1cb/0x320
   vma_modify+0x16dd/0x2230
   vma_modify_flags+0x1f9/0x350
   mlock_fixup+0x225/0xe10
   apply_vma_lock_flags+0x249/0x360
   do_mlock+0x269/0x7f0
   __x64_sys_mlock2+0xc0/0x100

Followed by the WARNING on the same task during exit:

  WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
  CPU: 3 PID: 4261 Comm: syz.6.96
  Call Trace:
   free_pgtables+0x312/0x950
   exit_mmap+0x487/0xa80
   __mmput+0x11b/0x540
   exit_mm
   do_exit+0x7b9/0x2c60

Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
to its original unfaulted state. This ensures unlink_anon_vmas() will
correctly bail out early at the !active_anon_vma check.

Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
NULL before cloning.

Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/vma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/vma.c b/mm/vma.c
index be64f781a3aa7..4cf6a2a05c10a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
 		vma_assert_write_locked(dst);
 		dst->anon_vma = src->anon_vma;
 		ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
-		if (ret)
+		if (ret) {
+			dst->anon_vma = NULL;
 			return ret;
+		}

 		*dup = dst;
 	}
-- 
2.51.0


* Re: [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()
  2026-03-02 15:15     ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
@ 2026-03-02 17:48       ` Lorenzo Stoakes
  0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-03-02 17:48 UTC (permalink / raw)
  To: Sasha Levin; +Cc: akpm, david, gavinguo, hughd, linux-kernel, linux-mm, ziy

On Mon, Mar 02, 2026 at 10:15:47AM -0500, Sasha Levin wrote:
> When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
> dst->anon_vma is left pointing at src->anon_vma without a corresponding
> num_active_vmas increment (which only happens on the success path).
>
> The internal cleanup_partial_anon_vmas() correctly frees partially-
> allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
> torn down during process exit, unlink_anon_vmas() sees a non-NULL
> vma->anon_vma and decrements num_active_vmas without a prior matching
> increment, causing an underflow. This eventually triggers:

Yikes!

>
>   WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528
>
> First, fault injection in the mlock2 syscall path:
>
>   FAULT_INJECTION: forcing a failure.
>   name failslab, interval 1, probability 0, space 0, times 0
>   CPU: 3 PID: 4261 Comm: syz.6.96
>   Call Trace:
>    should_fail_ex.cold+0xd8/0x15d
>    should_failslab+0xd4/0x150
>    kmem_cache_alloc_noprof+0x60/0x630
>    anon_vma_clone+0x2ed/0xcf0
>    dup_anon_vma+0x1cb/0x320
>    vma_modify+0x16dd/0x2230
>    vma_modify_flags+0x1f9/0x350
>    mlock_fixup+0x225/0xe10
>    apply_vma_lock_flags+0x249/0x360
>    do_mlock+0x269/0x7f0
>    __x64_sys_mlock2+0xc0/0x100
>
> Followed by the WARNING on the same task during exit:
>
>   WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
>   CPU: 3 PID: 4261 Comm: syz.6.96
>   Call Trace:
>    free_pgtables+0x312/0x950
>    exit_mmap+0x487/0xa80
>    __mmput+0x11b/0x540
>    exit_mm
>    do_exit+0x7b9/0x2c60
>
> Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
> to its original unfaulted state. This ensures unlink_anon_vmas() will
> correctly bail out early at the !active_anon_vma check.
>
> Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
> free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
> NULL before cloning.
>
> Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  mm/vma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index be64f781a3aa7..4cf6a2a05c10a 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
>  		vma_assert_write_locked(dst);
>  		dst->anon_vma = src->anon_vma;
>  		ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
> -		if (ret)
> +		if (ret) {
> +			dst->anon_vma = NULL;
>  			return ret;
> +		}

Hm, I think I'd rather we tackle this at the source to be honest.

I think it makes sense to do this in cleanup_partial_anon_vmas() since that's
handling the rest of the cleanup, and this is what the anon_vma_clone() error
path previously did.

Something like:

static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
{
	struct anon_vma_chain *avc, *next;

	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
		list_del(&avc->same_vma);
		anon_vma_chain_free(avc);
	}
+	vma->anon_vma = NULL;
}
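
To make the accounting problem concrete, here is a toy user-space model of
it. It is only a sketch under stated assumptions: the structs and the
model_clone()/model_unlink() helpers are hypothetical and reduce the real
anon_vma bookkeeping to a single counter. The clear_on_fail flag stands in
for either placement of the NULL assignment discussed here:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel structures. */
struct anon_vma {
	int num_active_vmas;
};

struct vma {
	struct anon_vma *anon_vma;
};

/* Model of a clone that either succeeds (and takes a count) or fails. */
static int model_clone(struct vma *dst, struct anon_vma *src,
		       int fail, int clear_on_fail)
{
	dst->anon_vma = src;	/* assigned before the allocation can fail */
	if (fail) {
		if (clear_on_fail)
			dst->anon_vma = NULL;	/* the cleanup under discussion */
		return -ENOMEM;
	}
	src->num_active_vmas++;	/* only the success path counts the VMA */
	return 0;
}

/* Model of teardown: drop a count whenever anon_vma is still set. */
static void model_unlink(struct vma *vma)
{
	if (!vma->anon_vma)
		return;
	vma->anon_vma->num_active_vmas--;
	vma->anon_vma = NULL;
}
```

Without the cleanup, a failed clone followed by teardown drops a count that
was never taken; with it, teardown bails out early and the counter is left
untouched.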


>
>  		*dup = dst;
>  	}
> --
> 2.51.0
>

Thanks for looking at this, this definitely needs fixing. Luckily, real-world
OOMs like this are probably near-impossible to trigger because these are 'too
small to fail' allocations, but we absolutely do need to ensure these code
paths are handled correctly.

Thanks, Lorenzo


