* [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare @ 2026-01-14 16:32 syzbot 2026-01-14 16:42 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: syzbot @ 2026-01-14 16:32 UTC (permalink / raw) To: Liam.Howlett, akpm, david, harry.yoo, jannh, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka Hello, syzbot found the following issue on: HEAD commit: cfd4039213e7 Merge tag 'io_uring-6.19-20251208' of git://g.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1554d992580000 kernel config: https://syzkaller.appspot.com/x/.config?x=c3201432211be40f dashboard link: https://syzkaller.appspot.com/bug?extid=f5d897f5194d92aa1769 compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/9f556ae6e3c4/disk-cfd40392.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/efcf53c1d459/vmlinux-cfd40392.xz kernel image: https://storage.googleapis.com/syzbot-assets/858f42961336/bzImage-cfd40392.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com ================================================================== BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 do_user_addr_fault+0x3fe/0x1080 arch/x86/mm/fault.c:1387 handle_page_fault arch/x86/mm/fault.c:1476 [inline] exc_page_fault+0x62/0xa0 arch/x86/mm/fault.c:1532 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618 
fault_in_readable+0xad/0x170 mm/gup.c:-1 fault_in_iov_iter_readable+0x129/0x210 lib/iov_iter.c:106 generic_perform_write+0x3cf/0x490 mm/filemap.c:4363 shmem_file_write_iter+0xc5/0xf0 mm/shmem.c:3490 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x52a/0x960 fs/read_write.c:686 ksys_pwrite64 fs/read_write.c:793 [inline] __do_sys_pwrite64 fs/read_write.c:801 [inline] __se_sys_pwrite64 fs/read_write.c:798 [inline] __x64_sys_pwrite64+0xfd/0x150 fs/read_write.c:798 x64_sys_call+0x9f7/0x3000 arch/x86/include/generated/asm/syscalls_64.h:19 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 faultin_page mm/gup.c:1126 [inline] __get_user_pages+0x1024/0x1ed0 mm/gup.c:1428 populate_vma_page_range mm/gup.c:1860 [inline] __mm_populate+0x243/0x3a0 mm/gup.c:1963 mm_populate include/linux/mm.h:3701 [inline] vm_mmap_pgoff+0x232/0x2e0 mm/util.c:586 ksys_mmap_pgoff+0x268/0x310 mm/mmap.c:604 x64_sys_call+0x16bb/0x3000 arch/x86/include/generated/asm/syscalls_64.h:10 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x0000000000000000 -> 0xffff888104ecca28 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 ================================================================== --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. 
syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:32 [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare syzbot @ 2026-01-14 16:42 ` Dmitry Vyukov 2026-01-14 16:59 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2026-01-14 16:42 UTC (permalink / raw) To: syzbot Cc: Liam.Howlett, akpm, david, harry.yoo, jannh, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, 14 Jan 2026 at 17:32, syzbot <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit: cfd4039213e7 Merge tag 'io_uring-6.19-20251208' of git://g.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1554d992580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=c3201432211be40f > dashboard link: https://syzkaller.appspot.com/bug?extid=f5d897f5194d92aa1769 > compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8 > > Unfortunately, I don't have any reproducer for this issue yet. 
> > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/9f556ae6e3c4/disk-cfd40392.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/efcf53c1d459/vmlinux-cfd40392.xz > kernel image: https://storage.googleapis.com/syzbot-assets/858f42961336/bzImage-cfd40392.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com > > ================================================================== > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > do_user_addr_fault+0x3fe/0x1080 arch/x86/mm/fault.c:1387 > handle_page_fault arch/x86/mm/fault.c:1476 [inline] > exc_page_fault+0x62/0xa0 arch/x86/mm/fault.c:1532 > asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618 > fault_in_readable+0xad/0x170 mm/gup.c:-1 > fault_in_iov_iter_readable+0x129/0x210 lib/iov_iter.c:106 > generic_perform_write+0x3cf/0x490 mm/filemap.c:4363 > shmem_file_write_iter+0xc5/0xf0 mm/shmem.c:3490 > new_sync_write fs/read_write.c:593 [inline] > vfs_write+0x52a/0x960 fs/read_write.c:686 > ksys_pwrite64 fs/read_write.c:793 [inline] > __do_sys_pwrite64 fs/read_write.c:801 [inline] > __se_sys_pwrite64 fs/read_write.c:798 [inline] > __x64_sys_pwrite64+0xfd/0x150 fs/read_write.c:798 > x64_sys_call+0x9f7/0x3000 arch/x86/include/generated/asm/syscalls_64.h:19 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > hugetlb_no_page+0x1c4/0x10d0 
mm/hugetlb.c:5782 > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > faultin_page mm/gup.c:1126 [inline] > __get_user_pages+0x1024/0x1ed0 mm/gup.c:1428 > populate_vma_page_range mm/gup.c:1860 [inline] > __mm_populate+0x243/0x3a0 mm/gup.c:1963 > mm_populate include/linux/mm.h:3701 [inline] > vm_mmap_pgoff+0x232/0x2e0 mm/util.c:586 > ksys_mmap_pgoff+0x268/0x310 mm/mmap.c:604 > x64_sys_call+0x16bb/0x3000 arch/x86/include/generated/asm/syscalls_64.h:10 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > Reported by Kernel Concurrency Sanitizer on: > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > Tainted: [W]=WARN > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > ================================================================== Hi Harry, I see you've been debugging: KASAN: slab-use-after-free Read in folio_remove_rmap_ptes https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ Can that bug be caused by this data race? Below is an explanation by Gemini LLM as to why this race is harmful. Obviously take it with a grain of salt, but with my limited mm knowledge it does not look immediately wrong (re rmap invariant). However, now digging into details I see that this Lorenzo's patch also marked as fixing "KASAN: slab-use-after-free Read in folio_remove_rmap_ptes": mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ So perhaps the race is still benign (or points to another issue?) 
Here is what LLM said about the race: ----- The bug report is actionable and points to a harmful data race in the Linux kernel's memory management subsystem, specifically in the handling of anonymous `hugetlb` mappings. **Analysis:** 1. **Race Location:** The data race occurs on the `vma->anon_vma` field of a `struct vm_area_struct`. * **Writer:** Task 13471 executes `__anon_vma_prepare` in `mm/rmap.c`. This function initializes the `anon_vma` for a VMA. It holds `mm->page_table_lock` and writes to `vma->anon_vma` (line 211 in the viewed source, corresponding to the report's `mm/rmap.c:212` area). * **Reader:** Task 13473 executes `__vmf_anon_prepare` in `mm/memory.c`. This function is an optimization wrapper that checks if `vma->anon_vma` is already set (line 3666/3667) to avoid the overhead of `__anon_vma_prepare`. This check is performed **without** holding `mm->page_table_lock`. 2. **Consistency:** The report is consistent. Both tasks are handling `hugetlb` page faults (`hugetlb_no_page` -> `__vmf_anon_prepare`). The writer stack shows it proceeded into `__anon_vma_prepare` (implying `vma->anon_vma` was NULL initially), while the reader stack shows it reading `vma->anon_vma`. The value change `0x0000000000000000 -> 0xffff888104ecca28` confirms initialization from NULL to a pointer. 3. **Harmfulness (Why it is not benign):** * In `__anon_vma_prepare`, the code currently initializes `vma->anon_vma` **before** linking the VMA to the `anon_vma` structure via `anon_vma_chain_link`. * ```c vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); ``` * Because the reader (`__vmf_anon_prepare`) checks `vma->anon_vma` locklessly, it can see the non-NULL value before `anon_vma_chain_link` has completed (due to compiler/CPU reordering or simple preemption between the two statements). * If the reader proceeds, it assumes the `anon_vma` is fully ready. It then maps a page and sets `folio->mapping = anon_vma`. 
* However, if `anon_vma_chain_link` hasn't finished, the `anon_vma` (specifically its interval tree) does not yet contain the entry for this `vma`. * This breaks the reverse mapping (rmap) invariant. If the kernel subsequently tries to unmap or migrate this page (finding it via `folio->mapping`), `rmap_walk` will fail to find the VMA in the `anon_vma`'s interval tree. This can lead to pages being effectively pinned, migration failures, or in worst-case scenarios (like memory corruption handling or specific reclaim paths), logical errors where a page is assumed unmapped when it is not. 4. **Fix:** The fix requires enforcing ordering. `vma->anon_vma` should be set **after** `anon_vma_chain_link` is complete, and `smp_store_release` / `smp_load_acquire` (or equivalent barriers) should be used to ensure the reader observes the fully initialized state.
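[Editorial note: the release/acquire pairing proposed in point 4 is easiest to see in miniature. Below is a hedged userspace sketch using C11 atomics as stand-ins for the kernel's smp_store_release()/smp_load_acquire(); every name here (fake_vma, publish(), ...) is invented for illustration and none of this is the kernel implementation.]

```c
/* Userspace sketch of release/acquire publication, with C11 atomics
 * standing in for smp_store_release()/smp_load_acquire().
 * All names are illustrative, not kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_anon_vma {
	int num_active_vmas;	/* must be visible before the pointer is */
	int chain_linked;	/* stand-in for anon_vma_chain_link() state */
};

struct fake_vma {
	_Atomic(struct fake_anon_vma *) anon_vma;
};

static struct fake_anon_vma storage;
static struct fake_vma demo_fake_vma;	/* anon_vma starts out NULL */

/* Writer: fully initialize the payload, then publish the pointer with
 * release ordering, so the stores above cannot be reordered after it. */
static void publish(struct fake_vma *vma)
{
	storage.num_active_vmas = 1;
	storage.chain_linked = 1;
	atomic_store_explicit(&vma->anon_vma, &storage, memory_order_release);
}

/* Reader: the acquire load pairs with the release store, so observing a
 * non-NULL pointer guarantees the initialized payload is visible too. */
static struct fake_anon_vma *lookup(struct fake_vma *vma)
{
	return atomic_load_explicit(&vma->anon_vma, memory_order_acquire);
}
```

The release store orders the payload initialization before the pointer becomes visible; a plain `vma->anon_vma = anon_vma` store gives no such guarantee by itself, which is the reordering concern discussed in this thread.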
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:42 ` Dmitry Vyukov @ 2026-01-14 16:59 ` Jann Horn 2026-01-14 17:05 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 16:59 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, 14 Jan 2026 at 17:32, syzbot > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > ================================================================== > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 [...] > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 [...] > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > Reported by Kernel Concurrency Sanitizer on: > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > Tainted: [W]=WARN > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > ================================================================== > > Hi Harry, > > I see you've been debugging: > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > Can that bug be caused by this data race? 
> Below is an explanation by Gemini LLM as to why this race is harmful. > Obviously take it with a grain of salt, but with my limited mm > knowledge it does not look immediately wrong (re rmap invariant). > > However, now digging into details I see that this Lorenzo's patch > also marked as fixing "KASAN: slab-use-after-free Read in > folio_remove_rmap_ptes": > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > So perhaps the race is still benign (or points to another issue?) > > Here is what LLM said about the race: > ----- > > The bug report is actionable and points to a harmful data race in the Linux > kernel's memory management subsystem, specifically in the handling of > anonymous `hugetlb` mappings. This data race is not specific to hugetlb at all, and it isn't caused by any recent changes. It's a longstanding thing in core MM, but it's pretty benign as far as I know. Fundamentally, the field vma->anon_vma can be read while only holding the mmap lock in read mode; and it can concurrently be changed from NULL to non-NULL. One scenario to cause such a data race is to create a new anonymous VMA, then trigger two concurrent page faults inside this VMA. Assume a configuration with VMA locking disabled for simplicity, so that both faults happen under the mmap lock in read mode. This will lead to two concurrent calls to __vmf_anon_prepare() (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), both threads only holding the mmap_lock in read mode. __vmf_anon_prepare() is essentially this (from https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, with VMA locking code removed): vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; vm_fault_t ret = 0; if (likely(vma->anon_vma)) return 0; [...] if (__anon_vma_prepare(vma)) ret = VM_FAULT_OOM; [...] 
return ret; } int __anon_vma_prepare(struct vm_area_struct *vma) { struct mm_struct *mm = vma->vm_mm; struct anon_vma *anon_vma, *allocated; struct anon_vma_chain *avc; [...] [... allocate stuff ...] anon_vma_lock_write(anon_vma); /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; [...] } spin_unlock(&mm->page_table_lock); anon_vma_unlock_write(anon_vma); [... cleanup ...] return 0; [... error handling ...] } So if one thread reaches the "vma->anon_vma = anon_vma" assignment while the other thread is running the "if (likely(vma->anon_vma))" check, you get a (AFAIK benign) data race.
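[Editorial note: stripped of the MM specifics, the two functions quoted above form classic double-checked initialization: an unsynchronized fast-path read, a locked re-check, and a loser that discards its speculative allocation. A minimal single-threaded userspace sketch of that control flow follows; the lock helpers are no-op stand-ins for spin_lock(&mm->page_table_lock), and all names are illustrative, not kernel code.]

```c
/* Double-checked initialization in the shape of __vmf_anon_prepare() /
 * __anon_vma_prepare(). Single-threaded sketch; the lock calls are
 * no-op stand-ins kept to mark where the real serialization happens. */
#include <assert.h>
#include <stdlib.h>

struct demo_anon_vma { int num_active_vmas; };
struct demo_vma { struct demo_anon_vma *anon_vma; };

static struct demo_vma demo_vma_instance;
static int allocations;	/* counts allocations, for the test below */

static void page_table_lock(void)   { /* spin_lock() stand-in */ }
static void page_table_unlock(void) { /* spin_unlock() stand-in */ }

static int demo_anon_vma_prepare(struct demo_vma *vma)
{
	/* Racy fast path: this is the read KCSAN flags. A stale NULL is
	 * harmless; it only sends us through the locked slow path. */
	if (vma->anon_vma)
		return 0;

	struct demo_anon_vma *allocated = calloc(1, sizeof(*allocated));
	if (!allocated)
		return -1;	/* VM_FAULT_OOM analogue */
	allocations++;

	page_table_lock();
	if (!vma->anon_vma) {	/* authoritative re-check under the lock */
		vma->anon_vma = allocated;
		vma->anon_vma->num_active_vmas++;
		allocated = NULL;	/* winner: ownership transferred */
	}
	page_table_unlock();

	free(allocated);	/* loser frees its unused allocation */
	return 0;
}
```

The fast-path read is deliberately unlocked; correctness rests entirely on the re-check under the lock, which is why the race on the fast path can be benign.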
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:59 ` Jann Horn @ 2026-01-14 17:05 ` Dmitry Vyukov 2026-01-14 17:29 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2026-01-14 17:05 UTC (permalink / raw) To: Jann Horn Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, 14 Jan 2026 at 17:32, syzbot > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > ================================================================== > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > [...] > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > [...] 
> > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > Tainted: [W]=WARN > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > ================================================================== > > > > Hi Harry, > > > > I see you've been debugging: > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > Can that bug be caused by this data race? > > Below is an explanation by Gemini LLM as to why this race is harmful. > > Obviously take it with a grain of salt, but with my limited mm > > knowledge it does not look immediately wrong (re rmap invariant). > > > > However, now digging into details I see that this Lorenzo's patch > > also marked as fixing "KASAN: slab-use-after-free Read in > > folio_remove_rmap_ptes": > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > So perhaps the race is still benign (or points to another issue?) > > > > Here is what LLM said about the race: > > ----- > > > > The bug report is actionable and points to a harmful data race in the Linux > > kernel's memory management subsystem, specifically in the handling of > > anonymous `hugetlb` mappings. > > This data race is not specific to hugetlb at all, and it isn't caused > by any recent changes. It's a longstanding thing in core MM, but it's > pretty benign as far as I know. > > Fundamentally, the field vma->anon_vma can be read while only holding > the mmap lock in read mode; and it can concurrently be changed from > NULL to non-NULL. 
> > One scenario to cause such a data race is to create a new anonymous > VMA, then trigger two concurrent page faults inside this VMA. Assume a > configuration with VMA locking disabled for simplicity, so that both > faults happen under the mmap lock in read mode. This will lead to two > concurrent calls to __vmf_anon_prepare() > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > both threads only holding the mmap_lock in read mode. > __vmf_anon_prepare() is essentially this (from > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > with VMA locking code removed): > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > vm_fault_t ret = 0; > > if (likely(vma->anon_vma)) > return 0; > [...] > if (__anon_vma_prepare(vma)) > ret = VM_FAULT_OOM; > [...] > return ret; > } > > int __anon_vma_prepare(struct vm_area_struct *vma) > { > struct mm_struct *mm = vma->vm_mm; > struct anon_vma *anon_vma, *allocated; > struct anon_vma_chain *avc; > > [...] > > [... allocate stuff ...] > > anon_vma_lock_write(anon_vma); > /* page_table_lock to protect against threads */ > spin_lock(&mm->page_table_lock); > if (likely(!vma->anon_vma)) { > vma->anon_vma = anon_vma; > [...] > } > spin_unlock(&mm->page_table_lock); > anon_vma_unlock_write(anon_vma); > > [... cleanup ...] > > return 0; > > [... error handling ...] > } > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > while the other thread is running the "if (likely(vma->anon_vma))" > check, you get a (AFAIK benign) data race. Thanks for checking, Jann. To double check: "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless readers can't read anon_vma contents, is it correct? So none of them is really reading anon_vma, right?
Also, anon_vma_chain_link and num_active_vmas++ indeed happen after assignment to anon_vma: /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); anon_vma->num_active_vmas++; allocated = NULL; avc = NULL; } spin_unlock(&mm->page_table_lock); So the lockless readers that observe anon_vma!=NULL won't rely on these invariants, right?
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:05 ` Dmitry Vyukov @ 2026-01-14 17:29 ` Jann Horn 2026-01-14 17:48 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 17:29 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > ================================================================== > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > [...] > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > [...] 
> > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > Tainted: [W]=WARN > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > ================================================================== > > > > > > Hi Harry, > > > > > > I see you've been debugging: > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > Can that bug be caused by this data race? > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > Obviously take it with a grain of salt, but with my limited mm > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > However, now digging into details I see that this Lorenzo's patch > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > folio_remove_rmap_ptes": > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > Here is what LLM said about the race: > > > ----- > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > kernel's memory management subsystem, specifically in the handling of > > > anonymous `hugetlb` mappings. > > > > This data race is not specific to hugetlb at all, and it isn't caused > > by any recent changes. It's a longstanding thing in core MM, but it's > > pretty benign as far as I know. > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > the mmap lock in read mode; and it can concurrently be changed from > > NULL to non-NULL. 
> > > > One scenario to cause such a data race is to create a new anonymous > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > configuration with VMA locking disabled for simplicity, so that both > > faults happen under the mmap lock in read mode. This will lead to two > > concurrent calls to __vmf_anon_prepare() > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > both threads only holding the mmap_lock in read mode. > > __vmf_anon_prepare() is essentially this (from > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > with VMA locking code removed): > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > { > > struct vm_area_struct *vma = vmf->vma; > > vm_fault_t ret = 0; > > > > if (likely(vma->anon_vma)) > > return 0; > > [...] > > if (__anon_vma_prepare(vma)) > > ret = VM_FAULT_OOM; > > [...] > > return ret; > > } > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > { > > struct mm_struct *mm = vma->vm_mm; > > struct anon_vma *anon_vma, *allocated; > > struct anon_vma_chain *avc; > > > > [...] > > > > [... allocate stuff ...] > > > > anon_vma_lock_write(anon_vma); > > /* page_table_lock to protect against threads */ > > spin_lock(&mm->page_table_lock); > > if (likely(!vma->anon_vma)) { > > vma->anon_vma = anon_vma; > > [...] > > } > > spin_unlock(&mm->page_table_lock); > > anon_vma_unlock_write(anon_vma); > > > > [... cleanup ...] > > > > return 0; > > > > [... error handling ...] > > } > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > while the other thread is running the "if (likely(vma->anon_vma))" > > check, you get a (AFAIK benign) data race. > > Thanks for checking, Jann. > > To double check" > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > readers can't read anon_vma contents, is it correct? So none of them > really reading anon_vma, right? 
I think you are right that this should be using store-release; searching around, I also mentioned this in <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: | > +Note that there are some exceptions to this - the `anon_vma` field is permitted | > +to be written to under mmap read lock and is instead serialised by the `struct | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all | | Hm, we really ought to add some smp_store_release() and READ_ONCE(), | or something along those lines, around our ->anon_vma accesses... | especially the "vma->anon_vma = anon_vma" assignment in | __anon_vma_prepare() looks to me like, on architectures like arm64 | with write-write reordering, we could theoretically end up making a | new anon_vma pointer visible to a concurrent page fault before the | anon_vma has been initialized? Though I have no idea if that is | practically possible, stuff would have to be reordered quite a bit for | that to happen... I just noticed that I tried fixing this back in 2023, I don't remember why that didn't end up landing; the memory ordering was kind of messy to think about: <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > assignment to anon_vma: > > /* page_table_lock to protect against threads */ > spin_lock(&mm->page_table_lock); > if (likely(!vma->anon_vma)) { > vma->anon_vma = anon_vma; > anon_vma_chain_link(vma, avc, anon_vma); > anon_vma->num_active_vmas++; > allocated = NULL; > avc = NULL; > } > spin_unlock(&mm->page_table_lock); > > So the lockless readers that observe anon_vma!=NULL won't rely on > these invariants, right? Yeah, that stuff should be sufficiently protected because of the anon_vma lock.
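[Editorial note: the annotations suggested above (READ_ONCE() on the lockless fast-path read, smp_store_release() on the publishing store) map roughly onto C11 relaxed and release atomics. A hedged userspace sketch of what the annotated accesses would look like; all names are illustrative and this is not the kernel code.]

```c
/* Sketch of the suggested annotations, with C11 atomics as analogues:
 * memory_order_relaxed ~ READ_ONCE() (atomicity, no ordering),
 * memory_order_release ~ smp_store_release(). Illustrative names only. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_anon_vma { int ready; };
struct toy_vma { _Atomic(struct toy_anon_vma *) anon_vma; };

static struct toy_anon_vma toy_storage;
static struct toy_vma toy_vma_demo;	/* anon_vma starts out NULL */

/* Fast path: a READ_ONCE()-style relaxed load. It adds no ordering; it
 * only makes the intentional lockless read explicit (and race-free in
 * the language's memory model). A stale NULL just means "slow path". */
static int toy_fast_path_prepared(struct toy_vma *vma)
{
	return atomic_load_explicit(&vma->anon_vma,
				    memory_order_relaxed) != NULL;
}

/* Slow path (real locking elided): initialize fully, then publish with
 * release, closing the theoretical write-write reordering window on
 * weakly ordered architectures such as arm64. */
static void toy_publish(struct toy_vma *vma)
{
	toy_storage.ready = 1;
	atomic_store_explicit(&vma->anon_vma, &toy_storage,
			      memory_order_release);
}
```

The relaxed load alone would be enough to document the benign race; the release store is what additionally prevents readers from observing the pointer before the anon_vma is initialized.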
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:29 ` Jann Horn @ 2026-01-14 17:48 ` Jann Horn 2026-01-14 18:02 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 17:48 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > ================================================================== > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > [...] > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > [...] 
> > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > Tainted: [W]=WARN > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > ================================================================== > > > > > > > > Hi Harry, > > > > > > > > I see you've been debugging: > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > Can that bug be caused by this data race? > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > Obviously take it with a grain of salt, but with my limited mm > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > folio_remove_rmap_ptes": > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > Here is what LLM said about the race: > > > > ----- > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > kernel's memory management subsystem, specifically in the handling of > > > > anonymous `hugetlb` mappings. > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > by any recent changes. It's a longstanding thing in core MM, but it's > > > pretty benign as far as I know. 
> > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > the mmap lock in read mode; and it can concurrently be changed from > > > NULL to non-NULL. > > > > > > One scenario to cause such a data race is to create a new anonymous > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > configuration with VMA locking disabled for simplicity, so that both > > > faults happen under the mmap lock in read mode. This will lead to two > > > concurrent calls to __vmf_anon_prepare() > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > both threads only holding the mmap_lock in read mode. > > > __vmf_anon_prepare() is essentially this (from > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > with VMA locking code removed): > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > { > > > struct vm_area_struct *vma = vmf->vma; > > > vm_fault_t ret = 0; > > > > > > if (likely(vma->anon_vma)) > > > return 0; > > > [...] > > > if (__anon_vma_prepare(vma)) > > > ret = VM_FAULT_OOM; > > > [...] > > > return ret; > > > } > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > { > > > struct mm_struct *mm = vma->vm_mm; > > > struct anon_vma *anon_vma, *allocated; > > > struct anon_vma_chain *avc; > > > > > > [...] > > > > > > [... allocate stuff ...] > > > > > > anon_vma_lock_write(anon_vma); > > > /* page_table_lock to protect against threads */ > > > spin_lock(&mm->page_table_lock); > > > if (likely(!vma->anon_vma)) { > > > vma->anon_vma = anon_vma; > > > [...] > > > } > > > spin_unlock(&mm->page_table_lock); > > > anon_vma_unlock_write(anon_vma); > > > > > > [... cleanup ...] > > > > > > return 0; > > > > > > [... error handling ...] > > > } > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > check, you get a (AFAIK benign) data race. 
> > > > Thanks for checking, Jann. > > > > To double check" > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > readers can't read anon_vma contents, is it correct? So none of them > > really reading anon_vma, right? > > I think you are right that this should be using store-release; > searching around, I also mentioned this in > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > | > +Note that there are some exceptions to this - the `anon_vma` > field is permitted > | > +to be written to under mmap read lock and is instead serialised > by the `struct > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > | > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > | or something along those lines, around our ->anon_vma accesses... > | especially the "vma->anon_vma = anon_vma" assignment in > | __anon_vma_prepare() looks to me like, on architectures like arm64 > | with write-write reordering, we could theoretically end up making a > | new anon_vma pointer visible to a concurrent page fault before the > | anon_vma has been initialized? Though I have no idea if that is > | practically possible, stuff would have to be reordered quite a bit for > | that to happen... 
> > I just noticed that I tried fixing this back in 2023, I don't > remember why that didn't end up landing; the memory ordering was kind > of messy to think about: > <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > > > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > > assignment to anon_vma: > > > > /* page_table_lock to protect against threads */ > > spin_lock(&mm->page_table_lock); > > if (likely(!vma->anon_vma)) { > > vma->anon_vma = anon_vma; > > anon_vma_chain_link(vma, avc, anon_vma); > > anon_vma->num_active_vmas++; > > allocated = NULL; > > avc = NULL; > > } > > spin_unlock(&mm->page_table_lock); > > > > So the lockless readers that observe anon_vma!=NULL won't rely on > > these invariants, right? > > Yeah, that stuff should be sufficiently protected because of the anon_vma lock. Er, except it actually isn't entirely, as I noticed in that old patch I linked: @@ -1072,7 +1071,15 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct * static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, struct vm_area_struct *a, struct vm_area_struct *b) { if (anon_vma_compatible(a, b)) { - struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); + /* + * Pairs with smp_store_release() in __anon_vma_prepare(). + * + * We could get away with a READ_ONCE() here, but + * smp_load_acquire() ensures that the following + * list_is_singular() check on old->anon_vma_chain doesn't race + * with __anon_vma_prepare(). + */ + struct anon_vma *anon_vma = smp_load_acquire(&old->anon_vma); if (anon_vma && list_is_singular(&old->anon_vma_chain)) return anon_vma; That list_is_singular(&old->anon_vma_chain) does plain loads on the list_head, which can concurrently be modified by anon_vma_chain_link() (which is called from __anon_vma_prepare()). I think that... probably shouldn't cause any functional problems, but it is ugly. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:48 ` Jann Horn @ 2026-01-14 18:02 ` Lorenzo Stoakes 2026-01-14 18:23 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Lorenzo Stoakes @ 2026-01-14 18:02 UTC (permalink / raw) To: Jann Horn Cc: Dmitry Vyukov, syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 06:48:37PM +0100, Jann Horn wrote: > On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > > ================================================================== > > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > [...] > > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > [...] 
> > > > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > > Tainted: [W]=WARN > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > > ================================================================== > > > > > > > > > > Hi Harry, > > > > > > > > > > I see you've been debugging: > > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > > > Can that bug be caused by this data race? > > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > > Obviously take it with a grain of salt, but with my limited mm > > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > > folio_remove_rmap_ptes": > > > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > > > Here is what LLM said about the race: > > > > > ----- > > > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > > kernel's memory management subsystem, specifically in the handling of > > > > > anonymous `hugetlb` mappings. > > > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > > by any recent changes. It's a longstanding thing in core MM, but it's > > > > pretty benign as far as I know. 
> > > > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > > the mmap lock in read mode; and it can concurrently be changed from > > > > NULL to non-NULL. Well isn't that what the page_table_lock is for...? > > > > > > > > One scenario to cause such a data race is to create a new anonymous > > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > > configuration with VMA locking disabled for simplicity, so that both > > > > faults happen under the mmap lock in read mode. This will lead to two > > > > concurrent calls to __vmf_anon_prepare() > > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > > both threads only holding the mmap_lock in read mode. > > > > __vmf_anon_prepare() is essentially this (from > > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > > with VMA locking code removed): > > > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > > { > > > > struct vm_area_struct *vma = vmf->vma; > > > > vm_fault_t ret = 0; > > > > > > > > if (likely(vma->anon_vma)) > > > > return 0; > > > > [...] > > > > if (__anon_vma_prepare(vma)) > > > > ret = VM_FAULT_OOM; > > > > [...] > > > > return ret; > > > > } > > > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > > { > > > > struct mm_struct *mm = vma->vm_mm; > > > > struct anon_vma *anon_vma, *allocated; > > > > struct anon_vma_chain *avc; > > > > > > > > [...] > > > > > > > > [... allocate stuff ...] > > > > > > > > anon_vma_lock_write(anon_vma); > > > > /* page_table_lock to protect against threads */ > > > > spin_lock(&mm->page_table_lock); > > > > if (likely(!vma->anon_vma)) { > > > > vma->anon_vma = anon_vma; > > > > [...] > > > > } > > > > spin_unlock(&mm->page_table_lock); > > > > anon_vma_unlock_write(anon_vma); > > > > > > > > [... cleanup ...] > > > > > > > > return 0; > > > > > > > > [... error handling ...] 
> > > > } > > > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > > check, you get a (AFAIK benign) data race. > > > > > > Thanks for checking, Jann. > > > > > > To double check" > > > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > > readers can't read anon_vma contents, is it correct? So none of them > > > really reading anon_vma, right? > > > > I think you are right that this should be using store-release; > > searching around, I also mentioned this in > > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > > > | > +Note that there are some exceptions to this - the `anon_vma` > > field is permitted > > | > +to be written to under mmap read lock and is instead serialised > > by the `struct > > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > > | > > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > > | or something along those lines, around our ->anon_vma accesses... > > | especially the "vma->anon_vma = anon_vma" assignment in > > | __anon_vma_prepare() looks to me like, on architectures like arm64 > > | with write-write reordering, we could theoretically end up making a > > | new anon_vma pointer visible to a concurrent page fault before the > > | anon_vma has been initialized? Though I have no idea if that is > > | practically possible, stuff would have to be reordered quite a bit for > > | that to happen... As far as the page fault is concerned it only really cares about whether it exists or not, not whether it's initialised. The operations that check/modify fields within the anon_vma are protected by the anon rmap lock (my recent series takes advantage of this to avoid holding that lock during AVC allocation for instance). This lock also protects the interval tree. 
> > > > I just noticed that I tried fixing this back in 2023, I don't > > remember why that didn't end up landing; the memory ordering was kind > > of messy to think about: > > <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > > > > > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > > > assignment to anon_vma: > > > > > > /* page_table_lock to protect against threads */ > > > spin_lock(&mm->page_table_lock); > > > if (likely(!vma->anon_vma)) { > > > vma->anon_vma = anon_vma; > > > anon_vma_chain_link(vma, avc, anon_vma); > > > anon_vma->num_active_vmas++; > > > allocated = NULL; > > > avc = NULL; > > > } > > > spin_unlock(&mm->page_table_lock); > > > > > > So the lockless readers that observe anon_vma!=NULL won't rely on > > > these invariants, right? > > > > Yeah, that stuff should be sufficiently protected because of the anon_vma lock. > > Er, except it actually isn't entirely, as I noticed in that old patch I linked: > > @@ -1072,7 +1071,15 @@ static int anon_vma_compatible(struct > vm_area_struct *a, struct vm_area_struct * > static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, > struct vm_area_struct *a, struct vm_area_struct *b) > { > if (anon_vma_compatible(a, b)) { > - struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); > + /* > + * Pairs with smp_store_release() in __anon_vma_prepare(). > + * > + * We could get away with a READ_ONCE() here, but > + * smp_load_acquire() ensures that the following > + * list_is_singular() check on old->anon_vma_chain doesn't race > + * with __anon_vma_prepare(). > + */ > + struct anon_vma *anon_vma = smp_load_acquire(&old->anon_vma); Yeah I'm not sure this is really hugely important, as this being slightly wrong only leads to very rarely having slightly less efficient lock scalability. 
> > if (anon_vma && list_is_singular(&old->anon_vma_chain)) > return anon_vma; > > That list_is_singular(&old->anon_vma_chain) does plain loads on the > list_head, which can concurrently be modified by anon_vma_chain_link() We're no longer using that directly as per my latest changes :) But I don't think it really matters. > (which is called from __anon_vma_prepare()). I think that... probably > shouldn't cause any functional problems, but it is ugly. But yeah this seems pretty benign. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 18:02 ` Lorenzo Stoakes @ 2026-01-14 18:23 ` Jann Horn 0 siblings, 0 replies; 8+ messages in thread From: Jann Horn @ 2026-01-14 18:23 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Dmitry Vyukov, syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 7:02 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > On Wed, Jan 14, 2026 at 06:48:37PM +0100, Jann Horn wrote: > > On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > > > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > > > ================================================================== > > > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > > [...] > > > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > > [...] 
> > > > > > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > > > Tainted: [W]=WARN > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > > > ================================================================== > > > > > > > > > > > > Hi Harry, > > > > > > > > > > > > I see you've been debugging: > > > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > > > > > Can that bug be caused by this data race? > > > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > > > Obviously take it with a grain of salt, but with my limited mm > > > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > > > folio_remove_rmap_ptes": > > > > > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > > > > > Here is what LLM said about the race: > > > > > > ----- > > > > > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > > > kernel's memory management subsystem, specifically in the handling of > > > > > > anonymous `hugetlb` mappings. > > > > > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > > > by any recent changes. 
It's a longstanding thing in core MM, but it's > > > > > pretty benign as far as I know. > > > > > > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > > > the mmap lock in read mode; and it can concurrently be changed from > > > > > NULL to non-NULL. > > Well isn't that what the page_table_lock is for...? The page_table_lock prevents writer-writer data races, but not reader-writer data races. (It is only held by writers, not by readers.) > > > > > > > > > > One scenario to cause such a data race is to create a new anonymous > > > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > > > configuration with VMA locking disabled for simplicity, so that both > > > > > faults happen under the mmap lock in read mode. This will lead to two > > > > > concurrent calls to __vmf_anon_prepare() > > > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > > > both threads only holding the mmap_lock in read mode. > > > > > __vmf_anon_prepare() is essentially this (from > > > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > > > with VMA locking code removed): > > > > > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > > > { > > > > > struct vm_area_struct *vma = vmf->vma; > > > > > vm_fault_t ret = 0; > > > > > > > > > > if (likely(vma->anon_vma)) > > > > > return 0; > > > > > [...] > > > > > if (__anon_vma_prepare(vma)) > > > > > ret = VM_FAULT_OOM; > > > > > [...] > > > > > return ret; > > > > > } > > > > > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > > > { > > > > > struct mm_struct *mm = vma->vm_mm; > > > > > struct anon_vma *anon_vma, *allocated; > > > > > struct anon_vma_chain *avc; > > > > > > > > > > [...] > > > > > > > > > > [... allocate stuff ...] 
> > > > > > > > > > anon_vma_lock_write(anon_vma); > > > > > /* page_table_lock to protect against threads */ > > > > > spin_lock(&mm->page_table_lock); > > > > > if (likely(!vma->anon_vma)) { > > > > > vma->anon_vma = anon_vma; > > > > > [...] > > > > > } > > > > > spin_unlock(&mm->page_table_lock); > > > > > anon_vma_unlock_write(anon_vma); > > > > > > > > > > [... cleanup ...] > > > > > > > > > > return 0; > > > > > > > > > > [... error handling ...] > > > > > } > > > > > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > > > check, you get a (AFAIK benign) data race. > > > > > > > > Thanks for checking, Jann. > > > > > > > > To double check" > > > > > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > > > readers can't read anon_vma contents, is it correct? So none of them > > > > really reading anon_vma, right? > > > > > > I think you are right that this should be using store-release; > > > searching around, I also mentioned this in > > > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > > > > > | > +Note that there are some exceptions to this - the `anon_vma` > > > field is permitted > > > | > +to be written to under mmap read lock and is instead serialised > > > by the `struct > > > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > > > | > > > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > > > | or something along those lines, around our ->anon_vma accesses... > > > | especially the "vma->anon_vma = anon_vma" assignment in > > > | __anon_vma_prepare() looks to me like, on architectures like arm64 > > > | with write-write reordering, we could theoretically end up making a > > > | new anon_vma pointer visible to a concurrent page fault before the > > > | anon_vma has been initialized? 
Though I have no idea if that is > > > | practically possible, stuff would have to be reordered quite a bit for > > > | that to happen... > > As far as the page fault is concerned it only really cares about whether it > exists or not, not whether it's initialised. Hmm, yeah, I'm not sure if anything in the page fault path actually directly accesses the anon_vma. The page fault path does eventually re-publish the anon_vma pointer with `WRITE_ONCE(folio->mapping, (struct address_space *) anon_vma)` in __folio_set_anon() though, which could then potentially allow a third thread to walk through folio->mapping and observe the uninitialized anon_vma... Looking at the situation on latest stable (v6.18.5), two racing faults on _adjacent_ anonymous VMAs could also end up with one thread writing ->anon_vma while the other thread executes reusable_anon_vma(), loading the pointer to that anon_vma and accessing its ->anon_vma_chain. > The operations that check/modify fields within the anon_vma are protected by the > anon rmap lock (my recent series takes advantage of this to avoid holding that > lock during AVC allocation for instance). > > This lock also protects the interval tree. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-01-14 18:24 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-01-14 16:32 [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare syzbot 2026-01-14 16:42 ` Dmitry Vyukov 2026-01-14 16:59 ` Jann Horn 2026-01-14 17:05 ` Dmitry Vyukov 2026-01-14 17:29 ` Jann Horn 2026-01-14 17:48 ` Jann Horn 2026-01-14 18:02 ` Lorenzo Stoakes 2026-01-14 18:23 ` Jann Horn