* [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare @ 2026-01-14 16:32 syzbot 2026-01-14 16:42 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: syzbot @ 2026-01-14 16:32 UTC (permalink / raw) To: Liam.Howlett, akpm, david, harry.yoo, jannh, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka Hello, syzbot found the following issue on: HEAD commit: cfd4039213e7 Merge tag 'io_uring-6.19-20251208' of git://g.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1554d992580000 kernel config: https://syzkaller.appspot.com/x/.config?x=c3201432211be40f dashboard link: https://syzkaller.appspot.com/bug?extid=f5d897f5194d92aa1769 compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/9f556ae6e3c4/disk-cfd40392.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/efcf53c1d459/vmlinux-cfd40392.xz kernel image: https://storage.googleapis.com/syzbot-assets/858f42961336/bzImage-cfd40392.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com ================================================================== BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 do_user_addr_fault+0x3fe/0x1080 arch/x86/mm/fault.c:1387 handle_page_fault arch/x86/mm/fault.c:1476 [inline] exc_page_fault+0x62/0xa0 arch/x86/mm/fault.c:1532 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618 
fault_in_readable+0xad/0x170 mm/gup.c:-1 fault_in_iov_iter_readable+0x129/0x210 lib/iov_iter.c:106 generic_perform_write+0x3cf/0x490 mm/filemap.c:4363 shmem_file_write_iter+0xc5/0xf0 mm/shmem.c:3490 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x52a/0x960 fs/read_write.c:686 ksys_pwrite64 fs/read_write.c:793 [inline] __do_sys_pwrite64 fs/read_write.c:801 [inline] __se_sys_pwrite64 fs/read_write.c:798 [inline] __x64_sys_pwrite64+0xfd/0x150 fs/read_write.c:798 x64_sys_call+0x9f7/0x3000 arch/x86/include/generated/asm/syscalls_64.h:19 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 faultin_page mm/gup.c:1126 [inline] __get_user_pages+0x1024/0x1ed0 mm/gup.c:1428 populate_vma_page_range mm/gup.c:1860 [inline] __mm_populate+0x243/0x3a0 mm/gup.c:1963 mm_populate include/linux/mm.h:3701 [inline] vm_mmap_pgoff+0x232/0x2e0 mm/util.c:586 ksys_mmap_pgoff+0x268/0x310 mm/mmap.c:604 x64_sys_call+0x16bb/0x3000 arch/x86/include/generated/asm/syscalls_64.h:10 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x0000000000000000 -> 0xffff888104ecca28 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 ================================================================== --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. 
syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:32 [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare syzbot @ 2026-01-14 16:42 ` Dmitry Vyukov 2026-01-14 16:59 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2026-01-14 16:42 UTC (permalink / raw) To: syzbot Cc: Liam.Howlett, akpm, david, harry.yoo, jannh, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, 14 Jan 2026 at 17:32, syzbot <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit: cfd4039213e7 Merge tag 'io_uring-6.19-20251208' of git://g.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1554d992580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=c3201432211be40f > dashboard link: https://syzkaller.appspot.com/bug?extid=f5d897f5194d92aa1769 > compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8 > > Unfortunately, I don't have any reproducer for this issue yet. 
> > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/9f556ae6e3c4/disk-cfd40392.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/efcf53c1d459/vmlinux-cfd40392.xz > kernel image: https://storage.googleapis.com/syzbot-assets/858f42961336/bzImage-cfd40392.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com > > ================================================================== > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > do_user_addr_fault+0x3fe/0x1080 arch/x86/mm/fault.c:1387 > handle_page_fault arch/x86/mm/fault.c:1476 [inline] > exc_page_fault+0x62/0xa0 arch/x86/mm/fault.c:1532 > asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618 > fault_in_readable+0xad/0x170 mm/gup.c:-1 > fault_in_iov_iter_readable+0x129/0x210 lib/iov_iter.c:106 > generic_perform_write+0x3cf/0x490 mm/filemap.c:4363 > shmem_file_write_iter+0xc5/0xf0 mm/shmem.c:3490 > new_sync_write fs/read_write.c:593 [inline] > vfs_write+0x52a/0x960 fs/read_write.c:686 > ksys_pwrite64 fs/read_write.c:793 [inline] > __do_sys_pwrite64 fs/read_write.c:801 [inline] > __se_sys_pwrite64 fs/read_write.c:798 [inline] > __x64_sys_pwrite64+0xfd/0x150 fs/read_write.c:798 > x64_sys_call+0x9f7/0x3000 arch/x86/include/generated/asm/syscalls_64.h:19 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > hugetlb_no_page+0x1c4/0x10d0 
mm/hugetlb.c:5782 > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > faultin_page mm/gup.c:1126 [inline] > __get_user_pages+0x1024/0x1ed0 mm/gup.c:1428 > populate_vma_page_range mm/gup.c:1860 [inline] > __mm_populate+0x243/0x3a0 mm/gup.c:1963 > mm_populate include/linux/mm.h:3701 [inline] > vm_mmap_pgoff+0x232/0x2e0 mm/util.c:586 > ksys_mmap_pgoff+0x268/0x310 mm/mmap.c:604 > x64_sys_call+0x16bb/0x3000 arch/x86/include/generated/asm/syscalls_64.h:10 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xd8/0x2a0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > Reported by Kernel Concurrency Sanitizer on: > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > Tainted: [W]=WARN > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > ================================================================== Hi Harry, I see you've been debugging: KASAN: slab-use-after-free Read in folio_remove_rmap_ptes https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ Can that bug be caused by this data race? Below is an explanation by Gemini LLM as to why this race is harmful. Obviously take it with a grain of salt, but with my limited mm knowledge it does not look immediately wrong (re rmap invariant). However, now digging into details I see that this Lorenzo's patch also marked as fixing "KASAN: slab-use-after-free Read in folio_remove_rmap_ptes": mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ So perhaps the race is still benign (or points to another issue?) 
Here is what LLM said about the race: ----- The bug report is actionable and points to a harmful data race in the Linux kernel's memory management subsystem, specifically in the handling of anonymous `hugetlb` mappings. **Analysis:** 1. **Race Location:** The data race occurs on the `vma->anon_vma` field of a `struct vm_area_struct`. * **Writer:** Task 13471 executes `__anon_vma_prepare` in `mm/rmap.c`. This function initializes the `anon_vma` for a VMA. It holds `mm->page_table_lock` and writes to `vma->anon_vma` (line 211 in the viewed source, corresponding to the report's `mm/rmap.c:212` area). * **Reader:** Task 13473 executes `__vmf_anon_prepare` in `mm/memory.c`. This function is an optimization wrapper that checks if `vma->anon_vma` is already set (line 3666/3667) to avoid the overhead of `__anon_vma_prepare`. This check is performed **without** holding `mm->page_table_lock`. 2. **Consistency:** The report is consistent. Both tasks are handling `hugetlb` page faults (`hugetlb_no_page` -> `__vmf_anon_prepare`). The writer stack shows it proceeded into `__anon_vma_prepare` (implying `vma->anon_vma` was NULL initially), while the reader stack shows it reading `vma->anon_vma`. The value change `0x0000000000000000 -> 0xffff888104ecca28` confirms initialization from NULL to a pointer. 3. **Harmfulness (Why it is not benign):** * In `__anon_vma_prepare`, the code currently initializes `vma->anon_vma` **before** linking the VMA to the `anon_vma` structure via `anon_vma_chain_link`. * ```c vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); ``` * Because the reader (`__vmf_anon_prepare`) checks `vma->anon_vma` locklessly, it can see the non-NULL value before `anon_vma_chain_link` has completed (due to compiler/CPU reordering or simple preemption between the two statements). * If the reader proceeds, it assumes the `anon_vma` is fully ready. It then maps a page and sets `folio->mapping = anon_vma`. 
* However, if `anon_vma_chain_link` hasn't finished, the `anon_vma` (specifically its interval tree) does not yet contain the entry for this `vma`. * This breaks the reverse mapping (rmap) invariant. If the kernel subsequently tries to unmap or migrate this page (finding it via `folio->mapping`), `rmap_walk` will fail to find the VMA in the `anon_vma`'s interval tree. This can lead to pages being effectively pinned, migration failures, or in worst-case scenarios (like memory corruption handling or specific reclaim paths), logical errors where a page is assumed unmapped when it is not. 4. **Fix:** The fix requires enforcing ordering. `vma->anon_vma` should be set **after** `anon_vma_chain_link` is complete, and `smp_store_release` / `smp_load_acquire` (or equivalent barriers) should be used to ensure the reader observes the fully initialized state.
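[Editorial note: the release/acquire pairing proposed in point 4 is easiest to see in miniature. Below is a hedged userspace sketch using C11 atomics as stand-ins for the kernel's smp_store_release()/smp_load_acquire(); every name here (fake_vma, publish(), ...) is invented for illustration and none of this is the kernel implementation.]

```c
/* Userspace sketch of release/acquire publication, with C11 atomics
 * standing in for smp_store_release()/smp_load_acquire().
 * All names are illustrative, not kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_anon_vma {
	int num_active_vmas;	/* must be visible before the pointer is */
	int chain_linked;	/* stand-in for anon_vma_chain_link() state */
};

struct fake_vma {
	_Atomic(struct fake_anon_vma *) anon_vma;
};

static struct fake_anon_vma storage;
static struct fake_vma demo_fake_vma;	/* anon_vma starts out NULL */

/* Writer: fully initialize the payload, then publish the pointer with
 * release ordering, so the stores above cannot be reordered after it. */
static void publish(struct fake_vma *vma)
{
	storage.num_active_vmas = 1;
	storage.chain_linked = 1;
	atomic_store_explicit(&vma->anon_vma, &storage, memory_order_release);
}

/* Reader: the acquire load pairs with the release store, so observing a
 * non-NULL pointer guarantees the initialized payload is visible too. */
static struct fake_anon_vma *lookup(struct fake_vma *vma)
{
	return atomic_load_explicit(&vma->anon_vma, memory_order_acquire);
}
```

The release store orders the payload initialization before the pointer becomes visible; a plain `vma->anon_vma = anon_vma` store gives no such guarantee by itself, which is the reordering concern discussed in this thread.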
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:42 ` Dmitry Vyukov @ 2026-01-14 16:59 ` Jann Horn 2026-01-14 17:05 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 16:59 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, 14 Jan 2026 at 17:32, syzbot > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > ================================================================== > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 [...] > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 [...] > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > Reported by Kernel Concurrency Sanitizer on: > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > Tainted: [W]=WARN > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > ================================================================== > > Hi Harry, > > I see you've been debugging: > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > Can that bug be caused by this data race? 
> Below is an explanation by Gemini LLM as to why this race is harmful. > Obviously take it with a grain of salt, but with my limited mm > knowledge it does not look immediately wrong (re rmap invariant). > > However, now digging into details I see that this Lorenzo's patch > also marked as fixing "KASAN: slab-use-after-free Read in > folio_remove_rmap_ptes": > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > So perhaps the race is still benign (or points to another issue?) > > Here is what LLM said about the race: > ----- > > The bug report is actionable and points to a harmful data race in the Linux > kernel's memory management subsystem, specifically in the handling of > anonymous `hugetlb` mappings. This data race is not specific to hugetlb at all, and it isn't caused by any recent changes. It's a longstanding thing in core MM, but it's pretty benign as far as I know. Fundamentally, the field vma->anon_vma can be read while only holding the mmap lock in read mode; and it can concurrently be changed from NULL to non-NULL. One scenario to cause such a data race is to create a new anonymous VMA, then trigger two concurrent page faults inside this VMA. Assume a configuration with VMA locking disabled for simplicity, so that both faults happen under the mmap lock in read mode. This will lead to two concurrent calls to __vmf_anon_prepare() (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), both threads only holding the mmap_lock in read mode. __vmf_anon_prepare() is essentially this (from https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, with VMA locking code removed): vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; vm_fault_t ret = 0; if (likely(vma->anon_vma)) return 0; [...] if (__anon_vma_prepare(vma)) ret = VM_FAULT_OOM; [...] 
return ret; } int __anon_vma_prepare(struct vm_area_struct *vma) { struct mm_struct *mm = vma->vm_mm; struct anon_vma *anon_vma, *allocated; struct anon_vma_chain *avc; [...] [... allocate stuff ...] anon_vma_lock_write(anon_vma); /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; [...] } spin_unlock(&mm->page_table_lock); anon_vma_unlock_write(anon_vma); [... cleanup ...] return 0; [... error handling ...] } So if one thread reaches the "vma->anon_vma = anon_vma" assignment while the other thread is running the "if (likely(vma->anon_vma))" check, you get a (AFAIK benign) data race.
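[Editorial note: stripped of the MM specifics, the two functions quoted above form classic double-checked initialization: an unsynchronized fast-path read, a locked re-check, and a loser that discards its speculative allocation. A minimal single-threaded userspace sketch of that control flow follows; the lock helpers are no-op stand-ins for spin_lock(&mm->page_table_lock), and all names are illustrative, not kernel code.]

```c
/* Double-checked initialization in the shape of __vmf_anon_prepare() /
 * __anon_vma_prepare(). Single-threaded sketch; the lock calls are
 * no-op stand-ins kept to mark where the real serialization happens. */
#include <assert.h>
#include <stdlib.h>

struct demo_anon_vma { int num_active_vmas; };
struct demo_vma { struct demo_anon_vma *anon_vma; };

static struct demo_vma demo_vma_instance;
static int allocations;	/* counts allocations, for the test below */

static void page_table_lock(void)   { /* spin_lock() stand-in */ }
static void page_table_unlock(void) { /* spin_unlock() stand-in */ }

static int demo_anon_vma_prepare(struct demo_vma *vma)
{
	/* Racy fast path: this is the read KCSAN flags. A stale NULL is
	 * harmless; it only sends us through the locked slow path. */
	if (vma->anon_vma)
		return 0;

	struct demo_anon_vma *allocated = calloc(1, sizeof(*allocated));
	if (!allocated)
		return -1;	/* VM_FAULT_OOM analogue */
	allocations++;

	page_table_lock();
	if (!vma->anon_vma) {	/* authoritative re-check under the lock */
		vma->anon_vma = allocated;
		vma->anon_vma->num_active_vmas++;
		allocated = NULL;	/* winner: ownership transferred */
	}
	page_table_unlock();

	free(allocated);	/* loser frees its unused allocation */
	return 0;
}
```

The fast-path read is deliberately unlocked; correctness rests entirely on the re-check under the lock, which is why the race on the fast path can be benign.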
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 16:59 ` Jann Horn @ 2026-01-14 17:05 ` Dmitry Vyukov 2026-01-14 17:29 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2026-01-14 17:05 UTC (permalink / raw) To: Jann Horn Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, 14 Jan 2026 at 17:32, syzbot > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > ================================================================== > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > [...] > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > [...] 
> > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > Tainted: [W]=WARN > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > ================================================================== > > > > Hi Harry, > > > > I see you've been debugging: > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > Can that bug be caused by this data race? > > Below is an explanation by Gemini LLM as to why this race is harmful. > > Obviously take it with a grain of salt, but with my limited mm > > knowledge it does not look immediately wrong (re rmap invariant). > > > > However, now digging into details I see that this Lorenzo's patch > > also marked as fixing "KASAN: slab-use-after-free Read in > > folio_remove_rmap_ptes": > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > So perhaps the race is still benign (or points to another issue?) > > > > Here is what LLM said about the race: > > ----- > > > > The bug report is actionable and points to a harmful data race in the Linux > > kernel's memory management subsystem, specifically in the handling of > > anonymous `hugetlb` mappings. > > This data race is not specific to hugetlb at all, and it isn't caused > by any recent changes. It's a longstanding thing in core MM, but it's > pretty benign as far as I know. > > Fundamentally, the field vma->anon_vma can be read while only holding > the mmap lock in read mode; and it can concurrently be changed from > NULL to non-NULL. 
> > One scenario to cause such a data race is to create a new anonymous > VMA, then trigger two concurrent page faults inside this VMA. Assume a > configuration with VMA locking disabled for simplicity, so that both > faults happen under the mmap lock in read mode. This will lead to two > concurrent calls to __vmf_anon_prepare() > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > both threads only holding the mmap_lock in read mode. > __vmf_anon_prepare() is essentially this (from > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > with VMA locking code removed): > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > vm_fault_t ret = 0; > > if (likely(vma->anon_vma)) > return 0; > [...] > if (__anon_vma_prepare(vma)) > ret = VM_FAULT_OOM; > [...] > return ret; > } > > int __anon_vma_prepare(struct vm_area_struct *vma) > { > struct mm_struct *mm = vma->vm_mm; > struct anon_vma *anon_vma, *allocated; > struct anon_vma_chain *avc; > > [...] > > [... allocate stuff ...] > > anon_vma_lock_write(anon_vma); > /* page_table_lock to protect against threads */ > spin_lock(&mm->page_table_lock); > if (likely(!vma->anon_vma)) { > vma->anon_vma = anon_vma; > [...] > } > spin_unlock(&mm->page_table_lock); > anon_vma_unlock_write(anon_vma); > > [... cleanup ...] > > return 0; > > [... error handling ...] > } > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > while the other thread is running the "if (likely(vma->anon_vma))" > check, you get a (AFAIK benign) data race. Thanks for checking, Jann. To double check: "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless readers can't read anon_vma contents, is it correct? So none of them is really reading anon_vma, right?
Also, anon_vma_chain_link and num_active_vmas++ indeed happen after assignment to anon_vma: /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); anon_vma->num_active_vmas++; allocated = NULL; avc = NULL; } spin_unlock(&mm->page_table_lock); So the lockless readers that observe anon_vma!=NULL won't rely on these invariants, right?
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:05 ` Dmitry Vyukov @ 2026-01-14 17:29 ` Jann Horn 2026-01-14 17:48 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 17:29 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > ================================================================== > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > [...] > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > [...] 
> > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > Tainted: [W]=WARN > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > ================================================================== > > > > > > Hi Harry, > > > > > > I see you've been debugging: > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > Can that bug be caused by this data race? > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > Obviously take it with a grain of salt, but with my limited mm > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > However, now digging into details I see that this Lorenzo's patch > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > folio_remove_rmap_ptes": > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > Here is what LLM said about the race: > > > ----- > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > kernel's memory management subsystem, specifically in the handling of > > > anonymous `hugetlb` mappings. > > > > This data race is not specific to hugetlb at all, and it isn't caused > > by any recent changes. It's a longstanding thing in core MM, but it's > > pretty benign as far as I know. > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > the mmap lock in read mode; and it can concurrently be changed from > > NULL to non-NULL. 
> > > > One scenario to cause such a data race is to create a new anonymous > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > configuration with VMA locking disabled for simplicity, so that both > > faults happen under the mmap lock in read mode. This will lead to two > > concurrent calls to __vmf_anon_prepare() > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > both threads only holding the mmap_lock in read mode. > > __vmf_anon_prepare() is essentially this (from > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > with VMA locking code removed): > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > { > > struct vm_area_struct *vma = vmf->vma; > > vm_fault_t ret = 0; > > > > if (likely(vma->anon_vma)) > > return 0; > > [...] > > if (__anon_vma_prepare(vma)) > > ret = VM_FAULT_OOM; > > [...] > > return ret; > > } > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > { > > struct mm_struct *mm = vma->vm_mm; > > struct anon_vma *anon_vma, *allocated; > > struct anon_vma_chain *avc; > > > > [...] > > > > [... allocate stuff ...] > > > > anon_vma_lock_write(anon_vma); > > /* page_table_lock to protect against threads */ > > spin_lock(&mm->page_table_lock); > > if (likely(!vma->anon_vma)) { > > vma->anon_vma = anon_vma; > > [...] > > } > > spin_unlock(&mm->page_table_lock); > > anon_vma_unlock_write(anon_vma); > > > > [... cleanup ...] > > > > return 0; > > > > [... error handling ...] > > } > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > while the other thread is running the "if (likely(vma->anon_vma))" > > check, you get a (AFAIK benign) data race. > > Thanks for checking, Jann. > > To double check" > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > readers can't read anon_vma contents, is it correct? So none of them > really reading anon_vma, right? 
I think you are right that this should be using store-release; searching around, I also mentioned this in <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: | > +Note that there are some exceptions to this - the `anon_vma` field is permitted | > +to be written to under mmap read lock and is instead serialised by the `struct | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all | | Hm, we really ought to add some smp_store_release() and READ_ONCE(), | or something along those lines, around our ->anon_vma accesses... | especially the "vma->anon_vma = anon_vma" assignment in | __anon_vma_prepare() looks to me like, on architectures like arm64 | with write-write reordering, we could theoretically end up making a | new anon_vma pointer visible to a concurrent page fault before the | anon_vma has been initialized? Though I have no idea if that is | practically possible, stuff would have to be reordered quite a bit for | that to happen... I just noticed that I tried fixing this back in 2023, I don't remember why that didn't end up landing; the memory ordering was kind of messy to think about: <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > assignment to anon_vma: > > /* page_table_lock to protect against threads */ > spin_lock(&mm->page_table_lock); > if (likely(!vma->anon_vma)) { > vma->anon_vma = anon_vma; > anon_vma_chain_link(vma, avc, anon_vma); > anon_vma->num_active_vmas++; > allocated = NULL; > avc = NULL; > } > spin_unlock(&mm->page_table_lock); > > So the lockless readers that observe anon_vma!=NULL won't rely on > these invariants, right? Yeah, that stuff should be sufficiently protected because of the anon_vma lock.
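[Editorial note: the annotations suggested above (READ_ONCE() on the lockless fast-path read, smp_store_release() on the publishing store) map roughly onto C11 relaxed and release atomics. A hedged userspace sketch of what the annotated accesses would look like; all names are illustrative and this is not the kernel code.]

```c
/* Sketch of the suggested annotations, with C11 atomics as analogues:
 * memory_order_relaxed ~ READ_ONCE() (atomicity, no ordering),
 * memory_order_release ~ smp_store_release(). Illustrative names only. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_anon_vma { int ready; };
struct toy_vma { _Atomic(struct toy_anon_vma *) anon_vma; };

static struct toy_anon_vma toy_storage;
static struct toy_vma toy_vma_demo;	/* anon_vma starts out NULL */

/* Fast path: a READ_ONCE()-style relaxed load. It adds no ordering; it
 * only makes the intentional lockless read explicit (and race-free in
 * the language's memory model). A stale NULL just means "slow path". */
static int toy_fast_path_prepared(struct toy_vma *vma)
{
	return atomic_load_explicit(&vma->anon_vma,
				    memory_order_relaxed) != NULL;
}

/* Slow path (real locking elided): initialize fully, then publish with
 * release, closing the theoretical write-write reordering window on
 * weakly ordered architectures such as arm64. */
static void toy_publish(struct toy_vma *vma)
{
	toy_storage.ready = 1;
	atomic_store_explicit(&vma->anon_vma, &toy_storage,
			      memory_order_release);
}
```

The relaxed load alone would be enough to document the benign race; the release store is what additionally prevents readers from observing the pointer before the anon_vma is initialized.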
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:29 ` Jann Horn @ 2026-01-14 17:48 ` Jann Horn 2026-01-14 18:02 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Jann Horn @ 2026-01-14 17:48 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > ================================================================== > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > [...] > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > [...] 
> > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > Tainted: [W]=WARN > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > ================================================================== > > > > > > > > Hi Harry, > > > > > > > > I see you've been debugging: > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > Can that bug be caused by this data race? > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > Obviously take it with a grain of salt, but with my limited mm > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > folio_remove_rmap_ptes": > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > Here is what LLM said about the race: > > > > ----- > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > kernel's memory management subsystem, specifically in the handling of > > > > anonymous `hugetlb` mappings. > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > by any recent changes. It's a longstanding thing in core MM, but it's > > > pretty benign as far as I know. 
> > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > the mmap lock in read mode; and it can concurrently be changed from > > > NULL to non-NULL. > > > > > > One scenario to cause such a data race is to create a new anonymous > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > configuration with VMA locking disabled for simplicity, so that both > > > faults happen under the mmap lock in read mode. This will lead to two > > > concurrent calls to __vmf_anon_prepare() > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > both threads only holding the mmap_lock in read mode. > > > __vmf_anon_prepare() is essentially this (from > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > with VMA locking code removed): > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > { > > > struct vm_area_struct *vma = vmf->vma; > > > vm_fault_t ret = 0; > > > > > > if (likely(vma->anon_vma)) > > > return 0; > > > [...] > > > if (__anon_vma_prepare(vma)) > > > ret = VM_FAULT_OOM; > > > [...] > > > return ret; > > > } > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > { > > > struct mm_struct *mm = vma->vm_mm; > > > struct anon_vma *anon_vma, *allocated; > > > struct anon_vma_chain *avc; > > > > > > [...] > > > > > > [... allocate stuff ...] > > > > > > anon_vma_lock_write(anon_vma); > > > /* page_table_lock to protect against threads */ > > > spin_lock(&mm->page_table_lock); > > > if (likely(!vma->anon_vma)) { > > > vma->anon_vma = anon_vma; > > > [...] > > > } > > > spin_unlock(&mm->page_table_lock); > > > anon_vma_unlock_write(anon_vma); > > > > > > [... cleanup ...] > > > > > > return 0; > > > > > > [... error handling ...] > > > } > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > check, you get a (AFAIK benign) data race. 
> > > > Thanks for checking, Jann. > > > > To double check" > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > readers can't read anon_vma contents, is it correct? So none of them > > really reading anon_vma, right? > > I think you are right that this should be using store-release; > searching around, I also mentioned this in > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > | > +Note that there are some exceptions to this - the `anon_vma` > field is permitted > | > +to be written to under mmap read lock and is instead serialised > by the `struct > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > | > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > | or something along those lines, around our ->anon_vma accesses... > | especially the "vma->anon_vma = anon_vma" assignment in > | __anon_vma_prepare() looks to me like, on architectures like arm64 > | with write-write reordering, we could theoretically end up making a > | new anon_vma pointer visible to a concurrent page fault before the > | anon_vma has been initialized? Though I have no idea if that is > | practically possible, stuff would have to be reordered quite a bit for > | that to happen... 
> > I just noticed that I tried fixing this back in 2023, I don't > remember why that didn't end up landing; the memory ordering was kind > of messy to think about: > <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > > > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > > assignment to anon_vma: > > > > /* page_table_lock to protect against threads */ > > spin_lock(&mm->page_table_lock); > > if (likely(!vma->anon_vma)) { > > vma->anon_vma = anon_vma; > > anon_vma_chain_link(vma, avc, anon_vma); > > anon_vma->num_active_vmas++; > > allocated = NULL; > > avc = NULL; > > } > > spin_unlock(&mm->page_table_lock); > > > > So the lockless readers that observe anon_vma!=NULL won't rely on > > these invariants, right? > > Yeah, that stuff should be sufficiently protected because of the anon_vma lock. Er, except it actually isn't entirely, as I noticed in that old patch I linked: @@ -1072,7 +1071,15 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct * static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, struct vm_area_struct *a, struct vm_area_struct *b) { if (anon_vma_compatible(a, b)) { - struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); + /* + * Pairs with smp_store_release() in __anon_vma_prepare(). + * + * We could get away with a READ_ONCE() here, but + * smp_load_acquire() ensures that the following + * list_is_singular() check on old->anon_vma_chain doesn't race + * with __anon_vma_prepare(). + */ + struct anon_vma *anon_vma = smp_load_acquire(&old->anon_vma); if (anon_vma && list_is_singular(&old->anon_vma_chain)) return anon_vma; That list_is_singular(&old->anon_vma_chain) does plain loads on the list_head, which can concurrently be modified by anon_vma_chain_link() (which is called from __anon_vma_prepare()). I think that... probably shouldn't cause any functional problems, but it is ugly. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 17:48 ` Jann Horn @ 2026-01-14 18:02 ` Lorenzo Stoakes 2026-01-14 18:23 ` Jann Horn 0 siblings, 1 reply; 8+ messages in thread From: Lorenzo Stoakes @ 2026-01-14 18:02 UTC (permalink / raw) To: Jann Horn Cc: Dmitry Vyukov, syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 06:48:37PM +0100, Jann Horn wrote: > On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > > ================================================================== > > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > [...] > > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > [...] 
> > > > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > > Tainted: [W]=WARN > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > > ================================================================== > > > > > > > > > > Hi Harry, > > > > > > > > > > I see you've been debugging: > > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > > > Can that bug be caused by this data race? > > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > > Obviously take it with a grain of salt, but with my limited mm > > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > > folio_remove_rmap_ptes": > > > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > > > Here is what LLM said about the race: > > > > > ----- > > > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > > kernel's memory management subsystem, specifically in the handling of > > > > > anonymous `hugetlb` mappings. > > > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > > by any recent changes. It's a longstanding thing in core MM, but it's > > > > pretty benign as far as I know. 
> > > > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > > the mmap lock in read mode; and it can concurrently be changed from > > > > NULL to non-NULL. Well isn't that what the page_table_lock is for...? > > > > > > > > One scenario to cause such a data race is to create a new anonymous > > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > > configuration with VMA locking disabled for simplicity, so that both > > > > faults happen under the mmap lock in read mode. This will lead to two > > > > concurrent calls to __vmf_anon_prepare() > > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > > both threads only holding the mmap_lock in read mode. > > > > __vmf_anon_prepare() is essentially this (from > > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > > with VMA locking code removed): > > > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > > { > > > > struct vm_area_struct *vma = vmf->vma; > > > > vm_fault_t ret = 0; > > > > > > > > if (likely(vma->anon_vma)) > > > > return 0; > > > > [...] > > > > if (__anon_vma_prepare(vma)) > > > > ret = VM_FAULT_OOM; > > > > [...] > > > > return ret; > > > > } > > > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > > { > > > > struct mm_struct *mm = vma->vm_mm; > > > > struct anon_vma *anon_vma, *allocated; > > > > struct anon_vma_chain *avc; > > > > > > > > [...] > > > > > > > > [... allocate stuff ...] > > > > > > > > anon_vma_lock_write(anon_vma); > > > > /* page_table_lock to protect against threads */ > > > > spin_lock(&mm->page_table_lock); > > > > if (likely(!vma->anon_vma)) { > > > > vma->anon_vma = anon_vma; > > > > [...] > > > > } > > > > spin_unlock(&mm->page_table_lock); > > > > anon_vma_unlock_write(anon_vma); > > > > > > > > [... cleanup ...] > > > > > > > > return 0; > > > > > > > > [... error handling ...] 
> > > > } > > > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > > check, you get a (AFAIK benign) data race. > > > > > > Thanks for checking, Jann. > > > > > > To double check" > > > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > > readers can't read anon_vma contents, is it correct? So none of them > > > really reading anon_vma, right? > > > > I think you are right that this should be using store-release; > > searching around, I also mentioned this in > > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > > > | > +Note that there are some exceptions to this - the `anon_vma` > > field is permitted > > | > +to be written to under mmap read lock and is instead serialised > > by the `struct > > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > > | > > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > > | or something along those lines, around our ->anon_vma accesses... > > | especially the "vma->anon_vma = anon_vma" assignment in > > | __anon_vma_prepare() looks to me like, on architectures like arm64 > > | with write-write reordering, we could theoretically end up making a > > | new anon_vma pointer visible to a concurrent page fault before the > > | anon_vma has been initialized? Though I have no idea if that is > > | practically possible, stuff would have to be reordered quite a bit for > > | that to happen... As far as the page fault is concerned it only really cares about whether it exists or not, not whether it's initialised. The operations that check/modify fields within the anon_vma are protected by the anon rmap lock (my recent series takes advantage of this to avoid holding that lock during AVC allocation for instance). This lock also protects the interval tree. 
> > > > I just noticed that I tried fixing this back in 2023, I don't > > remember why that didn't end up landing; the memory ordering was kind > > of messy to think about: > > <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/> > > > > > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after > > > assignment to anon_vma: > > > > > > /* page_table_lock to protect against threads */ > > > spin_lock(&mm->page_table_lock); > > > if (likely(!vma->anon_vma)) { > > > vma->anon_vma = anon_vma; > > > anon_vma_chain_link(vma, avc, anon_vma); > > > anon_vma->num_active_vmas++; > > > allocated = NULL; > > > avc = NULL; > > > } > > > spin_unlock(&mm->page_table_lock); > > > > > > So the lockless readers that observe anon_vma!=NULL won't rely on > > > these invariants, right? > > > > Yeah, that stuff should be sufficiently protected because of the anon_vma lock. > > Er, except it actually isn't entirely, as I noticed in that old patch I linked: > > @@ -1072,7 +1071,15 @@ static int anon_vma_compatible(struct > vm_area_struct *a, struct vm_area_struct * > static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, > struct vm_area_struct *a, struct vm_area_struct *b) > { > if (anon_vma_compatible(a, b)) { > - struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); > + /* > + * Pairs with smp_store_release() in __anon_vma_prepare(). > + * > + * We could get away with a READ_ONCE() here, but > + * smp_load_acquire() ensures that the following > + * list_is_singular() check on old->anon_vma_chain doesn't race > + * with __anon_vma_prepare(). > + */ > + struct anon_vma *anon_vma = smp_load_acquire(&old->anon_vma); Yeah I'm not sure this is really hugely important, as this being slightly wrong only leads to very rarely having slightly less efficient lock scalability. 
> > if (anon_vma && list_is_singular(&old->anon_vma_chain)) > return anon_vma; > > That list_is_singular(&old->anon_vma_chain) does plain loads on the > list_head, which can concurrently be modified by anon_vma_chain_link() We're no longer using that directly as per my latest changes :) But I don't think it really matters. > (which is called from __anon_vma_prepare()). I think that... probably > shouldn't cause any functional problems, but it is ugly. But yeah this seems pretty benign. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare 2026-01-14 18:02 ` Lorenzo Stoakes @ 2026-01-14 18:23 ` Jann Horn 0 siblings, 0 replies; 8+ messages in thread From: Jann Horn @ 2026-01-14 18:23 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Dmitry Vyukov, syzbot, Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm, riel, syzkaller-bugs, vbabka On Wed, Jan 14, 2026 at 7:02 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > On Wed, Jan 14, 2026 at 06:48:37PM +0100, Jann Horn wrote: > > On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote: > > > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote: > > > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On Wed, 14 Jan 2026 at 17:32, syzbot > > > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote: > > > > > > > ================================================================== > > > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare > > > > > > > > > > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1: > > > > > > > __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212 > > > > > > > __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673 > > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > > [...] > > > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0: > > > > > > > __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667 > > > > > > > hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782 > > > > > > > hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1 > > > > > > > handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578 > > > > > [...] 
> > > > > > > > > > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28 > > > > > > > > > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G W syzkaller #0 PREEMPT(voluntary) > > > > > > > Tainted: [W]=WARN > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 > > > > > > > ================================================================== > > > > > > > > > > > > Hi Harry, > > > > > > > > > > > > I see you've been debugging: > > > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes > > > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/ > > > > > > > > > > > > Can that bug be caused by this data race? > > > > > > Below is an explanation by Gemini LLM as to why this race is harmful. > > > > > > Obviously take it with a grain of salt, but with my limited mm > > > > > > knowledge it does not look immediately wrong (re rmap invariant). > > > > > > > > > > > > However, now digging into details I see that this Lorenzo's patch > > > > > > also marked as fixing "KASAN: slab-use-after-free Read in > > > > > > folio_remove_rmap_ptes": > > > > > > > > > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge > > > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/ > > > > > > > > > > > > So perhaps the race is still benign (or points to another issue?) > > > > > > > > > > > > Here is what LLM said about the race: > > > > > > ----- > > > > > > > > > > > > The bug report is actionable and points to a harmful data race in the Linux > > > > > > kernel's memory management subsystem, specifically in the handling of > > > > > > anonymous `hugetlb` mappings. > > > > > > > > > > This data race is not specific to hugetlb at all, and it isn't caused > > > > > by any recent changes. 
It's a longstanding thing in core MM, but it's > > > > > pretty benign as far as I know. > > > > > > > > > > Fundamentally, the field vma->anon_vma can be read while only holding > > > > > the mmap lock in read mode; and it can concurrently be changed from > > > > > NULL to non-NULL. > > Well isn't that what the page_table_lock is for...? The page_table_lock prevents writer-writer data races, but not reader-writer data races. (It is only held by writers, not by readers.) > > > > > > > > > > One scenario to cause such a data race is to create a new anonymous > > > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a > > > > > configuration with VMA locking disabled for simplicity, so that both > > > > > faults happen under the mmap lock in read mode. This will lead to two > > > > > concurrent calls to __vmf_anon_prepare() > > > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623), > > > > > both threads only holding the mmap_lock in read mode. > > > > > __vmf_anon_prepare() is essentially this (from > > > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623, > > > > > with VMA locking code removed): > > > > > > > > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) > > > > > { > > > > > struct vm_area_struct *vma = vmf->vma; > > > > > vm_fault_t ret = 0; > > > > > > > > > > if (likely(vma->anon_vma)) > > > > > return 0; > > > > > [...] > > > > > if (__anon_vma_prepare(vma)) > > > > > ret = VM_FAULT_OOM; > > > > > [...] > > > > > return ret; > > > > > } > > > > > > > > > > int __anon_vma_prepare(struct vm_area_struct *vma) > > > > > { > > > > > struct mm_struct *mm = vma->vm_mm; > > > > > struct anon_vma *anon_vma, *allocated; > > > > > struct anon_vma_chain *avc; > > > > > > > > > > [...] > > > > > > > > > > [... allocate stuff ...] 
> > > > > > > > > > anon_vma_lock_write(anon_vma); > > > > > /* page_table_lock to protect against threads */ > > > > > spin_lock(&mm->page_table_lock); > > > > > if (likely(!vma->anon_vma)) { > > > > > vma->anon_vma = anon_vma; > > > > > [...] > > > > > } > > > > > spin_unlock(&mm->page_table_lock); > > > > > anon_vma_unlock_write(anon_vma); > > > > > > > > > > [... cleanup ...] > > > > > > > > > > return 0; > > > > > > > > > > [... error handling ...] > > > > > } > > > > > > > > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment > > > > > while the other thread is running the "if (likely(vma->anon_vma))" > > > > > check, you get a (AFAIK benign) data race. > > > > > > > > Thanks for checking, Jann. > > > > > > > > To double check" > > > > > > > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless > > > > readers can't read anon_vma contents, is it correct? So none of them > > > > really reading anon_vma, right? > > > > > > I think you are right that this should be using store-release; > > > searching around, I also mentioned this in > > > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>: > > > > > > | > +Note that there are some exceptions to this - the `anon_vma` > > > field is permitted > > > | > +to be written to under mmap read lock and is instead serialised > > > by the `struct > > > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all > > > | > > > | Hm, we really ought to add some smp_store_release() and READ_ONCE(), > > > | or something along those lines, around our ->anon_vma accesses... > > > | especially the "vma->anon_vma = anon_vma" assignment in > > > | __anon_vma_prepare() looks to me like, on architectures like arm64 > > > | with write-write reordering, we could theoretically end up making a > > > | new anon_vma pointer visible to a concurrent page fault before the > > > | anon_vma has been initialized? 
Though I have no idea if that is > > > | practically possible, stuff would have to be reordered quite a bit for > > > | that to happen... > > As far as the page fault is concerned it only really cares about whether it > exists or not, not whether it's initialised. Hmm, yeah, I'm not sure if anything in the page fault path actually directly accesses the anon_vma. The page fault path does eventually re-publish the anon_vma pointer with `WRITE_ONCE(folio->mapping, (struct address_space *) anon_vma)` in __folio_set_anon() though, which could then potentially allow a third thread to walk through folio->mapping and observe the uninitialized anon_vma... Looking at the situation on latest stable (v6.18.5), two racing faults on _adjacent_ anonymous VMAs could also end up with one thread writing ->anon_vma while the other thread executes reusable_anon_vma(), loading the pointer to that anon_vma and accessing its ->anon_vma_chain. > The operations that check/modify fields within the anon_vma are protected by the > anon rmap lock (my recent series takes advantage of this to avoid holding that > lock during AVC allocation for instance). > > This lock also protects the interval tree. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-01-14 18:24 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-01-14 16:32 [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare syzbot 2026-01-14 16:42 ` Dmitry Vyukov 2026-01-14 16:59 ` Jann Horn 2026-01-14 17:05 ` Dmitry Vyukov 2026-01-14 17:29 ` Jann Horn 2026-01-14 17:48 ` Jann Horn 2026-01-14 18:02 ` Lorenzo Stoakes 2026-01-14 18:23 ` Jann Horn