linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Jann Horn <jannh@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
	syzbot <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com>,
	Liam.Howlett@oracle.com, akpm@linux-foundation.org,
	david@kernel.org, harry.yoo@oracle.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	riel@surriel.com, syzkaller-bugs@googlegroups.com,
	vbabka@suse.cz
Subject: Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare
Date: Wed, 14 Jan 2026 18:02:22 +0000	[thread overview]
Message-ID: <96aec792-7d10-4dfa-bf35-cc94300f4d2b@lucifer.local> (raw)
In-Reply-To: <CAG48ez3+t-uv88cUhG6=YfMyRVW5REi3-aaW3a_yXSZDK38_ow@mail.gmail.com>

On Wed, Jan 14, 2026 at 06:48:37PM +0100, Jann Horn wrote:
> On Wed, Jan 14, 2026 at 6:29 PM Jann Horn <jannh@google.com> wrote:
> > On Wed, Jan 14, 2026 at 6:06 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@google.com> wrote:
> > > > On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > > On Wed, 14 Jan 2026 at 17:32, syzbot
> > > > > <syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com> wrote:
> > > > > > ==================================================================
> > > > > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare
> > > > > >
> > > > > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1:
> > > > > >  __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212
> > > > > >  __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673
> > > > > >  hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782
> > > > > >  hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1
> > > > > >  handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578
> > > > [...]
> > > > > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0:
> > > > > >  __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667
> > > > > >  hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782
> > > > > >  hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1
> > > > > >  handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578
> > > > [...]
> > > > > >
> > > > > > value changed: 0x0000000000000000 -> 0xffff888104ecca28
> > > > > >
> > > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G        W           syzkaller #0 PREEMPT(voluntary)
> > > > > > Tainted: [W]=WARN
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> > > > > > ==================================================================
> > > > >
> > > > > Hi Harry,
> > > > >
> > > > > I see you've been debugging:
> > > > > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes
> > > > > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/
> > > > >
> > > > > Can that bug be caused by this data race?
> > > > > Below is an explanation by Gemini LLM as to why this race is harmful.
> > > > > Obviously take it with a grain of salt, but with my limited mm
> > > > > knowledge it does not look immediately wrong (re rmap invariant).
> > > > >
> > > > > However, now digging into details I see that this Lorenzo's patch
> > > > > also marked as fixing "KASAN: slab-use-after-free Read in
> > > > > folio_remove_rmap_ptes":
> > > > >
> > > > > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
> > > > > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/
> > > > >
> > > > > So perhaps the race is still benign (or points to another issue?)
> > > > >
> > > > > Here is what LLM said about the race:
> > > > > -----
> > > > >
> > > > > The bug report is actionable and points to a harmful data race in the Linux
> > > > > kernel's memory management subsystem, specifically in the handling of
> > > > > anonymous `hugetlb` mappings.
> > > >
> > > > This data race is not specific to hugetlb at all, and it isn't caused
> > > > by any recent changes. It's a longstanding thing in core MM, but it's
> > > > pretty benign as far as I know.
> > > >
> > > > Fundamentally, the field vma->anon_vma can be read while only holding
> > > > the mmap lock in read mode; and it can concurrently be changed from
> > > > NULL to non-NULL.

Well isn't that what the page_table_lock is for...?

> > > >
> > > > One scenario to cause such a data race is to create a new anonymous
> > > > VMA, then trigger two concurrent page faults inside this VMA. Assume a
> > > > configuration with VMA locking disabled for simplicity, so that both
> > > > faults happen under the mmap lock in read mode. This will lead to two
> > > > concurrent calls to __vmf_anon_prepare()
> > > > (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623),
> > > > both threads only holding the mmap_lock in read mode.
> > > > __vmf_anon_prepare() is essentially this (from
> > > > https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623,
> > > > with VMA locking code removed):
> > > >
> > > > vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
> > > > {
> > > >         struct vm_area_struct *vma = vmf->vma;
> > > >         vm_fault_t ret = 0;
> > > >
> > > >         if (likely(vma->anon_vma))
> > > >                 return 0;
> > > >         [...]
> > > >         if (__anon_vma_prepare(vma))
> > > >                 ret = VM_FAULT_OOM;
> > > >         [...]
> > > >         return ret;
> > > > }
> > > >
> > > > int __anon_vma_prepare(struct vm_area_struct *vma)
> > > > {
> > > >         struct mm_struct *mm = vma->vm_mm;
> > > >         struct anon_vma *anon_vma, *allocated;
> > > >         struct anon_vma_chain *avc;
> > > >
> > > >         [...]
> > > >
> > > >         [... allocate stuff ...]
> > > >
> > > >         anon_vma_lock_write(anon_vma);
> > > >         /* page_table_lock to protect against threads */
> > > >         spin_lock(&mm->page_table_lock);
> > > >         if (likely(!vma->anon_vma)) {
> > > >                 vma->anon_vma = anon_vma;
> > > >                 [...]
> > > >         }
> > > >         spin_unlock(&mm->page_table_lock);
> > > >         anon_vma_unlock_write(anon_vma);
> > > >
> > > >         [... cleanup ...]
> > > >
> > > >         return 0;
> > > >
> > > >         [... error handling ...]
> > > > }
> > > >
> > > > So if one thread reaches the "vma->anon_vma = anon_vma" assignment
> > > > while the other thread is running the "if (likely(vma->anon_vma))"
> > > > check, you get a (AFAIK benign) data race.
> > >
> > > Thanks for checking, Jann.
> > >
> > > To double check"
> > >
> > > "vma->anon_vma = anon_vma" is done w/o store-release, so the lockless
> > > readers can't read anon_vma contents, is it correct? So none of them
> > > really reading anon_vma, right?
> >
> > I think you are right that this should be using store-release;
> > searching around, I also mentioned this in
> > <https://lore.kernel.org/all/CAG48ez0qsAM-dkOUDetmNBSK4typ5t_FvMvtGiB7wQsP-G1jVg@mail.gmail.com/>:
> >
> > | > +Note that there are some exceptions to this - the `anon_vma`
> > field is permitted
> > | > +to be written to under mmap read lock and is instead serialised
> > by the `struct
> > | > +mm_struct` field `page_table_lock`. In addition the `vm_mm` and all
> > |
> > | Hm, we really ought to add some smp_store_release() and READ_ONCE(),
> > | or something along those lines, around our ->anon_vma accesses...
> > | especially the "vma->anon_vma = anon_vma" assignment in
> > | __anon_vma_prepare() looks to me like, on architectures like arm64
> > | with write-write reordering, we could theoretically end up making a
> > | new anon_vma pointer visible to a concurrent page fault before the
> > | anon_vma has been initialized? Though I have no idea if that is
> > | practically possible, stuff would have to be reordered quite a bit for
> > | that to happen...

As far as the page fault is concerned it only really cares about whether it
exists or not, not whether it's initialised.

The operations that check/modify fields within the anon_vma are protected by the
anon rmap lock (my recent series takes advantage of this to avoid holding that
lock during AVC allocation for instance).

This lock also protects the interval tree.

> >
> > I just noticed that I tried fixing this back in 2023, I don't
> > remember why that didn't end up landing; the memory ordering was kind
> > of messy to think about:
> > <https://lore.kernel.org/all/20230726214103.3261108-4-jannh@google.com/>
> >
> > > Also, anon_vma_chain_link and num_active_vmas++ indeed happen after
> > > assignment to anon_vma:
> > >
> > >     /* page_table_lock to protect against threads */
> > >     spin_lock(&mm->page_table_lock);
> > >     if (likely(!vma->anon_vma)) {
> > >         vma->anon_vma = anon_vma;
> > >         anon_vma_chain_link(vma, avc, anon_vma);
> > >         anon_vma->num_active_vmas++;
> > >         allocated = NULL;
> > >         avc = NULL;
> > >     }
> > >     spin_unlock(&mm->page_table_lock);
> > >
> > > So the lockless readers that observe anon_vma!=NULL won't rely on
> > > these invariants, right?
> >
> > Yeah, that stuff should be sufficiently protected because of the anon_vma lock.
>
> Er, except it actually isn't entirely, as I noticed in that old patch I linked:
>
> @@ -1072,7 +1071,15 @@ static int anon_vma_compatible(struct
> vm_area_struct *a, struct vm_area_struct *
>  static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
> struct vm_area_struct *a, struct vm_area_struct *b)
>  {
>         if (anon_vma_compatible(a, b)) {
> -               struct anon_vma *anon_vma = READ_ONCE(old->anon_vma);
> +               /*
> +                * Pairs with smp_store_release() in __anon_vma_prepare().
> +                *
> +                * We could get away with a READ_ONCE() here, but
> +                * smp_load_acquire() ensures that the following
> +                * list_is_singular() check on old->anon_vma_chain doesn't race
> +                * with __anon_vma_prepare().
> +                */
> +               struct anon_vma *anon_vma = smp_load_acquire(&old->anon_vma);

Yeah I'm not sure this is really hugely important, as this being slightly wrong
only leads to very rarely having slightly less efficient lock scalability.

>
>                 if (anon_vma && list_is_singular(&old->anon_vma_chain))
>                         return anon_vma;
>
> That list_is_singular(&old->anon_vma_chain) does plain loads on the
> list_head, which can concurrently be modified by anon_vma_chain_link()

We're no longer using that directly as per my latest changes :)

But I don't think it really matters.

> (which is called from __anon_vma_prepare()). I think that... probably
> shouldn't cause any functional problems, but it is ugly.

But yeah this seems pretty benign.

Thanks, Lorenzo


  reply	other threads:[~2026-01-14 18:02 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-14 16:32 syzbot
2026-01-14 16:42 ` Dmitry Vyukov
2026-01-14 16:59   ` Jann Horn
2026-01-14 17:05     ` Dmitry Vyukov
2026-01-14 17:29       ` Jann Horn
2026-01-14 17:48         ` Jann Horn
2026-01-14 18:02           ` Lorenzo Stoakes [this message]
2026-01-14 18:23             ` Jann Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=96aec792-7d10-4dfa-bf35-cc94300f4d2b@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=dvyukov@google.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@surriel.com \
    --cc=syzbot+f5d897f5194d92aa1769@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox