From: Pedro Falcato <pfalcato@suse.de>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Jianzhou Zhao <luckd0g@163.com>,
akpm@linux-foundation.org, Liam.Howlett@oracle.com,
vbabka@suse.cz, jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: BUG: KCSAN: data-race in do_mremap / vma_complete
Date: Wed, 11 Mar 2026 10:27:32 +0000 [thread overview]
Message-ID: <u7wzfkse7au4aganufewx2ykz5xg2lgzaf2eu3tfkmowcmsobb@4xubw3ikany2> (raw)
In-Reply-To: <abaad620-bc19-4a3b-9bef-6a1ad4335ce3@lucifer.local>
On Wed, Mar 11, 2026 at 10:11:20AM +0000, Lorenzo Stoakes (Oracle) wrote:
> (Removing incorrect mail, I know it'll take a while to propagate the mail
> change :)
>
> On Wed, Mar 11, 2026 at 03:58:55PM +0800, Jianzhou Zhao wrote:
> >
> > Subject: [BUG] mm/mremap: KCSAN: data-race in do_mremap / vma_complete
> > Dear Maintainers,
> > We are writing to report a KCSAN-detected data race in the memory
> > management subsystem, involving `vma_complete` and
> > `check_mremap_params`. This bug was found by our custom fuzzing tool,
> > RacePilot. The race occurs when `vma_complete` increments
> > `mm->map_count` while `check_mremap_params` concurrently reads the
> > same `current->mm->map_count` without holding `mmap_lock` or using a
> > marked access (`READ_ONCE`). We observed this bug on Linux kernel
> > version 6.18.0-08691-g2061f18ad76e-dirty.
> > Call Trace & Context
> > ==================================================================
> > BUG: KCSAN: data-race in do_mremap / vma_complete
> > write to 0xffff88800c232348 of 4 bytes by task 27920 on cpu 1:
> > vma_complete+0x6d2/0x8a0 home/kfuzz/linux/mm/vma.c:354
> > __split_vma+0x5fb/0x6f0 home/kfuzz/linux/mm/vma.c:567
> > vms_gather_munmap_vmas+0xe5/0x6a0 home/kfuzz/linux/mm/vma.c:1369
> > do_vmi_align_munmap+0x2a3/0x450 home/kfuzz/linux/mm/vma.c:1538
> > do_vmi_munmap+0x19c/0x2e0 home/kfuzz/linux/mm/vma.c:1596
> > do_munmap+0x97/0xc0 home/kfuzz/linux/mm/mmap.c:1068
> > mremap_to+0x179/0x240 home/kfuzz/linux/mm/mremap.c:1374
> > ...
> > __x64_sys_mremap+0x66/0x80 home/kfuzz/linux/mm/mremap.c:1961
> > read to 0xffff88800c232348 of 4 bytes by task 27919 on cpu 0:
> > check_mremap_params home/kfuzz/linux/mm/mremap.c:1816 [inline]
> > do_mremap+0x352/0x1090 home/kfuzz/linux/mm/mremap.c:1920
> > __do_sys_mremap+0x129/0x160 home/kfuzz/linux/mm/mremap.c:1993
> > __se_sys_mremap home/kfuzz/linux/mm/mremap.c:1961 [inline]
> > __x64_sys_mremap+0x66/0x80 home/kfuzz/linux/mm/mremap.c:1961
> > ...
> > value changed: 0x0000001f -> 0x00000020
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 UID: 0 PID: 27919 Comm: syz.7.1375 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #42 PREEMPT(voluntary)
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > ==================================================================
> > Execution Flow & Code Context
> > In `mm/vma.c`, the `vma_complete()` function finalizes VMA
> > alterations such as insertions. When a new VMA is successfully
> > attached (e.g., during splitting), the function increments the
> > process's `map_count` while holding `mmap_lock` in write mode, taken
> > by the calling context:
> > ```c
> > // mm/vma.c
> > static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> > struct mm_struct *mm)
> > {
> > ...
> > } else if (vp->insert) {
> > /* ... split ... */
> > vma_iter_store_new(vmi, vp->insert);
> > mm->map_count++; // <-- Plain concurrent write
> > }
> > ...
> > }
> > ```
> > Conversely, the `mremap` syscall validation sequence evaluates
> > `check_mremap_params()` *before* acquiring the `mmap_lock`. This
> > allows malformed calls to be rejected quickly, but leaves the
> > map-count limit check unsynchronized:
> > ```c
> > // mm/mremap.c
> > static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
> > {
> > ...
> > /* Worst-scenario case ... */
> > if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3) // <-- Plain concurrent read
> > return -ENOMEM;
> > return 0;
> > }
> > ```
> > At `mm/mremap.c:1924`, `mmap_write_lock_killable(mm)` is only
> > acquired *after* `check_mremap_params()` returns successfully.
> > Root Cause Analysis
> > The data race arises because the `mremap` parameter validator
> > performs an early heuristic rejection based on the current value of
> > `mm->map_count`. This evaluation executes entirely without locks
> > (`mmap_lock` is taken later in `do_mremap`), establishing a plain,
> > lockless read that races against threads legitimately mutating
> > `mm->map_count` (such as `vma_complete` splitting areas and
> > incrementing the count under the protection of `mmap_lock`). The
> > plain read racing with a concurrent write triggers the KCSAN report
> > and, in principle, permits compiler load tearing.
> > Unfortunately, we were unable to generate a reproducer for this bug.
> > Potential Impact
> > This data race affects only the `mremap` heuristic limit guard.
> > Because `map_count` is read as a plain 4-byte load, compiler load
> > tearing could theoretically cause `check_mremap_params` to accept or
> > reject requests erratically. Functionally, as a heuristic pre-check,
> > it is effectively benign, since a stricter bounded check takes place
> > later under the lock; but fixing it silences the sanitizer and
> > documents the intentionally lockless access.
> > Proposed Fix
> > To tell the compiler and KCSAN that the read of `map_count` inside
> > `check_mremap_params` is deliberately lockless, we propose wrapping
> > the access in the `data_race()` macro, suppressing the KCSAN warning
> > while conveying intent.
PLEASE WRAP YOUR LINES. thank you.
> > ```diff
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -1813,7 +1813,7 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
> > * Check whether current map count plus 2 still leads us to 4 maps below
> > * the threshold, otherwise return -ENOMEM here to be more safe.
> > */
> > - if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
> > + if ((data_race(current->mm->map_count) + 2) >= sysctl_max_map_count - 3)
> > return -ENOMEM;
>
> Ack, this used to be checked under the mmap write lock.
>
> I'll send a patch that factors out these kinds of checks + potentially does a
> speculative check ahead of time and then re-checks once lock established.
>
Well, the problem is that the data_race() is incorrect. It would only be okay
if the check could fail (with no bad side-effects). Otherwise, we need READ_ONCE()
and WRITE_ONCE().
--
Pedro
2026-03-11 7:58 Jianzhou Zhao
2026-03-11 10:11 ` Lorenzo Stoakes (Oracle)
2026-03-11 10:27 ` Pedro Falcato [this message]
2026-03-11 16:17 ` Lorenzo Stoakes (Oracle)
2026-03-11 16:39 ` Lorenzo Stoakes (Oracle)