* BUG: KCSAN: data-race in do_mremap / vma_complete
@ 2026-03-11 7:58 Jianzhou Zhao
From: Jianzhou Zhao @ 2026-03-11 7:58 UTC
To: pfalcato, akpm, Liam.Howlett, lorenzo.stoakes, vbabka, jannh,
linux-mm, linux-kernel
Subject: [BUG] mm/mremap: KCSAN: data-race in do_mremap / vma_complete
Dear Maintainers,
We are writing to report a KCSAN-detected data race in the memory management subsystem involving `vma_complete()` and `check_mremap_params()`. The bug was found by our custom fuzzing tool, RacePilot. The race occurs when `vma_complete()` increments `mm->map_count` while `check_mremap_params()` concurrently reads the same `current->mm->map_count` without holding `mmap_lock` and without a marked access such as `READ_ONCE()`. We observed the bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
Call Trace & Context
==================================================================
BUG: KCSAN: data-race in do_mremap / vma_complete
write to 0xffff88800c232348 of 4 bytes by task 27920 on cpu 1:
vma_complete+0x6d2/0x8a0 home/kfuzz/linux/mm/vma.c:354
__split_vma+0x5fb/0x6f0 home/kfuzz/linux/mm/vma.c:567
vms_gather_munmap_vmas+0xe5/0x6a0 home/kfuzz/linux/mm/vma.c:1369
do_vmi_align_munmap+0x2a3/0x450 home/kfuzz/linux/mm/vma.c:1538
do_vmi_munmap+0x19c/0x2e0 home/kfuzz/linux/mm/vma.c:1596
do_munmap+0x97/0xc0 home/kfuzz/linux/mm/mmap.c:1068
mremap_to+0x179/0x240 home/kfuzz/linux/mm/mremap.c:1374
...
__x64_sys_mremap+0x66/0x80 home/kfuzz/linux/mm/mremap.c:1961
read to 0xffff88800c232348 of 4 bytes by task 27919 on cpu 0:
check_mremap_params home/kfuzz/linux/mm/mremap.c:1816 [inline]
do_mremap+0x352/0x1090 home/kfuzz/linux/mm/mremap.c:1920
__do_sys_mremap+0x129/0x160 home/kfuzz/linux/mm/mremap.c:1993
__se_sys_mremap home/kfuzz/linux/mm/mremap.c:1961 [inline]
__x64_sys_mremap+0x66/0x80 home/kfuzz/linux/mm/mremap.c:1961
...
value changed: 0x0000001f -> 0x00000020
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 27919 Comm: syz.7.1375 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #42 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================
Execution Flow & Code Context
In `mm/vma.c`, `vma_complete()` finalizes VMA modifications such as insertions. When a new VMA is inserted (for example, when an existing VMA is split), it increments the process's `map_count`; the calling context holds `mmap_lock` in write mode:
```c
// mm/vma.c
static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
			 struct mm_struct *mm)
{
	...
	} else if (vp->insert) {
		/* ... split ... */
		vma_iter_store_new(vmi, vp->insert);
		mm->map_count++; // <-- Plain concurrent write
	}
	...
}
```
By contrast, `do_mremap()` calls `check_mremap_params()` *before* acquiring `mmap_lock`. This lets malformed requests be rejected early, but it also leaves the map-count check unsynchronized:
```c
// mm/mremap.c
static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
{
	...
	/* Worst-scenario case ... */
	if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3) // <-- Plain concurrent read
		return -ENOMEM;
	return 0;
}
```
`mmap_write_lock_killable(mm)` is only called later, at `mm/mremap.c:1924`, *after* `check_mremap_params()` has returned successfully.
Root Cause Analysis
The race arises because the `mremap` parameter validation performs an early, heuristic rejection based on the current value of `mm->map_count`, but it does so without holding any lock (`mmap_lock` is only taken afterwards in `do_mremap()`). The result is a plain, lockless read racing with other threads that legitimately modify `mm->map_count` under `mmap_lock`, such as `vma_complete()` incrementing the count after splitting a VMA. Because the read is not marked with `READ_ONCE()` and the value is modified concurrently, KCSAN reports it, and the compiler is in principle free to apply optimizations such as load tearing to the unmarked access.
Unfortunately, we were unable to generate a reproducer for this bug.
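The pattern KCSAN is flagging can be modelled in userspace C11. This is a sketch under stated assumptions, not kernel code: `struct mm_model`, `model_split_vma()` and `model_precheck_ok()` are hypothetical names, a `pthread_mutex_t` stands in for `mmap_lock`, and relaxed C11 atomic loads and stores stand in for `READ_ONCE()`/`WRITE_ONCE()` (their closest portable userspace counterpart for an aligned `int`, though not an exact equivalent). The writer changes the count only with the lock held, as `vma_complete()` does; the reader samples it without the lock, as `check_mremap_params()` does. Marking both sides means the lockless read may be stale but cannot be torn.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct mm_struct. */
struct mm_model {
	pthread_mutex_t lock;	/* plays the role of mmap_lock */
	atomic_int map_count;	/* plays the role of mm->map_count */
};

/*
 * Writer, modelled on vma_complete() after a split: the count only changes
 * with the lock held, but the store is still marked (relaxed atomic, akin
 * to WRITE_ONCE()) because a lockless reader may be looking at it.
 */
void model_split_vma(struct mm_model *mm)
{
	pthread_mutex_lock(&mm->lock);
	int cur = atomic_load_explicit(&mm->map_count, memory_order_relaxed);
	atomic_store_explicit(&mm->map_count, cur + 1, memory_order_relaxed);
	pthread_mutex_unlock(&mm->lock);
}

/*
 * Reader, modelled on check_mremap_params(): no lock is taken, so the load
 * is marked (akin to READ_ONCE()). The value may be stale by the time it is
 * compared, so the result is only ever a heuristic.
 */
bool model_precheck_ok(struct mm_model *mm, int max_map_count)
{
	int count = atomic_load_explicit(&mm->map_count, memory_order_relaxed);

	return (count + 2) < max_map_count - 3;
}
```

The lock still serializes writers against each other; the marking only governs what a lockless reader is allowed to observe.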
Potential Impact
This race makes the outcome of the `mremap` map-count pre-check nondeterministic. In theory, because the load is unmarked, the compiler could tear it, so `check_mremap_params()` could observe a value that was never actually stored and accept or reject a request erratically. In practice, since this is only a heuristic pre-check and the limit is enforced again later with the lock held, the race is likely benign; fixing it would nonetheless silence the sanitizer report and document that the access is intentionally lockless.
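To put numbers on how little the heuristic depends on an exact value: the pre-check rejects a request once `map_count + 2 >= sysctl_max_map_count - 3`, that is, once `map_count >= sysctl_max_map_count - 5`. The sketch below restates that condition in a hypothetical helper, `precheck_rejects()`, and assumes the default `vm.max_map_count` of 65530. Under that assumption the first rejected count is 65525, so a read that is stale by one (like the 0x1f -> 0x20 change in the report) can only flip the outcome when the two values straddle that boundary (65524 versus 65525); at the count of 31 seen in the report, both the old and the new value pass.

```c
#include <stdbool.h>

/* Assumption: the default vm.max_map_count, USHRT_MAX - 5. */
#define ASSUMED_MAX_MAP_COUNT 65530

/* Hypothetical helper restating the condition in check_mremap_params(). */
bool precheck_rejects(int map_count, int max_map_count)
{
	return (map_count + 2) >= max_map_count - 3;
}

/* Smallest rejected map_count: (c + 2) >= max - 3  <=>  c >= max - 5. */
int first_rejected_count(int max_map_count)
{
	return max_map_count - 5;
}
```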
Proposed Fix
To document that the read of `map_count` in `check_mremap_params()` is deliberately lockless, we propose wrapping it in the `data_race()` macro, which suppresses the KCSAN report while conveying that intent:
```diff
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1813,7 +1813,7 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
 	 * Check whether current map count plus 2 still leads us to 4 maps below
 	 * the threshold, otherwise return -ENOMEM here to be more safe.
 	 */
-	if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
+	if ((data_race(current->mm->map_count) + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 	return 0;
```
We would be highly honored if this could be of any help.
Best regards,
RacePilot Team
* Re: BUG: KCSAN: data-race in do_mremap / vma_complete
From: Lorenzo Stoakes (Oracle) @ 2026-03-11 10:11 UTC
To: Jianzhou Zhao
Cc: pfalcato, akpm, Liam.Howlett, vbabka, jannh, linux-mm, linux-kernel

(Removing incorrect mail, I know it'll take a while to propagate the mail
change :)

On Wed, Mar 11, 2026 at 03:58:55PM +0800, Jianzhou Zhao wrote:
> [...]
> ```diff
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -1813,7 +1813,7 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm)
> * Check whether current map count plus 2 still leads us to 4 maps below
> * the threshold, otherwise return -ENOMEM here to be more safe.
> */
> - if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
> + if ((data_race(current->mm->map_count) + 2) >= sysctl_max_map_count - 3)
> return -ENOMEM;

Ack, this used to be checked under the mmap write lock.

I'll send a patch that factors out these kinds of checks + potentially does a
speculative check ahead of time and then re-checks once lock established.

With a:

Suggested-by: Jianzhou Zhao <luckd0g@163.com>

> return 0;
> ```
> We would be highly honored if this could be of any help.

Thanks, much appreciated :)

> Best regards,
> RacePilot Team

Cheers, Lorenzo
* Re: BUG: KCSAN: data-race in do_mremap / vma_complete
From: Pedro Falcato @ 2026-03-11 10:27 UTC
To: Lorenzo Stoakes (Oracle)
Cc: Jianzhou Zhao, akpm, Liam.Howlett, vbabka, jannh, linux-mm, linux-kernel

On Wed, Mar 11, 2026 at 10:11:20AM +0000, Lorenzo Stoakes (Oracle) wrote:
> (Removing incorrect mail, I know it'll take a while to propagate the mail
> change :)
>
> On Wed, Mar 11, 2026 at 03:58:55PM +0800, Jianzhou Zhao wrote:
> > [...]
> > Proposed Fix
> > To inform the compiler and memory models that the read access of `map_count` inside `check_mremap_params` deliberately operates locklessly, we should wrap the evaluation using the `data_race()` macro to suppress KCSAN warnings effectively while conveying intent.

PLEASE WRAP YOUR LINES. thank you.

> > ```diff
> > [...]
> > - if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
> > + if ((data_race(current->mm->map_count) + 2) >= sysctl_max_map_count - 3)
> > return -ENOMEM;
>
> Ack, this used to be checked under the mmap write lock.
>
> I'll send a patch that factors out these kinds of checks + potentially does a
> speculative check ahead of time and then re-checks once lock established.
>

Well, the problem is that the data_race() is incorrect. It would only be okay
if the check could fail (with no bad side-effects). Otherwise, we need READ_ONCE()
and WRITE_ONCE().

--
Pedro
* Re: BUG: KCSAN: data-race in do_mremap / vma_complete
From: Lorenzo Stoakes (Oracle) @ 2026-03-11 16:17 UTC
To: Pedro Falcato
Cc: Jianzhou Zhao, akpm, Liam.Howlett, vbabka, jannh, linux-mm, linux-kernel

On Wed, Mar 11, 2026 at 10:27:32AM +0000, Pedro Falcato wrote:
> On Wed, Mar 11, 2026 at 10:11:20AM +0000, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 11, 2026 at 03:58:55PM +0800, Jianzhou Zhao wrote:
> > > [...]
>
> PLEASE WRAP YOUR LINES. thank you.

:>) Please.

> > > ```diff
> > > [...]
> > > - if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3)
> > > + if ((data_race(current->mm->map_count) + 2) >= sysctl_max_map_count - 3)
> > > return -ENOMEM;
> >
> > Ack, this used to be checked under the mmap write lock.
> >
> > I'll send a patch that factors out these kinds of checks + potentially does a
> > speculative check ahead of time and then re-checks once lock established.
> >
>
> Well, the problem is that the data_race() is incorrect. It would only be okay
> if the check could fail (with no bad side-effects). Otherwise, we need READ_ONCE()
> and WRITE_ONCE().

Yeah true, also a user can update sysctl_max_map_count without any mmap locks
held obviously.

So we're probably in a state of sin generally that we've previously tolerated.

Anyway, that check seems to be wrong, so I'm going to send a patch that fixes
it, and I'll update the logic to READ_ONCE() this variable. (proc_int_conv()
already does a WRITE_ONCE()).

>
> --
> Pedro

Cheers, Lorenzo
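The plan Lorenzo describes in this sub-thread, a speculative check ahead of taking the lock followed by a re-check once it is held, with marked reads of both `map_count` and `sysctl_max_map_count` (which can change without any mmap lock held), can be sketched in userspace C11 as follows. All names are hypothetical, a `pthread_mutex_t` stands in for `mmap_lock`, and relaxed C11 atomics stand in for `READ_ONCE()`/`WRITE_ONCE()`; this illustrates the pattern only and is not the patch Lorenzo intends to send. A stale speculative read can only change whether a request is turned away early (including a spurious `-ENOMEM`); the limit itself is enforced by the re-check under the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for sysctl_max_map_count; writable at any time. */
static atomic_int model_max_map_count = 65530;

/* Hypothetical stand-in for the relevant parts of struct mm_struct. */
struct mm_state {
	pthread_mutex_t lock;	/* plays the role of mmap_lock */
	atomic_int map_count;	/* read locklessly by the speculative check */
};

/* Sysctl writer: a marked store, cf. the WRITE_ONCE() in proc_int_conv(). */
void model_set_max_map_count(int val)
{
	atomic_store_explicit(&model_max_map_count, val, memory_order_relaxed);
}

/* Same condition as check_mremap_params(), on a snapshot of both values. */
static bool map_count_exceeded(struct mm_state *mm)
{
	int count = atomic_load_explicit(&mm->map_count, memory_order_relaxed);
	int max = atomic_load_explicit(&model_max_map_count, memory_order_relaxed);

	return (count + 2) >= max - 3;
}

/* Returns 0 on success or -ENOMEM, following the kernel's convention. */
int model_remap(struct mm_state *mm)
{
	/* Speculative, lockless check: cheap early exit, possibly stale. */
	if (map_count_exceeded(mm))
		return -ENOMEM;

	pthread_mutex_lock(&mm->lock);
	/*
	 * Authoritative re-check: map_count cannot change while the lock is
	 * held, but the sysctl still can, so it is re-read with a marked load.
	 */
	if (map_count_exceeded(mm)) {
		pthread_mutex_unlock(&mm->lock);
		return -ENOMEM;
	}
	/* The remap itself would go here; model it as adding one mapping. */
	atomic_store_explicit(&mm->map_count,
			      atomic_load_explicit(&mm->map_count,
						   memory_order_relaxed) + 1,
			      memory_order_relaxed);
	pthread_mutex_unlock(&mm->lock);
	return 0;
}
```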
* Re: BUG: KCSAN: data-race in do_mremap / vma_complete
From: Lorenzo Stoakes (Oracle) @ 2026-03-11 16:39 UTC
To: Pedro Falcato
Cc: Jianzhou Zhao, akpm, Liam.Howlett, vbabka, jannh, linux-mm, linux-kernel

On Wed, Mar 11, 2026 at 04:17:13PM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 11, 2026 at 10:27:32AM +0000, Pedro Falcato wrote:
> > Well, the problem is that the data_race() is incorrect. It would only be okay
> > if the check could fail (with no bad side-effects). Otherwise, we need READ_ONCE()
> > and WRITE_ONCE().
>
> Yeah true, also a user can update sysctl_max_map_count without any mmap locks
> held obviously.
>
> So we're probably in a state of sin generally that we've previously tolerated.
>
> Anyway, that check seems to be wrong, so I'm going to send a patch that fixes
> it, and I'll update the logic to READ_ONCE() this variable. (proc_int_conv()
> already does a WRITE_ONCE()).

Also, updating to only check this once mmap write lock held, so avoid the racy
situation altogether.

Cheers, Lorenzo
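The final approach, performing the check only once the mmap write lock is held, removes the lockless read of `map_count` altogether; only the sysctl, which can still change at any time, needs a marked read. A minimal userspace sketch, assuming hypothetical names, a `pthread_mutex_t` standing in for `mmap_lock`, and a relaxed C11 atomic load standing in for `READ_ONCE()`:

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for sysctl_max_map_count; writable at any time. */
static atomic_int locked_max_map_count = 65530;

/* Hypothetical stand-in for the relevant parts of struct mm_struct. */
struct mm_locked {
	pthread_mutex_t lock;	/* plays the role of mmap_lock (write mode) */
	int map_count;		/* only ever accessed with the lock held */
};

/* Sysctl writer: marked store, since readers do not hold any mm lock. */
void set_locked_max_map_count(int val)
{
	atomic_store_explicit(&locked_max_map_count, val, memory_order_relaxed);
}

/* Returns 0 on success or -ENOMEM; the only check happens under the lock. */
int locked_remap(struct mm_locked *mm)
{
	int ret = 0;

	pthread_mutex_lock(&mm->lock);
	/* The sysctl can change at any time, so it gets a marked load. */
	int max = atomic_load_explicit(&locked_max_map_count,
				       memory_order_relaxed);
	/* map_count needs no marking: every writer holds the same lock. */
	if ((mm->map_count + 2) >= max - 3) {
		ret = -ENOMEM;
		goto out;
	}
	mm->map_count++;	/* the remap itself, modelled as one new mapping */
out:
	pthread_mutex_unlock(&mm->lock);
	return ret;
}
```

Compared with a speculative pre-check, this gives up rejecting over-limit requests before the lock is taken, in exchange for a single, race-free decision point.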