* [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
From: Yang Shi @ 2025-03-12 22:15 UTC (permalink / raw)
To: Liam.Howlett, lorenzo.stoakes, vbabka, jannh, oliver.sang, akpm
Cc: yang, linux-mm, linux-kernel
LKP reported an 800% performance improvement for the small-allocs benchmark
from vm-scalability [1] with the patch ("/dev/zero: make private mapping
full anonymous mapping") [2], but the patch was nack'ed since it changes
the output of smaps somewhat.
Profiling shows that one of the major sources of the performance
improvement is reduced contention on i_mmap_rwsem.
The small-allocs benchmark creates a lot of 40K memory maps by
mmap'ing private /dev/zero, then triggers page faults on the mappings.
When a private mapping of /dev/zero is created, an anonymous VMA is
created, but it has a valid vm_file. The kernel basically assumes that
anonymous VMAs have a NULL vm_file; for example, mmap inserts a VMA into
the file rmap tree if vm_file is not NULL. So the private /dev/zero
mapping is inserted into the file rmap tree, which results in contention
on i_mmap_rwsem. But it is actually an anonymous VMA, so it is pointless
to insert it into the file rmap tree.
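For illustration only, the access pattern described above can be
approximated from user space roughly as follows. This is a minimal
sketch, not the vm-scalability small-allocs source: the function name
map_and_touch_dev_zero() and the choice to leave the mappings in place
are assumptions made for the example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN (40 * 1024)	/* 40K per mapping, as in the description */

/*
 * Hypothetical helper: create nr_maps private mappings of /dev/zero and
 * write to every page of each one. Returns the number of mappings that
 * were created and touched, or -1 if /dev/zero cannot be opened.
 * The mappings are intentionally left in place.
 */
long map_and_touch_dev_zero(long nr_maps)
{
	long page = sysconf(_SC_PAGESIZE);
	long done = 0;
	int fd = open("/dev/zero", O_RDWR);

	if (fd < 0 || page <= 0)
		return -1;

	for (long i = 0; i < nr_maps; i++) {
		char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE, fd, 0);

		if (p == MAP_FAILED)
			break;

		/* One write per page triggers a page fault on each page. */
		for (size_t off = 0; off < MAP_LEN; off += (size_t)page)
			p[off] = 1;

		done++;
	}

	close(fd);
	return done;
}
```

Each mmap() call here produces a VMA that is anonymous but still carries
a vm_file, which is the case the patch is concerned with.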
Skip anonymous VMAs in this case. An improvement of over 400% was
reported [3].
This is not on par with the 800% improvement from the original patch,
because the page fault handler needs to access some members of struct
file if vm_file is not NULL, for example f_mode and f_mapping. They are
in the same cacheline as the file refcount. When mmap'ing a file, the
file refcount is inc'ed and dec'ed, which causes bad false sharing.
Further debugging showed that checking whether the VMA is anonymous
can alleviate the problem. But I'm not sure whether that is the
best way to handle it; maybe we should consider shuffling the layout of
struct file.
However, it seems rare that real-life applications would create that
many maps by mmap'ing private /dev/zero while sharing the same struct
file, so the false sharing problem may not be that bad. The
i_mmap_rwsem contention problem seems more real, since all /dev/zero
private mappings, even from different applications, share the same
struct address_space and therefore the same i_mmap_rwsem. Inserting an
anonymous VMA into the file rmap tree is also broken behavior. It is
worth fixing from this perspective too.
[1] https://lore.kernel.org/linux-mm/202501281038.617c6b60-lkp@intel.com/
[2] https://lore.kernel.org/linux-mm/20250113223033.4054534-1-yang@os.amperecomputing.com/
[3] https://lore.kernel.org/linux-mm/Z6RshwXCWhAGoMOK@xsang-OptiPlex-9020/#t
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
v2:
* Added the code comments suggested by Lorenzo
* Collected R-b from Lorenzo
mm/vma.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index c7abef5177cc..2fe99d181cfd 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1648,6 +1648,10 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
 void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
 			       struct vm_area_struct *vma)
 {
+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;
+
 	if (vma->vm_file == NULL)
 		return;

@@ -1671,8 +1675,13 @@ void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb)
  */
 void unlink_file_vma(struct vm_area_struct *vma)
 {
-	struct file *file = vma->vm_file;
+	struct file *file;
+
+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;

+	file = vma->vm_file;
 	if (file) {
 		struct address_space *mapping = file->f_mapping;

@@ -1684,9 +1693,14 @@ void unlink_file_vma(struct vm_area_struct *vma)

 void vma_link_file(struct vm_area_struct *vma)
 {
-	struct file *file = vma->vm_file;
+	struct file *file;
 	struct address_space *mapping;

+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;
+
+	file = vma->vm_file;
 	if (file) {
 		mapping = file->f_mapping;
 		i_mmap_lock_write(mapping);
--
2.48.1
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
From: Vasily Gorbik @ 2025-03-12 23:55 UTC (permalink / raw)
To: Yang Shi, Andrew Morton
Cc: Liam.Howlett, lorenzo.stoakes, vbabka, jannh, oliver.sang, akpm,
linux-mm, linux-kernel, Vasily Gorbik
On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
> LKP reported 800% performance improvement for small-allocs benchmark
> from vm-scalability [1] with patch ("/dev/zero: make private mapping
> full anonymous mapping") [2], but the patch was nack'ed since it changes
> the output of smaps somewhat.
...
> ---
> v2:
> * Added the comments in code suggested by Lorenzo
> * Collected R-b from Lorenze
>
> mm/vma.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
Hi Yang,
Replying to v2, as the code is the same as v1 in linux-next:
The LTP test "mmap10" consistently triggers a kernel NULL pointer
dereference with this change, at least on x86 and s390. Reverting just
this single patch from linux-next fixes the issue.
LTP: starting mmap10
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010d22a067 P4D 800000010d22a067 PUD 11ff09067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 5 UID: 0 PID: 1719 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
FS: 00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x2b
? page_fault_oops+0xc4/0x1f0
? search_extable+0x26/0x30
? search_module_extables+0x3f/0x60
? exc_page_fault+0x6b/0x150
? asm_exc_page_fault+0x26/0x30
? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
? __rb_insert_augmented+0x2b/0x1d0
copy_mm+0x48a/0x8c0
copy_process+0xf98/0x1930
kernel_clone+0xb7/0x3b0
__do_sys_clone+0x65/0x90
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff643eb2b00
Code: 31 c0 31 d2 31 f6 bf 11 00 20 01 48 89 e5 53 48 83 ec 08 64 48 8b 04 25 10 00 00 00 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 89 c3 85 c0 75 31 64 48 8b 04 25 10 00 00
RSP: 002b:00007ffdac219010 EFLAGS: 00000202 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff643eb2b00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 00007ffdac219020 R08: 0000000000000000 R09: 0000000000000000
R10: 00007ff643df1a10 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007ff644036000 R15: 0000000000000000
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
FS: 00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0
LTP: starting mmap10
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:000000000247c007 R3:00000001ffffc007 S:00000001ffffb801 P:000000000000013d
Oops: 0004 ilc:3 [#1] SMP
Modules linked in:
CPU: 0 UID: 0 PID: 665 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #16
Hardware name: IBM 3931 A01 704 (KVM/Linux)
Krnl PSW : 0704c00180000000 000003ffe0ee0440 (__rb_insert_augmented+0x60/0x210)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 00000000009ff000 0000000000000000 000000008e5f7508 0000000084a7ed08
00000000000009fe 0000000000000000 0000000000000000 0000037fe06c7b68
00000000801d0e90 000003ffe04158d0 0000000084a7ed08 0000000000000000
000003ffbb700000 00000000801d0e48 000003ffe0ee057c 0000037fe06c7a40
Krnl Code: 000003ffe0ee0430: e31030080004 lg %r1,8(%r3)
000003ffe0ee0436: ec1200888064 cgrj %r1,%r2,8,000003ffe0ee0546
#000003ffe0ee043c: b90400a3 lgr %r10,%r3
>000003ffe0ee0440: e310b0100024 stg %r1,16(%r11)
000003ffe0ee0446: e3b030080024 stg %r11,8(%r3)
000003ffe0ee044c: ec180009007c cgij %r1,0,8,000003ffe0ee045e
000003ffe0ee0452: ec2b000100d9 aghik %r2,%r11,1
000003ffe0ee0458: e32010000024 stg %r2,0(%r1)
Call Trace:
[<000003ffe0ee0440>] __rb_insert_augmented+0x60/0x210
[<000003ffe016d6c4>] dup_mmap+0x424/0x8c0
[<000003ffe016dc62>] copy_mm+0x102/0x1c0
[<000003ffe016e8ae>] copy_process+0x7ce/0x12b0
[<000003ffe016f458>] kernel_clone+0x68/0x380
[<000003ffe016f84a>] __do_sys_clone+0x5a/0x70
[<000003ffe016faa0>] __s390x_sys_clone+0x40/0x50
[<000003ffe011c9b6>] do_syscall.constprop.0+0x116/0x140
[<000003ffe0ef1d64>] __do_syscall+0xd4/0x1c0
[<000003ffe0efd044>] system_call+0x74/0x98
Last Breaking-Event-Address:
[<000003ffe0ee058a>] __rb_insert_augmented+0x1aa/0x210
Kernel panic - not syncing: Fatal exception: panic_on_oops
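Both traces above go through the fork path (copy_mm/dup_mmap). As a
rough sketch of a user-space sequence that reaches that path with a
private /dev/zero mapping in place, something like the following could
be used. It is not the LTP mmap10 source; the function name
fork_with_private_dev_zero() and the 40K length are assumptions made
for the example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map /dev/zero privately, fork, and let the child
 * write to the inherited mapping. Returns 0 if the child exited cleanly
 * and the parent's copy of the page is unchanged, 1 if not, and -1 if
 * setup (open, mmap, fork, waitpid) failed.
 */
int fork_with_private_dev_zero(void)
{
	const size_t len = 40 * 1024;
	int status = 0;
	int parent_ok;
	pid_t pid;
	char *p;
	int fd = open("/dev/zero", O_RDWR);

	if (fd < 0)
		return -1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping keeps its own file reference */
	if (p == MAP_FAILED)
		return -1;

	p[0] = 'p';		/* fault the first page in before forking */

	pid = fork();		/* the kernel duplicates the VMA here */
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}
	if (pid == 0) {
		p[0] = 'c';	/* copy-on-write in the child only */
		_exit(p[0] == 'c' ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid) {
		munmap(p, len);
		return -1;
	}

	parent_ok = (p[0] == 'p');
	munmap(p, len);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || !parent_ok)
		return 1;
	return 0;
}
```

On a kernel without the problem, the helper is expected to return 0.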
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
From: Yang Shi @ 2025-03-13 3:04 UTC (permalink / raw)
To: Vasily Gorbik, Andrew Morton
Cc: Liam.Howlett, lorenzo.stoakes, vbabka, jannh, oliver.sang,
linux-mm, linux-kernel, Vasily Gorbik
On 3/12/25 4:55 PM, Vasily Gorbik wrote:
> On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
>> LKP reported 800% performance improvement for small-allocs benchmark
>> from vm-scalability [1] with patch ("/dev/zero: make private mapping
>> full anonymous mapping") [2], but the patch was nack'ed since it changes
>> the output of smaps somewhat.
> ...
>> ---
>> v2:
>> * Added the comments in code suggested by Lorenzo
>> * Collected R-b from Lorenze
>>
>> mm/vma.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
> Hi Yang,
>
> Replying to v2, as the code is the same as v1 in linux-next:
>
> The LTP test "mmap10" consistently triggers a kernel NULL pointer
> dereference with this change, at least on x86 and s390. Reverting just
> this single patch from linux-next fixes the issue.
Hi Vasily,
Thanks for the report. It is because dup_mmap() inserts the VMA into
file rmap by checking whether vma->vm_file is NULL or not. This splat
can be killed by skipping anonymous VMAs, but this will actually expose
a more severe problem. The struct file refcount may become imbalanced.
The refcount is inc'ed in mmap, then inc'ed again by fork(), and it is
dec'ed on unmap or process exit. If we skip the refcount inc in fork, we
need to skip the refcount dec in unmap too, but there is still one
refcount from mmap.
Can we dec the refcount in mmap if we see it is finally an anonymous VMA?
Unfortunately, no. If the refcount reaches 0, the struct file will be
freed. We will run into UAF when looking up smaps IIUC. It may point to
anything.
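The counting argument above can be replayed with a toy user-space
model. This is only a sketch of the arithmetic, not kernel code: struct
toy_file and toy_final_refcount() are hypothetical names, and the model
assumes exactly one mmap, one fork, and one unmap (or exit) in each of
the two processes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the reference count held on a struct file. */
struct toy_file {
	int refcount;
};

/*
 * Replay mmap -> fork -> unmap in child -> unmap in parent and return
 * the resulting count. 0 means balanced, a negative value means more
 * decrements than increments, a positive value means a reference that
 * is never dropped.
 */
int toy_final_refcount(bool skip_get_in_fork, bool skip_put_in_unmap)
{
	struct toy_file f = { .refcount = 0 };

	f.refcount++;			/* mmap takes a reference */

	if (!skip_get_in_fork)
		f.refcount++;		/* fork: reference for the child's VMA */

	if (!skip_put_in_unmap) {
		f.refcount--;		/* child unmaps or exits */
		f.refcount--;		/* parent unmaps or exits */
	}

	return f.refcount;
}
```

With both flags false the model ends at 0. Skipping only the increment
in fork ends at -1, and skipping both the increment in fork and the
decrement in unmap ends at 1, which is the leftover reference from mmap.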
Lorenzo,
This problem seems more complicated than I thought at first.
Making it a real anonymous VMA (vm_file is NULL) may still be
the best option. But we need to figure out how to keep smaps compatible.
Andrew,
Can you please drop this patch from your tree?
Thanks,
Yang
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
From: Lorenzo Stoakes @ 2025-03-13 5:16 UTC (permalink / raw)
To: Yang Shi
Cc: Vasily Gorbik, Andrew Morton, Liam.Howlett, vbabka, jannh,
oliver.sang, linux-mm, linux-kernel, Vasily Gorbik
On Wed, Mar 12, 2025 at 08:04:23PM -0700, Yang Shi wrote:
>
>
> On 3/12/25 4:55 PM, Vasily Gorbik wrote:
> > On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
> > > LKP reported 800% performance improvement for small-allocs benchmark
> > > from vm-scalability [1] with patch ("/dev/zero: make private mapping
> > > full anonymous mapping") [2], but the patch was nack'ed since it changes
> > > the output of smaps somewhat.
> > ...
> > > ---
> > > v2:
> > > * Added the comments in code suggested by Lorenzo
> > > * Collected R-b from Lorenze
> > >
> > > mm/vma.c | 18 ++++++++++++++++--
> > > 1 file changed, 16 insertions(+), 2 deletions(-)
> > Hi Yang,
> >
> > Replying to v2, as the code is the same as v1 in linux-next:
> >
> > The LTP test "mmap10" consistently triggers a kernel NULL pointer
> > dereference with this change, at least on x86 and s390. Reverting just
> > this single patch from linux-next fixes the issue.
>
> Hi Vasily,
>
> Thanks for the report. It is because dup_mmap() inserts the VMA into file
> rmap by checking whether vma->vm_file is NULL or not. This splat can be
> killed by skipping anonymous vma, but this actually will expose a more
> severe problem. The struct file refcount may be imbalance. The refcount is
> inc'ed in mmap, then inc'ed again by fork(), it is dec'ed when unmap or
> process exit. If we skip refcount inc in fork, we need skip refcount dec in
> unmap too, but there is still one refcount from mmap.
>
> Can we dec refcount in mmap if we see it is anonymous vma finally?
> Unfortunately, no. If the refcount reaches 0, the struct file will be freed.
> We will run into UAF when looking up smaps IIUC. It may point to anything.
>
> Lorenzo,
>
> This problem seems more complicated than what I thought in the first place.
> Making it is a real anonymous vma (vm_file is NULL) may be still the best
> option. But we need figure out how we can keep compatible smaps.
Ugh lord. I am not in favour of this for reasons aforementioned, and I _really_
don't want to special case this any more than we already do...
Let me think a bit about this also.
Maybe if you're at LSF we can chat about it there?
Thanks!
>
> Andrew,
>
> Can you please drop this patch from your tree?
>
> Thanks,
> Yang
>
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
From: Yang Shi @ 2025-03-13 17:42 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Vasily Gorbik, Andrew Morton, Liam.Howlett, vbabka, jannh,
oliver.sang, linux-mm, linux-kernel, Vasily Gorbik
On 3/12/25 10:16 PM, Lorenzo Stoakes wrote:
> On Wed, Mar 12, 2025 at 08:04:23PM -0700, Yang Shi wrote:
>>
>> On 3/12/25 4:55 PM, Vasily Gorbik wrote:
>>> On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
>>>> LKP reported 800% performance improvement for small-allocs benchmark
>>>> from vm-scalability [1] with patch ("/dev/zero: make private mapping
>>>> full anonymous mapping") [2], but the patch was nack'ed since it changes
>>>> the output of smaps somewhat.
>>> ...
>>>> ---
>>>> v2:
>>>> * Added the comments in code suggested by Lorenzo
>>>> * Collected R-b from Lorenze
>>>>
>>>> mm/vma.c | 18 ++++++++++++++++--
>>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>> Hi Yang,
>>>
>>> Replying to v2, as the code is the same as v1 in linux-next:
>>>
>>> The LTP test "mmap10" consistently triggers a kernel NULL pointer
>>> dereference with this change, at least on x86 and s390. Reverting just
>>> this single patch from linux-next fixes the issue.
>> Hi Vasily,
>>
>> Thanks for the report. It is because dup_mmap() inserts the VMA into file
>> rmap by checking whether vma->vm_file is NULL or not. This splat can be
>> killed by skipping anonymous vma, but this actually will expose a more
>> severe problem. The struct file refcount may be imbalance. The refcount is
>> inc'ed in mmap, then inc'ed again by fork(), it is dec'ed when unmap or
>> process exit. If we skip refcount inc in fork, we need skip refcount dec in
>> unmap too, but there is still one refcount from mmap.
>>
>> Can we dec refcount in mmap if we see it is anonymous vma finally?
>> Unfortunately, no. If the refcount reaches 0, the struct file will be freed.
>> We will run into UAF when looking up smaps IIUC. It may point to anything.
>>
>> Lorenzo,
>>
>> This problem seems more complicated than what I thought in the first place.
>> Making it is a real anonymous vma (vm_file is NULL) may be still the best
>> option. But we need figure out how we can keep compatible smaps.
> Ugh lord. I am not in favour of this for reasons aforementioned, and I _really_
> don't want to special case this any more than we already do...
Yeah, understood. I meant we should find a way to keep smaps unchanged
or compatible.
>
> Let me think a bit about this also.
>
> Maybe if you're at LSF we can chat about it there?
Unfortunately I can't make it this year. Have fun!
Thanks,
Yang
>
> Thanks!
>
>> Andrew,
>>
>> Can you please drop this patch from your tree?
>>
>> Thanks,
>> Yang
>>
>>> Krnl Code: 000003ffe0ee0430: e31030080004 lg %r1,8(%r3)
>>> 000003ffe0ee0436: ec1200888064 cgrj %r1,%r2,8,000003ffe0ee0546
>>> #000003ffe0ee043c: b90400a3 lgr %r10,%r3
>>> >000003ffe0ee0440: e310b0100024 stg %r1,16(%r11)
>>> 000003ffe0ee0446: e3b030080024 stg %r11,8(%r3)
>>> 000003ffe0ee044c: ec180009007c cgij %r1,0,8,000003ffe0ee045e
>>> 000003ffe0ee0452: ec2b000100d9 aghik %r2,%r11,1
>>> 000003ffe0ee0458: e32010000024 stg %r2,0(%r1)
>>> Call Trace:
>>> [<000003ffe0ee0440>] __rb_insert_augmented+0x60/0x210
>>> [<000003ffe016d6c4>] dup_mmap+0x424/0x8c0
>>> [<000003ffe016dc62>] copy_mm+0x102/0x1c0
>>> [<000003ffe016e8ae>] copy_process+0x7ce/0x12b0
>>> [<000003ffe016f458>] kernel_clone+0x68/0x380
>>> [<000003ffe016f84a>] __do_sys_clone+0x5a/0x70
>>> [<000003ffe016faa0>] __s390x_sys_clone+0x40/0x50
>>> [<000003ffe011c9b6>] do_syscall.constprop.0+0x116/0x140
>>> [<000003ffe0ef1d64>] __do_syscall+0xd4/0x1c0
>>> [<000003ffe0efd044>] system_call+0x74/0x98
>>> Last Breaking-Event-Address:
>>> [<000003ffe0ee058a>] __rb_insert_augmented+0x1aa/0x210
>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
2025-03-12 22:15 [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree Yang Shi
2025-03-12 23:55 ` Vasily Gorbik
@ 2025-03-14 3:16 ` Lai, Yi
2025-03-25 8:40 ` kernel test robot
2 siblings, 0 replies; 7+ messages in thread
From: Lai, Yi @ 2025-03-14 3:16 UTC (permalink / raw)
To: Yang Shi
Cc: Liam.Howlett, lorenzo.stoakes, vbabka, jannh, oliver.sang, akpm,
linux-mm, linux-kernel, yi1.lai, syzkaller-bugs
On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
> LKP reported 800% performance improvement for small-allocs benchmark
> from vm-scalability [1] with patch ("/dev/zero: make private mapping
> full anonymous mapping") [2], but the patch was nack'ed since it changes
> the output of smaps somewhat.
>
> The profiling shows one of the major sources of the performance
> improvement is the less contention to i_mmap_rwsem.
>
> The small-allocs benchmark creates a lot of 40K size memory maps by
> mmap'ing private /dev/zero then triggers page fault on the mappings.
> When creating private mapping for /dev/zero, the anonymous VMA is
> created, but it has valid vm_file. Kernel basically assumes anonymous
> VMAs should have NULL vm_file, for example, mmap inserts VMA to the file
> rmap tree if vm_file is not NULL. So the private /dev/zero mapping
> will be inserted to the file rmap tree, this resulted in the contention
> to i_mmap_rwsem. But it is actually anonymous VMA, so it is pointless
> to insert it to file rmap tree.
>
> Skip anonymous VMA for this case. Over 400% performance improvement was
> reported [3].
>
> It is not on par with the 800% improvement from the original patch. It is
> because page fault handler needs to access some members of struct file
> if vm_file is not NULL, for example, f_mode and f_mapping. They are in
> the same cacheline with file refcount. When mmap'ing a file the file
> refcount is inc'ed and dec'ed, this caused bad cache false sharing
> problem. The further debug showed checking whether the VMA is anonymous
> or not can alleviate the problem. But I'm not sure whether it is the
> best way to handle it, maybe we should consider shuffle the layout of
> struct file.
>
> However it sounds rare that real life applications would create that
> many maps with mmap'ing private /dev/zero and share the same struct
> file, so the cache false sharing problem may be not that bad. But
> i_mmap_rwsem contention problem seems more real since all /dev/zero
> private mappings even from different applications share the same struct
> address_space so the same i_mmap_rwsem. Inserting anonymous VMA into
> file rmap tree is also a broken behavior. It is worth fixing from this
> perspective too.
>
> [1] https://lore.kernel.org/linux-mm/202501281038.617c6b60-lkp@intel.com/
> [2] https://lore.kernel.org/linux-mm/20250113223033.4054534-1-yang@os.amperecomputing.com/
> [3] https://lore.kernel.org/linux-mm/Z6RshwXCWhAGoMOK@xsang-OptiPlex-9020/#t
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
> v2:
> * Added the comments in code suggested by Lorenzo
> * Collected R-b from Lorenze
>
> mm/vma.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index c7abef5177cc..2fe99d181cfd 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -1648,6 +1648,10 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
> void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
> struct vm_area_struct *vma)
> {
> + /* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
> + if (vma_is_anonymous(vma))
> + return;
> +
> if (vma->vm_file == NULL)
> return;
>
> @@ -1671,8 +1675,13 @@ void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb)
> */
> void unlink_file_vma(struct vm_area_struct *vma)
> {
> - struct file *file = vma->vm_file;
> + struct file *file;
> +
> + /* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
> + if (vma_is_anonymous(vma))
> + return;
>
> + file = vma->vm_file;
> if (file) {
> struct address_space *mapping = file->f_mapping;
>
> @@ -1684,9 +1693,14 @@ void unlink_file_vma(struct vm_area_struct *vma)
>
> void vma_link_file(struct vm_area_struct *vma)
> {
> - struct file *file = vma->vm_file;
> + struct file *file;
> struct address_space *mapping;
>
> + /* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
> + if (vma_is_anonymous(vma))
> + return;
> +
> + file = vma->vm_file;
> if (file) {
> mapping = file->f_mapping;
> i_mmap_lock_write(mapping);
> --
> 2.48.1
>
Hi Yang Shi,
Greetings!
I used Syzkaller and found two issues in v6.14-rc6; both were bisected to your patch as the first bad commit:
general protection fault in vma_interval_tree_insert_after
KASAN: slab-use-after-free Read in vma_interval_tree_insert
I see that you asked for the patch to be dropped from the maintainer's tree. I hope the dmesg output for these issues is useful to you, and that the reproducer binary can help test your new design.
Issue one - general protection fault in vma_interval_tree_insert_after:
"
[ 26.488762] ? __rb_insert_augmented+0x7a/0x9d0
[ 26.489380] ? down_write+0x155/0x210
[ 26.489879] ? __pfx_down_write+0x10/0x10
[ 26.490444] vma_interval_tree_insert_after+0x2a2/0x370
[ 26.491190] copy_mm+0x11f6/0x2740
[ 26.491702] ? __pfx_copy_mm+0x10/0x10
[ 26.492242] ? _raw_spin_unlock_irqrestore+0x35/0x70
[ 26.492934] ? lockdep_hardirqs_on+0x89/0x110
[ 26.493559] ? __raw_spin_lock_init+0x44/0x120
[ 26.494201] copy_process+0x29d8/0x69c0
[ 26.494752] ? __pfx_copy_process+0x10/0x10
[ 26.495352] ? lock_is_held_type+0xef/0x150
[ 26.495947] ? __kasan_check_read+0x15/0x20
[ 26.496548] ? __lock_acquire+0x1bad/0x5d60
[ 26.497202] kernel_clone+0xfc/0x8c0
[ 26.497574] ? __pfx_kernel_clone+0x10/0x10
[ 26.497979] ? __pfx___lock_acquire+0x10/0x10
[ 26.498370] ? __pfx_do_mmap+0x10/0x10
[ 26.498722] __do_sys_clone+0xf5/0x140
[ 26.499074] ? __pfx___do_sys_clone+0x10/0x10
[ 26.499478] ? seqcount_lockdep_reader_access.constprop.0+0xc0/0xd0
[ 26.500051] ? __sanitizer_cov_trace_cmp4+0x1a/0x20
[ 26.500485] ? ktime_get_coarse_real_ts64+0xb6/0x100
[ 26.500961] __x64_sys_clone+0xc7/0x150
[ 26.501445] ? syscall_trace_enter+0x14d/0x280
[ 26.501985] x64_sys_call+0x1acf/0x2150
[ 26.502475] do_syscall_64+0x6d/0x140
[ 26.502956] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 26.503410] RIP: 0033:0x7f286523ee5d
[ 26.503739] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[ 26.505269] RSP: 002b:00007ffff77719f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 26.505908] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f286523ee5d
[ 26.506509] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 26.507116] RBP: 00007ffff7771a40 R08: 0000000000000000 R09: 0000000000000000
[ 26.507717] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffff7771bb8
[ 26.508309] R13: 000000000040183a R14: 0000000000403e08 R15: 00007f28655e3000
[ 26.508911] </TASK>
[ 26.509112] Modules linked in:
[ 26.509761] ---[ end trace 0000000000000000 ]---
[ 26.510167] RIP: 0010:__rb_insert_augmented+0x7a/0x9d0
[ 26.510615] Code: 89 e2 48 c1 ea 03 42 80 3c 32 00 0f 85 9c 05 00 00 4d 8b 2c 24 41 f6 c5 01 0f 85 88 01 00 00 4d 8d 45 08 4c 89 c2 48 c1 ea 03 <42> 80 3c 32 00 0f 85 95 05 00 00 4d 8b 7d 08 4d 39 e7 0f 84 78 01
[ 26.512126] RSP: 0018:ffff88801d53f8d0 EFLAGS: 00010202
[ 26.512569] RAX: ffffffff81d744d0 RBX: ffff888013c26970 RCX: ffff88800ed4ea80
[ 26.513192] RDX: 0000000000000001 RSI: 1ffff11002784d2e RDI: ffff888013c26970
[ 26.513787] RBP: ffff88801d53f918 R08: 0000000000000008 R09: ffffed1001da9d62
[ 26.514376] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888021604830
[ 26.514974] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888021604838
[ 26.515568] FS: 00007f2865596740(0000) GS:ffff8880e368d000(0000) knlGS:0000000000000000
[ 26.516235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.516742] CR2: 0000000020000000 CR3: 000000001f1f4003 CR4: 0000000000770ef0
[ 26.517340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.517930] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 26.518522] PKRU: 55555554
"
All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_011927_vma_interval_tree_insert_after/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250313_011927_vma_interval_tree_insert_after/bzImage_eea255893718268e1ab852fb52f70c613d109b99
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250313_011927_vma_interval_tree_insert_after/eea255893718268e1ab852fb52f70c613d109b99_dmesg.log
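For readers without the syzkaller setup, the general shape of the path in the trace above (clone -> copy_process -> copy_mm on a process that holds a private /dev/zero mapping) can be approximated from user space. The following is only a minimal sketch and not the syzkaller reproducer (repro.c linked above is the authoritative one). The function name fork_with_private_dev_zero and the 40K length (taken from the small-allocs description in the patch) are illustrative assumptions, and this sequence is not claimed to trigger the fault on its own.

```c
/* Hypothetical sketch: private /dev/zero mapping followed by fork(). */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map /dev/zero MAP_PRIVATE, fault in one page, then fork so the kernel
 * has to duplicate the mapping for the child.
 * Returns 0 on success, -1 on any failure.
 */
int fork_with_private_dev_zero(size_t len)
{
	int fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return -1;

	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	p[0] = 0xaa;	/* page fault in the parent */

	pid_t pid = fork();
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: sees the parent's byte; its own write stays private. */
		int ok = (p[0] == 0xaa);
		p[0] = 0x55;
		_exit(ok ? 0 : 1);
	}

	int status = 0;
	int rc = -1;
	/* Parent: child exited cleanly and our copy is unchanged. */
	if (waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0 && p[0] == 0xaa)
		rc = 0;

	munmap(p, len);
	return rc;
}
```

On a kernel that handles this path correctly, fork_with_private_dev_zero(40 * 1024) is expected to return 0.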
Issue two - KASAN: slab-use-after-free Read in vma_interval_tree_insert
"
[ 18.362663] ==================================================================
[ 18.363058] BUG: KASAN: slab-use-after-free in vma_interval_tree_insert+0x3ac/0x460
[ 18.363448] Read of size 8 at addr ffff8880178025c8 by task repro/731
[ 18.363756]
[ 18.363850] CPU: 1 UID: 0 PID: 731 Comm: repro Not tainted 6.14.0-rc6-next-20250311-eea255893718 #1
[ 18.363858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 18.363865] Call Trace:
[ 18.363872] <TASK>
[ 18.363877] dump_stack_lvl+0xea/0x150
[ 18.363905] print_report+0xce/0x660
[ 18.363918] ? vma_interval_tree_insert+0x3ac/0x460
[ 18.363926] ? kasan_complete_mode_report_info+0x80/0x200
[ 18.363934] ? vma_interval_tree_insert+0x3ac/0x460
[ 18.363940] kasan_report+0xd6/0x110
[ 18.363946] ? vma_interval_tree_insert+0x3ac/0x460
[ 18.363955] __asan_report_load8_noabort+0x18/0x20
[ 18.363961] vma_interval_tree_insert+0x3ac/0x460
[ 18.363969] vma_prepare+0x23f/0x6b0
[ 18.363981] __split_vma+0x8df/0xe70
[ 18.363988] ? __pfx___split_vma+0x10/0x10
[ 18.363995] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 18.364007] ? mas_walk+0x6a7/0x8b0
[ 18.364016] vms_gather_munmap_vmas+0x17b/0xd40
[ 18.364024] __mmap_region+0x312/0x23e0
[ 18.364032] ? __pfx___mmap_region+0x10/0x10
[ 18.364039] ? __kasan_check_read+0x15/0x20
[ 18.364049] ? mark_lock.part.0+0xf2/0x17a0
[ 18.364063] ? __pfx_mark_lock.part.0+0x10/0x10
[ 18.364069] ? stack_trace_save+0x96/0xd0
[ 18.364094] ? __this_cpu_preempt_check+0x21/0x30
[ 18.364107] ? lock_is_held_type+0xef/0x150
[ 18.364114] mmap_region+0x1c0/0x3e0
[ 18.364120] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 18.364128] do_mmap+0xe0c/0x1270
[ 18.364137] ? __pfx_do_mmap+0x10/0x10
[ 18.364144] ? down_write_killable+0x163/0x250
[ 18.364152] ? __pfx_down_write_killable+0x10/0x10
[ 18.364157] ? __this_cpu_preempt_check+0x21/0x30
[ 18.364166] vm_mmap_pgoff+0x233/0x3d0
[ 18.364176] ? __pfx_vm_mmap_pgoff+0x10/0x10
[ 18.364182] ? __fget_files+0x204/0x3b0
[ 18.364196] ksys_mmap_pgoff+0x3dc/0x520
[ 18.364206] __x64_sys_mmap+0x139/0x1d0
[ 18.364218] x64_sys_call+0x200d/0x2150
[ 18.364226] do_syscall_64+0x6d/0x140
[ 18.364235] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.364241] RIP: 0033:0x7ff9b4c3ee5d
[ 18.364249] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[ 18.364255] RSP: 002b:00007ffdbb3b6718 EFLAGS: 00000216 ORIG_RAX: 0000000000000009
[ 18.364269] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff9b4c3ee5d
[ 18.364272] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000020ffc000
[ 18.364276] RBP: 00007ffdbb3b6740 R08: 0000000000000005 R09: 0000000000000000
[ 18.364279] R10: 0000000000000012 R11: 0000000000000216 R12: 00007ffdbb3b6898
[ 18.364283] R13: 000000000040181e R14: 0000000000403e08 R15: 00007ff9b4f19000
[ 18.364290] </TASK>
[ 18.364292]
[ 18.376442] Allocated by task 730:
[ 18.376616] kasan_save_stack+0x2c/0x60
[ 18.376812] kasan_save_track+0x18/0x40
[ 18.377004] kasan_save_alloc_info+0x3c/0x50
[ 18.377220] __kasan_slab_alloc+0x62/0x80
[ 18.377417] kmem_cache_alloc_noprof+0x13d/0x440
[ 18.377649] vm_area_alloc+0x29/0x180
[ 18.377839] __mmap_region+0xced/0x23e0
[ 18.378033] mmap_region+0x1c0/0x3e0
[ 18.378464] do_mmap+0xe0c/0x1270
[ 18.378807] vm_mmap_pgoff+0x233/0x3d0
[ 18.379188] ksys_mmap_pgoff+0x3dc/0x520
[ 18.379584] __x64_sys_mmap+0x139/0x1d0
[ 18.379968] x64_sys_call+0x200d/0x2150
[ 18.380353] do_syscall_64+0x6d/0x140
[ 18.380724] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.381219]
[ 18.381389] Freed by task 24:
[ 18.381694] kasan_save_stack+0x2c/0x60
[ 18.382191] kasan_save_track+0x18/0x40
[ 18.382387] kasan_save_free_info+0x3f/0x60
[ 18.382591] __kasan_slab_free+0x3d/0x60
[ 18.382784] slab_free_after_rcu_debug+0xdb/0x2b0
[ 18.383016] rcu_core+0x86b/0x1920
[ 18.383198] rcu_core_si+0x12/0x20
[ 18.383368] handle_softirqs+0x1c5/0x860
[ 18.383563] run_ksoftirqd+0x46/0x70
[ 18.383739] smpboot_thread_fn+0x666/0xa20
[ 18.383942] kthread+0x444/0x980
[ 18.384114] ret_from_fork+0x56/0x90
[ 18.384296] ret_from_fork_asm+0x1a/0x30
[ 18.384487]
[ 18.384572] Last potentially related work creation:
[ 18.384804] kasan_save_stack+0x2c/0x60
[ 18.384993] kasan_record_aux_stack+0x93/0xa0
[ 18.385211] kmem_cache_free+0x1b8/0x540
[ 18.385402] vm_area_free+0xa5/0xd0
[ 18.385584] remove_vma+0x135/0x180
[ 18.385763] vms_complete_munmap_vmas+0x432/0x810
[ 18.386000] __mmap_region+0x70c/0x23e0
[ 18.386193] mmap_region+0x1c0/0x3e0
[ 18.386377] do_mmap+0xe0c/0x1270
[ 18.386545] vm_mmap_pgoff+0x233/0x3d0
[ 18.386737] ksys_mmap_pgoff+0x3dc/0x520
[ 18.386937] __x64_sys_mmap+0x139/0x1d0
"
All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250313_133334_vma_interval_tree_insert/bisect_info.log
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250313_133334_vma_interval_tree_insert/eea255893718268e1ab852fb52f70c613d109b99_dmesg.log
Regards,
Yi Lai
---
If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost
After logging in to the VM (virtual machine) successfully, you can build the
reproducer binary, transfer it to the VM as shown below, and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
Build the bzImage for the target kernel:
Please copy the target kconfig to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be less than or equal to the number of CPUs your machine has
Point start3.sh above at the new bzImage file to load the target kernel in the VM.
Tips:
If you already have qemu-system-x86_64, please ignore the information below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
* Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
2025-03-12 22:15 [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree Yang Shi
2025-03-12 23:55 ` Vasily Gorbik
2025-03-14 3:16 ` Lai, Yi
@ 2025-03-25 8:40 ` kernel test robot
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-03-25 8:40 UTC (permalink / raw)
To: Yang Shi
Cc: oe-lkp, lkp, kernel test robot, Lorenzo Stoakes, linux-mm, ltp,
Liam.Howlett, vbabka, jannh, akpm, yang, linux-kernel
Hi Yang Shi,
Just in case the report below can supply any further useful information to you.
Hello,
kernel test robot noticed "Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]PREEMPT_SMP_KASAN_PTI" on:
commit: 13671c9499a4883f6bece7229463ff89a48709f6 ("[v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree")
url: https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/mm-vma-skip-anonymous-vma-when-inserting-vma-to-file-rmap-tree/20250313-061727
base: v6.14-rc6
patch link: https://lore.kernel.org/all/20250312221521.1255690-1-yang@os.amperecomputing.com/
patch subject: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree
in testcase: ltp
version: ltp-x86_64-042eff32a-1_20250322
with following parameters:
disk: 1HDD
test: mm-00
config: x86_64-rhel-9.4-ltp
compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202503251554.34a0b29b-lkp@intel.com
[ 557.087938][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[ 557.087946][ T349]
[ 557.102809][ T3834] LTP: starting mmap10
[ 557.103401][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[ 557.106782][ T349]
[ 557.119531][T141949] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
[ 557.121506][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[ 557.132309][T141949] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
[ 557.132314][T141949] CPU: 1 UID: 0 PID: 141949 Comm: mmap10 Tainted: G I 6.14.0-rc6-00001-g13671c9499a4 #1
[ 557.132319][T141949] Tainted: [I]=FIRMWARE_WORKAROUND
[ 557.143013][ T349]
[ 557.151409][T141949] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[557.151411][T141949] RIP: 0010:__rb_insert_augmented (kbuild/src/consumer/lib/rbtree.c:115 kbuild/src/consumer/lib/rbtree.c:459)
[ 557.164270][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[ 557.167616][T141949] Code: 00 48 89 da 48 c1 ea 03 80 3c 02 00 0f 85 a0 05 00 00 48 8b 2b 40 f6 c5 01 0f 85 44 05 00 00 48 8d 55 08 48 89 d1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 05 00 00 4c 8b 6d 08 49 39 dd 0f 84 7f 01 00
All code
========
0: 00 48 89 add %cl,-0x77(%rax)
3: da 48 c1 fimull -0x3f(%rax)
6: ea (bad)
7: 03 80 3c 02 00 0f add 0xf00023c(%rax),%eax
d: 85 a0 05 00 00 48 test %esp,0x48000005(%rax)
13: 8b 2b mov (%rbx),%ebp
15: 40 f6 c5 01 test $0x1,%bpl
19: 0f 85 44 05 00 00 jne 0x563
1f: 48 8d 55 08 lea 0x8(%rbp),%rdx
23: 48 89 d1 mov %rdx,%rcx
26: 48 c1 e9 03 shr $0x3,%rcx
2a:* 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1) <-- trapping instruction
2e: 0f 85 94 05 00 00 jne 0x5c8
34: 4c 8b 6d 08 mov 0x8(%rbp),%r13
38: 49 39 dd cmp %rbx,%r13
3b: 0f .byte 0xf
3c: 84 7f 01 test %bh,0x1(%rdi)
...
Code starting with the faulting instruction
===========================================
0: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
4: 0f 85 94 05 00 00 jne 0x59e
a: 4c 8b 6d 08 mov 0x8(%rbp),%r13
e: 49 39 dd cmp %rbx,%r13
11: 0f .byte 0xf
12: 84 7f 01 test %bh,0x1(%rdi)
...
[ 557.167620][T141949] RSP: 0018:ffffc9002edff800 EFLAGS: 00010202
[ 557.169827][ T349]
[ 557.178054][T141949] RAX: dffffc0000000000 RBX: ffff88810b878308 RCX: 0000000000000001
[ 557.178057][T141949] RDX: 0000000000000008 RSI: ffff8881051ec2f0 RDI: ffff8887de397c58
[ 557.178059][T141949] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1020a3d868
[ 557.178061][T141949] R10: ffff8881051ec347 R11: ffff8887de397c20 R12: ffff8887de397c58
[ 557.185709][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[ 557.194702][T141949] R13: ffff8881051ec2a8 R14: ffffffff81c1fa50 R15: ffff8881051ec2f0
[ 557.194704][T141949] FS: 00007f318f741740(0000) GS:ffff888759880000(0000) knlGS:0000000000000000
[ 557.194707][T141949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 557.214407][ T349]
[ 557.220440][T141949] CR2: 00007f318f917710 CR3: 000000015f928002 CR4: 00000000003726f0
[ 557.220442][T141949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 557.220444][T141949] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 557.220445][T141949] Call Trace:
[ 557.220447][T141949] <TASK>
[ 557.224320][ T349] mmapstress10 0 TINFO : Using /tmp/ltp-2cGaEA7GG8/LTP_mmaUxM2MU as tmpdir (tmpfs filesystem)
[557.230618][T141949] ? die_addr (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:421 kbuild/src/consumer/arch/x86/kernel/dumpstack.c:460)
[557.230624][T141949] ? exc_general_protection (kbuild/src/consumer/arch/x86/kernel/traps.c:751 kbuild/src/consumer/arch/x86/kernel/traps.c:693)
[ 557.238606][ T349]
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250325/202503251554.34a0b29b-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki