linux-mm.kvack.org archive mirror
* [PATCH] /dev/zero: make private mapping full anonymous mapping
@ 2025-01-13 22:30 Yang Shi
  2025-01-14 12:05 ` Lorenzo Stoakes
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Yang Shi @ 2025-01-13 22:30 UTC (permalink / raw)
  To: arnd, gregkh, Liam.Howlett, lorenzo.stoakes, vbabka, jannh,
	willy, liushixin2, akpm
  Cc: yang, linux-mm, linux-kernel

When creating a private mapping of /dev/zero, the driver makes it an
anonymous mapping by calling vma_set_anonymous().  But that just sets
vm_ops to NULL; vm_file is still valid and vm_pgoff is still the file
offset.

This is a special case: the VMA looks like neither an anonymous VMA nor
a file VMA.  It confuses other kernel subsystems, for example
khugepaged [1].

It seems pointless to keep such a special case.  Making a private
/dev/zero mapping a full anonymous mapping doesn't change the semantics
of /dev/zero either.

The user-visible effect is the mapping entry shown in /proc/<PID>/smaps
and /proc/<PID>/maps.

Before the change:
ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero

After the change:
ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0

[1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 drivers/char/mem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 169eed162a7f..dae113f7fc1b 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
 	if (vma->vm_flags & VM_SHARED)
 		return shmem_zero_setup(vma);
 	vma_set_anonymous(vma);
+	fput(vma->vm_file);
+	vma->vm_file = NULL;
+	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
+
 	return 0;
 }
 
-- 
2.47.0
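
A minimal userspace sketch (not part of the patch) of how to observe the
user-visible effect described above, assuming nothing beyond POSIX
mmap(2) and a readable /dev/zero: it maps one page of /dev/zero
MAP_PRIVATE, faults it in, and prints the matching /proc/self/maps line.

/* Sketch: map /dev/zero MAP_PRIVATE and show its /proc/self/maps entry. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return 1;

	/* One page, private: the driver's mmap_zero() path applies. */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 1;	/* fault the page in */

	/* Print the maps line that covers the mapping. */
	FILE *maps = fopen("/proc/self/maps", "r");
	char line[256];
	while (maps && fgets(line, sizeof(line), maps)) {
		unsigned long start, end;
		if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
		    (unsigned long)p >= start && (unsigned long)p < end)
			fputs(line, stdout);
	}
	return 0;
}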




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-13 22:30 [PATCH] /dev/zero: make private mapping full anonymous mapping Yang Shi
@ 2025-01-14 12:05 ` Lorenzo Stoakes
  2025-01-14 16:53   ` Yang Shi
  2025-01-14 13:01 ` David Hildenbrand
  2025-01-28  3:14 ` kernel test robot
  2 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 12:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

+ Willy for the fs/weirdness elements of this.

On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> When creating private mapping for /dev/zero, the driver makes it an
> anonymous mapping by calling set_vma_anonymous().  But it just sets
> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.

Hm yikes.

>
> This is a special case and the VMA doesn't look like either anonymous VMA
> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>
> It seems pointless to keep such special case.  Making private /dev/zero
> mapping a full anonymous mapping doesn't change the semantic of
> /dev/zero either.

My concern is that ostensibly there _is_ a file right? Are we certain that by
not setting this we are not breaking something somewhere else?

Are we not creating a sort of other type of 'non-such-beast' here?

I mean already setting it anon and setting vm_file non-NULL is really strange.

>
> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> and /proc/<PID>/maps.
>
> Before the change:
> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>
> After the change:
> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>

Yeah this seems like it might break somebody to be honest, it's really
really really strange to map a file then for it not to be mapped.

But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
file but for it to be marked anonymous.

God what a mess.

> [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/

I kind of hate that we have to mitigate like this for a case that should
never ever happen so I'm inclined towards your solution but a lot more
inclined towards us totally rethinking this.

Do we _have_ to make this anonymous?? Why can't we just reference the zero
page as if it were in the page cache (Willy - feel free to correct naive
misapprehension here).

>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
>  drivers/char/mem.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 169eed162a7f..dae113f7fc1b 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
>  	if (vma->vm_flags & VM_SHARED)
>  		return shmem_zero_setup(vma);
>  	vma_set_anonymous(vma);
> +	fput(vma->vm_file);
> +	vma->vm_file = NULL;
> +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;

Hmm, this might have been mremap()'d _potentially_ though? And then now
this will be wrong? But then we'd have no way of tracking it correctly...

I've not checked the function but do we mark this as a special mapping of
some kind?

> +
>  	return 0;
>  }
>
> --
> 2.47.0
>



* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-13 22:30 [PATCH] /dev/zero: make private mapping full anonymous mapping Yang Shi
  2025-01-14 12:05 ` Lorenzo Stoakes
@ 2025-01-14 13:01 ` David Hildenbrand
  2025-01-14 14:52   ` Lorenzo Stoakes
  2025-01-28  3:14 ` kernel test robot
  2 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 13:01 UTC (permalink / raw)
  To: Yang Shi, arnd, gregkh, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, willy, liushixin2, akpm
  Cc: linux-mm, linux-kernel

On 13.01.25 23:30, Yang Shi wrote:
> When creating private mapping for /dev/zero, the driver makes it an
> anonymous mapping by calling set_vma_anonymous().  But it just sets
> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> 
> This is a special case and the VMA doesn't look like either anonymous VMA
> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> 
> It seems pointless to keep such special case.  Making private /dev/zero
> mapping a full anonymous mapping doesn't change the semantic of
> /dev/zero either.
> 
> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> and /proc/<PID>/maps.
> 
> Before the change:
> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> 
> After the change:
> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> 

Hm, not sure about this. It's actually quite consistent to have that 
output in smaps the way it is. You mapped a file at an offset, and it 
behaves like an anonymous mapping apart from that.

Not sure if the buggy khugepaged thing is a good indicator to warrant 
this change.

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 13:01 ` David Hildenbrand
@ 2025-01-14 14:52   ` Lorenzo Stoakes
  2025-01-14 15:06     ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 14:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, jannh, willy,
	liushixin2, akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
> On 13.01.25 23:30, Yang Shi wrote:
> > When creating private mapping for /dev/zero, the driver makes it an
> > anonymous mapping by calling set_vma_anonymous().  But it just sets
> > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> >
> > This is a special case and the VMA doesn't look like either anonymous VMA
> > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> >
> > It seems pointless to keep such special case.  Making private /dev/zero
> > mapping a full anonymous mapping doesn't change the semantic of
> > /dev/zero either.
> >
> > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > and /proc/<PID>/maps.
> >
> > Before the change:
> > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> >
> > After the change:
> > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> >
>
> Hm, not sure about this. It's actually quite consistent to have that output
> in smaps the way it is. You mapped a file at an offset, and it behaves like
> an anonymous mapping apart from that.
>
> Not sure if the buggy khugepaged thing is a good indicator to warrant this
> change.

Yeah, this is a user-facing fundamental change that hides information and
defies expectation so I mean - it's a no go really isn't it?

I'd rather we _not_ make this anon though, because isn't life confusing
enough David? I thought it was bad enough with 'anon, file and lol shmem'
but 'lol lol also /dev/zero' is enough to make me want to frolick in the
fields...

>
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 14:52   ` Lorenzo Stoakes
@ 2025-01-14 15:06     ` David Hildenbrand
  2025-01-14 17:01       ` Yang Shi
  2025-01-14 17:02       ` David Hildenbrand
  0 siblings, 2 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 15:06 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, jannh, willy,
	liushixin2, akpm, linux-mm, linux-kernel

On 14.01.25 15:52, Lorenzo Stoakes wrote:
> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>> On 13.01.25 23:30, Yang Shi wrote:
>>> When creating private mapping for /dev/zero, the driver makes it an
>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
>>>
>>> This is a special case and the VMA doesn't look like either anonymous VMA
>>> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>>>
>>> It seems pointless to keep such special case.  Making private /dev/zero
>>> mapping a full anonymous mapping doesn't change the semantic of
>>> /dev/zero either.
>>>
>>> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
>>> and /proc/<PID>/maps.
>>>
>>> Before the change:
>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>>>
>>> After the change:
>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>
>>
>> Hm, not sure about this. It's actually quite consistent to have that output
>> in smaps the way it is. You mapped a file at an offset, and it behaves like
>> an anonymous mapping apart from that.
>>
>> Not sure if the buggy khugepaged thing is a good indicator to warrant this
>> change.
> 
> Yeah, this is a user-facing fundamental change that hides information and
> defies expectation so I mean - it's a no go really isn't it?
> 
> I'd rather we _not_ make this anon though, because isn't life confusing
> enough David? I thought it was bad enough with 'anon, file and lol shmem'
> but 'lol lol also /dev/zero' is enough to make me want to frolick in the
> fields...

I recall there are users that rely on this memory to get the shared 
zeropage on reads etc (in comparison to shmem!), so I better not ... 
mess with this *at all* :)

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 12:05 ` Lorenzo Stoakes
@ 2025-01-14 16:53   ` Yang Shi
  2025-01-14 18:14     ` Lorenzo Stoakes
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 16:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> + Willy for the fs/weirdness elements of this.
>
> On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
>> When creating private mapping for /dev/zero, the driver makes it an
>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> Hm yikes.
>
>> This is a special case and the VMA doesn't look like either anonymous VMA
>> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>>
>> It seems pointless to keep such special case.  Making private /dev/zero
>> mapping a full anonymous mapping doesn't change the semantic of
>> /dev/zero either.
> My concern is that ostensibly there _is_ a file right? Are we certain that by
> not setting this we are not breaking something somewhere else?
>
> Are we not creating a sort of other type of 'non-such-beast' here?

But the file is /dev/zero. I don't see how this could break the semantics
of /dev/zero. The shared mapping of /dev/zero is not affected by this
change, and the kernel already treats a private mapping of /dev/zero as an
anonymous mapping, just with some weird settings in the VMA. Reading the
mapping returns 0 via the zero page; writing to the mapping allocates a
new anonymous folio.
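
As a reference point, a paraphrased and heavily abridged sketch of the
anonymous fault path that produces this behavior (loosely modeled on
do_anonymous_page() in mm/memory.c; names, locking and error handling
are simplified, so this is not the verbatim kernel code):

/* Paraphrased sketch, not verbatim kernel code: how a fault resolves
 * in an anonymous VMA, which is what a private /dev/zero VMA becomes. */
static vm_fault_t anonymous_fault_sketch(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	pte_t entry;

	if (!(vmf->flags & FAULT_FLAG_WRITE)) {
		/* Read fault: map the shared zero page read-only;
		 * nothing is allocated and reads return 0. */
		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
					      vma->vm_page_prot));
	} else {
		/* Write fault: allocate a fresh zeroed anonymous folio
		 * and map it writable - the "new anonymous folio"
		 * mentioned above. */
		struct folio *folio =
			vma_alloc_zeroed_movable_folio(vma, vmf->address);

		if (!folio)
			return VM_FAULT_OOM;
		entry = mk_pte(&folio->page, vma->vm_page_prot);
		entry = pte_mkwrite(pte_mkdirty(entry), vma);
	}
	/* ...take the PTE lock and install 'entry' into the page table... */
	return 0;
}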

>
> I mean already setting it anon and setting vm_file non-NULL is really strange.
>
>> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
>> and /proc/<PID>/maps.
>>
>> Before the change:
>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>>
>> After the change:
>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>
> Yeah this seems like it might break somebody to be honest, it's really
> really really strange to map a file then for it not to be mapped.

Yes, it is possible if someone really cares whether an anonymous-like
mapping was mapped from /dev/zero or just created by malloc(). But I don't
know who really does...

>
> But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> file but for it to be marked anonymous.
>
> God what a mess.
>
>> [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> I kind of hate that we have to mitigate like this for a case that should
> never ever happen so I'm inclined towards your solution but a lot more
> inclined towards us totally rethinking this.
>
> Do we _have_ to make this anonymous?? Why can't we just reference the zero
> page as if it were in the page cache (Willy - feel free to correct naive
> misapprehension here).

TBH, I don't see why the page cache has to be involved. When reading, 0 is
returned via the zero page. When writing, a CoW is triggered if the page
cache is involved, but the content of the page cache would be just 0, so we
would copy 0 into the new folio and then write to it. That doesn't make
much sense. I think this is why a private /dev/zero mapping was treated as
an anonymous mapping in the first place.
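
A tiny userspace sketch of exactly that behavior (assuming only POSIX
mmap(2)): reads see zeroes without any allocation, and the first write
faults in a private anonymous folio that is never written back anywhere.

/* Sketch: a private /dev/zero mapping behaves like anonymous memory. */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/zero", O_RDWR);
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, fd, 0);
	if (fd < 0 || p == MAP_FAILED)
		return 1;

	assert(p[0] == 0);	/* read fault: backed by the zero page */
	p[0] = 42;		/* write fault: new anonymous folio */
	assert(p[0] == 42);	/* a private copy, never written back */

	munmap(p, 4096);
	close(fd);
	return 0;
}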

>
>> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>> ---
>>   drivers/char/mem.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>> index 169eed162a7f..dae113f7fc1b 100644
>> --- a/drivers/char/mem.c
>> +++ b/drivers/char/mem.c
>> @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
>>   	if (vma->vm_flags & VM_SHARED)
>>   		return shmem_zero_setup(vma);
>>   	vma_set_anonymous(vma);
>> +	fput(vma->vm_file);
>> +	vma->vm_file = NULL;
>> +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> Hmm, this might have been mremap()'d _potentially_ though? And then now
> this will be wrong? But then we'd have no way of tracking it correctly...

I'm not quite familiar with the subtle details and corner cases of
mremap(). But mmap_zero() should be called by mmap(), so the VMA is not
visible to userspace yet at this point IIUC. How could mremap() move it?

>
> I've not checked the function but do we mark this as a special mapping of
> some kind?
>
>> +
>>   	return 0;
>>   }
>>
>> --
>> 2.47.0
>>




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 15:06     ` David Hildenbrand
@ 2025-01-14 17:01       ` Yang Shi
  2025-01-14 17:23         ` David Hildenbrand
  2025-01-14 17:02       ` David Hildenbrand
  1 sibling, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 17:01 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 7:06 AM, David Hildenbrand wrote:
> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>> On 13.01.25 23:30, Yang Shi wrote:
>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file 
>>>> offset.
>>>>
>>>> This is a special case and the VMA doesn't look like either 
>>>> anonymous VMA
>>>> or file VMA.  It confused other kernel subsystem, for example, 
>>>> khugepaged [1].
>>>>
>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>> /dev/zero either.
>>>>
>>>> The user visible effect is the mapping entry shown in 
>>>> /proc/<PID>/smaps
>>>> and /proc/<PID>/maps.
>>>>
>>>> Before the change:
>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 
>>>> 8                          /dev/zero
>>>>
>>>> After the change:
>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>
>>>
>>> Hm, not sure about this. It's actually quite consistent to have that 
>>> output
>>> in smaps the way it is. You mapped a file at an offset, and it 
>>> behaves like
>>> an anonymous mapping apart from that.
>>>
>>> Not sure if the buggy khugepaged thing is a good indicator to 
>>> warrant this
>>> change.

I admit this may be a concern, but I doubt anyone really cares about it...

>>
>> Yeah, this is a user-facing fundamental change that hides information 
>> and
>> defies expectation so I mean - it's a no go really isn't it?
>>
>> I'd rather we _not_ make this anon though, because isn't life confusing
>> enough David? I thought it was bad enough with 'anon, file and lol 
>> shmem'
>> but 'lol lol also /dev/zero' is enough to make me want to frolick in the
>> fields...
>
> I recall there are users that rely on this memory to get the shared 
> zeropage on reads etc (in comparison to shmem!), so I better not ... 
> mess with this *at all* :)

The behavior won't be changed.





* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 15:06     ` David Hildenbrand
  2025-01-14 17:01       ` Yang Shi
@ 2025-01-14 17:02       ` David Hildenbrand
  2025-01-14 17:20         ` Yang Shi
  1 sibling, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 17:02 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, jannh, willy,
	liushixin2, akpm, linux-mm, linux-kernel

On 14.01.25 16:06, David Hildenbrand wrote:
> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>> On 13.01.25 23:30, Yang Shi wrote:
>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
>>>>
>>>> This is a special case and the VMA doesn't look like either anonymous VMA
>>>> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>>>>
>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>> /dev/zero either.
>>>>
>>>> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
>>>> and /proc/<PID>/maps.
>>>>
>>>> Before the change:
>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>>>>
>>>> After the change:
>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>
>>>
>>> Hm, not sure about this. It's actually quite consistent to have that output
>>> in smaps the way it is. You mapped a file at an offset, and it behaves like
>>> an anonymous mapping apart from that.
>>>
>>> Not sure if the buggy khugepaged thing is a good indicator to warrant this
>>> change.
>>
>> Yeah, this is a user-facing fundamental change that hides information and
>> defies expectation so I mean - it's a no go really isn't it?
>>
>> I'd rather we _not_ make this anon though, because isn't life confusing
>> enough David? I thought it was bad enough with 'anon, file and lol shmem'
>> but 'lol lol also /dev/zero' is enough to make me want to frolick in the
>> fields...
> 
> I recall there are users that rely on this memory to get the shared
> zeropage on reads etc (in comparison to shmem!), so I better not ...
> mess with this *at all* :)

Heh, and I recall reading something about odd behavior of /dev/zero and 
some interesting history [1].

"
Unlike /dev/null, /dev/zero may be used as a source, not only as a sink 
for data. All write operations to /dev/zero succeed with no other 
effects. However, /dev/null is more commonly used for this purpose.

When /dev/zero is memory-mapped, e.g., with mmap, to the virtual address 
space, it is equivalent to using anonymous memory; i.e. memory not 
connected to any file.
"

"equivalent to using anonymous memory" is interesting.


Also, /dev/zero was there before MAP_ANONYMOUS was invented according to 
[1], which is quite interesting.

... so this is anonymous memory as "real" as it can get :)


[1] https://en.wikipedia.org/wiki//dev/zero

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:02       ` David Hildenbrand
@ 2025-01-14 17:20         ` Yang Shi
  2025-01-14 17:24           ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 17:20 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 9:02 AM, David Hildenbrand wrote:
> On 14.01.25 16:06, David Hildenbrand wrote:
>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file 
>>>>> offset.
>>>>>
>>>>> This is a special case and the VMA doesn't look like either 
>>>>> anonymous VMA
>>>>> or file VMA.  It confused other kernel subsystem, for example, 
>>>>> khugepaged [1].
>>>>>
>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>> /dev/zero either.
>>>>>
>>>>> The user visible effect is the mapping entry shown in 
>>>>> /proc/<PID>/smaps
>>>>> and /proc/<PID>/maps.
>>>>>
>>>>> Before the change:
>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 
>>>>> 8                          /dev/zero
>>>>>
>>>>> After the change:
>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>
>>>>
>>>> Hm, not sure about this. It's actually quite consistent to have 
>>>> that output
>>>> in smaps the way it is. You mapped a file at an offset, and it 
>>>> behaves like
>>>> an anonymous mapping apart from that.
>>>>
>>>> Not sure if the buggy khugepaged thing is a good indicator to 
>>>> warrant this
>>>> change.
>>>
>>> Yeah, this is a user-facing fundamental change that hides 
>>> information and
>>> defies expectation so I mean - it's a no go really isn't it?
>>>
>>> I'd rather we _not_ make this anon though, because isn't life confusing
>>> enough David? I thought it was bad enough with 'anon, file and lol 
>>> shmem'
>>> but 'lol lol also /dev/zero' is enough to make me want to frolick in 
>>> the
>>> fields...
>>
>> I recall there are users that rely on this memory to get the shared
>> zeropage on reads etc (in comparison to shmem!), so I better not ...
>> mess with this *at all* :)
>
> Heh, and I recall reading something about odd behavior of /dev/zero 
> and some interesting history [1].
>
> "
> Unlike /dev/null, /dev/zero may be used as a source, not only as a 
> sink for data. All write operations to /dev/zero succeed with no other 
> effects. However, /dev/null is more commonly used for this purpose.
>
> When /dev/zero is memory-mapped, e.g., with mmap, to the virtual 
> address space, it is equivalent to using anonymous memory; i.e. memory 
> not connected to any file.
> "
>
> "equivalent to using anonymous memory" is interesting.

For a private mapping. A shared mapping is equivalent to shmem.

>
>
> Also, /dev/zero was there before MAP_ANONYMOUS was invented according 
> to [1], which is quite interesting.

Interesting... Didn't know this before.

>
> ... so this is anonymous memory as "real" as it can get :)

Let's make /dev/zero as real as anonymous memory :)

>
>
> [1] https://en.wikipedia.org/wiki//dev/zero
>




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:01       ` Yang Shi
@ 2025-01-14 17:23         ` David Hildenbrand
  2025-01-14 17:38           ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 17:23 UTC (permalink / raw)
  To: Yang Shi, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On 14.01.25 18:01, Yang Shi wrote:
> 
> 
> 
> On 1/14/25 7:06 AM, David Hildenbrand wrote:
>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file
>>>>> offset.
>>>>>
>>>>> This is a special case and the VMA doesn't look like either
>>>>> anonymous VMA
>>>>> or file VMA.  It confused other kernel subsystem, for example,
>>>>> khugepaged [1].
>>>>>
>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>> /dev/zero either.
>>>>>
>>>>> The user visible effect is the mapping entry shown in
>>>>> /proc/<PID>/smaps
>>>>> and /proc/<PID>/maps.
>>>>>
>>>>> Before the change:
>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06
>>>>> 8                          /dev/zero
>>>>>
>>>>> After the change:
>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>
>>>>
>>>> Hm, not sure about this. It's actually quite consistent to have that
>>>> output
>>>> in smaps the way it is. You mapped a file at an offset, and it
>>>> behaves like
>>>> an anonymous mapping apart from that.
>>>>
>>>> Not sure if the buggy khugepaged thing is a good indicator to
>>>> warrant this
>>>> change.
> 
> I admit this may be a concern, but I doubt who really care about it...
> 

There is an example in the man page [1] about /proc/self/map_files/.

I assume that will also change here.

It's always hard to tell who that could affect, but I'm not convinced 
this is worth it to find it out :)

>>>
>>> Yeah, this is a user-facing fundamental change that hides information
>>> and
>>> defies expectation so I mean - it's a no go really isn't it?
>>>
>>> I'd rather we _not_ make this anon though, because isn't life confusing
>>> enough David? I thought it was bad enough with 'anon, file and lol
>>> shmem'
>>> but 'lol lol also /dev/zero' is enough to make me want to frolick in the
>>> fields...
>>
>> I recall there are users that rely on this memory to get the shared
>> zeropage on reads etc (in comparison to shmem!), so I better not ...
>> mess with this *at all* :)
> 
> The behavior won't be changed.

Yes, I know. And that's good ;)


[1] https://man7.org/linux/man-pages/man5/proc_pid_map_files.5.html

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:20         ` Yang Shi
@ 2025-01-14 17:24           ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 17:24 UTC (permalink / raw)
  To: Yang Shi, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On 14.01.25 18:20, Yang Shi wrote:
> 
> 
> 
> On 1/14/25 9:02 AM, David Hildenbrand wrote:
>> On 14.01.25 16:06, David Hildenbrand wrote:
>>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file
>>>>>> offset.
>>>>>>
>>>>>> This is a special case and the VMA doesn't look like either
>>>>>> anonymous VMA
>>>>>> or file VMA.  It confused other kernel subsystem, for example,
>>>>>> khugepaged [1].
>>>>>>
>>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>>> /dev/zero either.
>>>>>>
>>>>>> The user visible effect is the mapping entry shown in
>>>>>> /proc/<PID>/smaps
>>>>>> and /proc/<PID>/maps.
>>>>>>
>>>>>> Before the change:
>>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06
>>>>>> 8                          /dev/zero
>>>>>>
>>>>>> After the change:
>>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>>
>>>>>
>>>>> Hm, not sure about this. It's actually quite consistent to have
>>>>> that output
>>>>> in smaps the way it is. You mapped a file at an offset, and it
>>>>> behaves like
>>>>> an anonymous mapping apart from that.
>>>>>
>>>>> Not sure if the buggy khugepaged thing is a good indicator to
>>>>> warrant this
>>>>> change.
>>>>
>>>> Yeah, this is a user-facing fundamental change that hides
>>>> information and
>>>> defies expectation so I mean - it's a no go really isn't it?
>>>>
>>>> I'd rather we _not_ make this anon though, because isn't life confusing
>>>> enough David? I thought it was bad enough with 'anon, file and lol
>>>> shmem'
>>>> but 'lol lol also /dev/zero' is enough to make me want to frolick in
>>>> the
>>>> fields...
>>>
>>> I recall there are users that rely on this memory to get the shared
>>> zeropage on reads etc (in comparison to shmem!), so I better not ...
>>> mess with this *at all* :)
>>
>> Heh, and I recall reading something about odd behavior of /dev/zero
>> and some interesting history [1].
>>
>> "
>> Unlike /dev/null, /dev/zero may be used as a source, not only as a
>> sink for data. All write operations to /dev/zero succeed with no other
>> effects. However, /dev/null is more commonly used for this purpose.
>>
>> When /dev/zero is memory-mapped, e.g., with mmap, to the virtual
>> address space, it is equivalent to using anonymous memory; i.e. memory
>> not connected to any file.
>> "
>>
>> "equivalent to using anonymous memory" is interesting.
> 
> For private mapping. Shared mapping is equivalent to shmem.

"shared anonymous memory", yes.

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:23         ` David Hildenbrand
@ 2025-01-14 17:38           ` Yang Shi
  2025-01-14 17:46             ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 17:38 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 9:23 AM, David Hildenbrand wrote:
> On 14.01.25 18:01, Yang Shi wrote:
>>
>>
>>
>> On 1/14/25 7:06 AM, David Hildenbrand wrote:
>>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file
>>>>>> offset.
>>>>>>
>>>>>> This is a special case and the VMA doesn't look like either
>>>>>> anonymous VMA
>>>>>> or file VMA.  It confused other kernel subsystem, for example,
>>>>>> khugepaged [1].
>>>>>>
>>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>>> /dev/zero either.
>>>>>>
>>>>>> The user visible effect is the mapping entry shown in
>>>>>> /proc/<PID>/smaps
>>>>>> and /proc/<PID>/maps.
>>>>>>
>>>>>> Before the change:
>>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06
>>>>>> 8                          /dev/zero
>>>>>>
>>>>>> After the change:
>>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>>
>>>>>
>>>>> Hm, not sure about this. It's actually quite consistent to have that
>>>>> output
>>>>> in smaps the way it is. You mapped a file at an offset, and it
>>>>> behaves like
>>>>> an anonymous mapping apart from that.
>>>>>
>>>>> Not sure if the buggy khugepaged thing is a good indicator to
>>>>> warrant this
>>>>> change.
>>
>> I admit this may be a concern, but I doubt who really care about it...
>>
>
> There is an example in the man page [1] about /proc/self/map_files/.
>
> I assume that will also change here.

IIUC, that example is specific to "anonymous shared memory" created by a
shared mapping of /dev/zero.

>
> It's always hard to tell who that could affect, but I'm not convinced 
> this is worth it to find it out :)
>
>>>>
>>>> Yeah, this is a user-facing fundamental change that hides information
>>>> and
>>>> defies expectation so I mean - it's a no go really isn't it?
>>>>
>>>> I'd rather we _not_ make this anon though, because isn't life 
>>>> confusing
>>>> enough David? I thought it was bad enough with 'anon, file and lol
>>>> shmem'
>>>> but 'lol lol also /dev/zero' is enough to make me want to frolick 
>>>> in the
>>>> fields...
>>>
>>> I recall there are users that rely on this memory to get the shared
>>> zeropage on reads etc (in comparison to shmem!), so I better not ...
>>> mess with this *at all* :)
>>
>> The behavior won't be changed.
>
> Yes, I know. And that's good ;)
>
>
> [1] https://man7.org/linux/man-pages/man5/proc_pid_map_files.5.html
>




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:38           ` Yang Shi
@ 2025-01-14 17:46             ` David Hildenbrand
  2025-01-14 18:05               ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-01-14 17:46 UTC (permalink / raw)
  To: Yang Shi, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On 14.01.25 18:38, Yang Shi wrote:
> 
> 
> 
> On 1/14/25 9:23 AM, David Hildenbrand wrote:
>> On 14.01.25 18:01, Yang Shi wrote:
>>>
>>>
>>>
>>> On 1/14/25 7:06 AM, David Hildenbrand wrote:
>>>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>>>> anonymous mapping by calling set_vma_anonymous().  But it just sets
>>>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file
>>>>>>> offset.
>>>>>>>
>>>>>>> This is a special case and the VMA doesn't look like either
>>>>>>> anonymous VMA
>>>>>>> or file VMA.  It confused other kernel subsystem, for example,
>>>>>>> khugepaged [1].
>>>>>>>
>>>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>>>> /dev/zero either.
>>>>>>>
>>>>>>> The user visible effect is the mapping entry shown in
>>>>>>> /proc/<PID>/smaps
>>>>>>> and /proc/<PID>/maps.
>>>>>>>
>>>>>>> Before the change:
>>>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06
>>>>>>> 8                          /dev/zero
>>>>>>>
>>>>>>> After the change:
>>>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>>>
>>>>>>
>>>>>> Hm, not sure about this. It's actually quite consistent to have that
>>>>>> output
>>>>>> in smaps the way it is. You mapped a file at an offset, and it
>>>>>> behaves like
>>>>>> an anonymous mapping apart from that.
>>>>>>
>>>>>> Not sure if the buggy khugepaged thing is a good indicator to
>>>>>> warrant this
>>>>>> change.
>>>
>>> I admit this may be a concern, but I doubt who really care about it...
>>>
>>
>> There is an example in the man page [1] about /proc/self/map_files/.
>>
>> I assume that will also change here.
> 
> IIUC, that example is specific to "anonymous shared memory" created by
> shared mapping of /dev/zero.

Note that MAP_PRIVATE of /dev/zero will also make it appear in the same 
way right now (I just tried).

The example is about MAP_FILE in general, not just MAP_SHARED IIUC.
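
A small sketch of that check (an illustration, assuming readlink(2) on
one's own /proc/self/map_files/ entries is permitted - older kernels
required CAP_SYS_ADMIN for this):

/* Sketch: does a MAP_PRIVATE /dev/zero mapping get a map_files entry? */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/zero", O_RDWR);
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, fd, 0);
	if (fd < 0 || p == MAP_FAILED)
		return 1;

	/* map_files entries are named "<vm_start>-<vm_end>" in hex. */
	char path[64], target[256];
	snprintf(path, sizeof(path), "/proc/self/map_files/%lx-%lx",
		 (unsigned long)p, (unsigned long)p + 4096);

	ssize_t n = readlink(path, target, sizeof(target) - 1);
	if (n >= 0) {
		target[n] = '\0';
		/* Resolves to /dev/zero today; the entry would simply
		 * be absent if the VMA were fully anonymous. */
		printf("%s -> %s\n", path, target);
	} else {
		perror("readlink");
	}
	return 0;
}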

-- 
Cheers,

David / dhildenb




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 17:46             ` David Hildenbrand
@ 2025-01-14 18:05               ` Yang Shi
  0 siblings, 0 replies; 35+ messages in thread
From: Yang Shi @ 2025-01-14 18:05 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 9:46 AM, David Hildenbrand wrote:
> On 14.01.25 18:38, Yang Shi wrote:
>>
>>
>>
>> On 1/14/25 9:23 AM, David Hildenbrand wrote:
>>> On 14.01.25 18:01, Yang Shi wrote:
>>>>
>>>>
>>>>
>>>> On 1/14/25 7:06 AM, David Hildenbrand wrote:
>>>>> On 14.01.25 15:52, Lorenzo Stoakes wrote:
>>>>>> On Tue, Jan 14, 2025 at 02:01:32PM +0100, David Hildenbrand wrote:
>>>>>>> On 13.01.25 23:30, Yang Shi wrote:
>>>>>>>> When creating private mapping for /dev/zero, the driver makes 
>>>>>>>> it an
>>>>>>>> anonymous mapping by calling set_vma_anonymous(). But it just sets
>>>>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file
>>>>>>>> offset.
>>>>>>>>
>>>>>>>> This is a special case and the VMA doesn't look like either
>>>>>>>> anonymous VMA
>>>>>>>> or file VMA.  It confused other kernel subsystem, for example,
>>>>>>>> khugepaged [1].
>>>>>>>>
>>>>>>>> It seems pointless to keep such special case. Making private /dev/zero
>>>>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>>>>> /dev/zero either.
>>>>>>>>
>>>>>>>> The user visible effect is the mapping entry shown in
>>>>>>>> /proc/<PID>/smaps
>>>>>>>> and /proc/<PID>/maps.
>>>>>>>>
>>>>>>>> Before the change:
>>>>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06
>>>>>>>> 8                          /dev/zero
>>>>>>>>
>>>>>>>> After the change:
>>>>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>>>>
>>>>>>>
>>>>>>> Hm, not sure about this. It's actually quite consistent to have 
>>>>>>> that
>>>>>>> output
>>>>>>> in smaps the way it is. You mapped a file at an offset, and it
>>>>>>> behaves like
>>>>>>> an anonymous mapping apart from that.
>>>>>>>
>>>>>>> Not sure if the buggy khugepaged thing is a good indicator to
>>>>>>> warrant this
>>>>>>> change.
>>>>
>>>> I admit this may be a concern, but I doubt who really care about it...
>>>>
>>>
>>> There is an example in the man page [1] about /proc/self/map_files/.
>>>
>>> I assume that will also change here.
>>
>> IIUC, that example is specific to "anonymous shared memory" created by
>> shared mapping of /dev/zero.
>
> Note that MAP_PRIVATE of /dev/zero will also make it appear in the 
> same way right now (I just tried).

Yes, I will add this to the commit log as another user-visible change.

>
> The example is about MAP_FILE in general, not just MAP_SHARED IIUC.

MAP_FILE is actually ignored on Linux per
https://man7.org/linux/man-pages/man2/mmap.2.html. It also says
"(regions created with the MAP_ANON | MAP_SHARED flags)". Anyway, it
looks like this man page may be a little bit outdated. We can clean it
up later.
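
For what it's worth, a trivial check of the MAP_FILE point (assuming
glibc, which defines MAP_FILE only for BSD compatibility):

/* Sketch: on Linux/glibc MAP_FILE is 0, so OR-ing it in is a no-op. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
#ifdef MAP_FILE
	printf("MAP_FILE = %#x\n", (unsigned int)MAP_FILE); /* prints 0 */
#else
	puts("MAP_FILE is not defined here");
#endif
	return 0;
}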




* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 16:53   ` Yang Shi
@ 2025-01-14 18:14     ` Lorenzo Stoakes
  2025-01-14 18:19       ` Lorenzo Stoakes
                         ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 18:14 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

This is getting into realms of discussion so, at the risk of sounding rude -
to be clear - NACK.

The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
is regardless of any other discussed issue.

But more importantly, I hadn't realised mmap_zero() was the .mmap()
callback (sorry, my mistake) - you're simply not permitted to change the
vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
it's broken.

To me the alternative would be to have a custom fault handler that hands
back the zero page, but I'm not sure that's workable, you'd have to install
a special mapping etc. and huge pages are weird and...

I do appreciate you raising this especially as I was blissfully unaware,
but I don't see how this patch can possibly work, sorry :(
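
To make that alternative concrete, a purely hypothetical sketch (the
names are invented and, as said above, it is not clear this is actually
workable - write faults and huge pages are ignored entirely):

/* Hypothetical sketch only: a vm_ops->fault that hands back the shared
 * zero page for a /dev/zero VMA instead of marking the VMA anonymous. */
static vm_fault_t zero_vma_fault(struct vm_fault *vmf)
{
	struct page *page = ZERO_PAGE(vmf->address);

	get_page(page);
	vmf->page = page;
	return 0;
}

static const struct vm_operations_struct zero_vma_ops = {
	.fault = zero_vma_fault,
};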

On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
>
>
>
> On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > + Willy for the fs/weirdness elements of this.
> >
> > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > When creating private mapping for /dev/zero, the driver makes it an
> > > anonymous mapping by calling set_vma_anonymous().  But it just sets
> > > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> > Hm yikes.
> >
> > > This is a special case and the VMA doesn't look like either anonymous VMA
> > > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> > >
> > > It seems pointless to keep such special case.  Making private /dev/zero
> > > mapping a full anonymous mapping doesn't change the semantic of
> > > /dev/zero either.
> > My concern is that ostensibly there _is_ a file right? Are we certain that by
> > not setting this we are not breaking something somewhere else?
> >
> > Are we not creating a sort of other type of 'non-such-beast' here?
>
> But the file is /dev/zero. I don't see this could break the semantic of
> /dev/zero. The shared mapping of /dev/zero is not affected by this change,
> kernel already treated private mapping of /dev/zero as anonymous mapping,
> but with some weird settings in VMA. When reading the mapping, it returns 0
> with zero page, when writing the mapping, a new anonymous folio is
> allocated.

You're creating a new concept of an anon but not anon but also now with
anon vm_pgoff and missing vm_file even though it does reference a file
and... yeah.

This is not usual :)

>
> >
> > I mean already setting it anon and setting vm_file non-NULL is really strange.
> >
> > > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > > and /proc/<PID>/maps.
> > >
> > > Before the change:
> > > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> > >
> > > After the change:
> > > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> > >
> > Yeah this seems like it might break somebody to be honest, it's really
> > really really strange to map a file then for it not to be mapped.
>
> Yes, it is possible if someone really care whether the anonymous-like
> mapping is mapped by /dev/zero or just created by malloc(). But I don't know
> who really do...
>
> >
> > But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> > file but for it to be marked anonymous.
> >
> > God what a mess.
> >
> > > [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> > I kind of hate that we have to mitigate like this for a case that should
> > never ever happen so I'm inclined towards your solution but a lot more
> > inclined towards us totally rethinking this.
> >
> > Do we _have_ to make this anonymous?? Why can't we just reference the zero
> > page as if it were in the page cache (Willy - feel free to correct naive
> > misapprehension here).
>
> TBH, I don't see why page cache has to be involved. When reading, 0 is
> returned by zero page. When writing a CoW is triggered if page cache is
> involved, but the content of the page cache should be just 0, so we copy 0
> to the new folio then write to it. It doesn't make too much sense. I think
> this is why private /dev/zero mapping is treated as anonymous mapping in the
> first place.

I'm obviously not suggesting allocating a bunch of extra folios, I was
thinking there would be some means of handing back the actual zero
page. But I am not sure this is workable.

>
> >
> > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > > ---
> > >   drivers/char/mem.c | 4 ++++
> > >   1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > index 169eed162a7f..dae113f7fc1b 100644
> > > --- a/drivers/char/mem.c
> > > +++ b/drivers/char/mem.c
> > > @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
> > >   	if (vma->vm_flags & VM_SHARED)
> > >   		return shmem_zero_setup(vma);
> > >   	vma_set_anonymous(vma);
> > > +	fput(vma->vm_file);
> > > +	vma->vm_file = NULL;
> > > +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;

This is just not permitted. We maintain mmap state which contains the file
and pgoff state which gets threaded through the mapping operation, and
simply do not expect you to change these fields.

In future we will assert on this or preferably, restrict users to only
changing VMA flags, the private field and vm_ops.

> > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > this will be wrong? But then we'd have no way of tracking it correctly...
>
> I'm not quite familiar with the subtle details and corner cases of
> meremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> been visible to user yet at this point IIUC. How come mremap() could move
> it?

Ah OK, in that case fine on that front.

But you are not permitted to touch this field (we need to enforce this...)

>
> >
> > I've not checked the function but do we mark this as a special mapping of
> > some kind?
> >
> > > +
> > >   	return 0;
> > >   }
> > >
> > > --
> > > 2.47.0
> > >
>



* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:14     ` Lorenzo Stoakes
@ 2025-01-14 18:19       ` Lorenzo Stoakes
  2025-01-14 18:21         ` Lorenzo Stoakes
  2025-01-14 18:22         ` Matthew Wilcox
  2025-01-14 18:32       ` Jann Horn
  2025-01-14 19:03       ` Yang Shi
  2 siblings, 2 replies; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 18:19 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 06:14:57PM +0000, Lorenzo Stoakes wrote:
> This is getting into realms of discussion so to risk sounding rude - to be
> clear - NACK.
>
> The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
> is regardless of any other discussed issue.
>
> But more importantly, I hadn't realise mmap_zero() was on the .mmap()
> callback (sorry my mistake) - you're simply not permitted to change
> vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
> it's broken.

I see shmem_zero_page() does change vma->vm_page, this is broken... ugh. I
will audit this code (and have a look through _all_ mmap() callbacks I
guess). Duly added to TODO. But definitely can't have _another_ case of
doing this.

>
> To me the alternative would be to have a custom fault handler that hands
> back the zero page, but I"m not sure that's workable, you'd have to install
> a special mapping etc. and huge pages are weird and...
>
> I do appreciate you raising this especially as I was blissfully unaware,
> but I don't see how this patch can possibly work, sorry :(
>
> On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> >
> >
> >
> > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > + Willy for the fs/weirdness elements of this.
> > >
> > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > When creating private mapping for /dev/zero, the driver makes it an
> > > > anonymous mapping by calling set_vma_anonymous().  But it just sets
> > > > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> > > Hm yikes.
> > >
> > > > This is a special case and the VMA doesn't look like either anonymous VMA
> > > > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> > > >
> > > > It seems pointless to keep such special case.  Making private /dev/zero
> > > > mapping a full anonymous mapping doesn't change the semantic of
> > > > /dev/zero either.
> > > My concern is that ostensibly there _is_ a file right? Are we certain that by
> > > not setting this we are not breaking something somewhere else?
> > >
> > > Are we not creating a sort of other type of 'non-such-beast' here?
> >
> > But the file is /dev/zero. I don't see this could break the semantic of
> > /dev/zero. The shared mapping of /dev/zero is not affected by this change,
> > kernel already treated private mapping of /dev/zero as anonymous mapping,
> > but with some weird settings in VMA. When reading the mapping, it returns 0
> > with zero page, when writing the mapping, a new anonymous folio is
> > allocated.
>
> You're creating a new concept of an anon but not anon but also now with
> anon vm_pgoff and missing vm_file even though it does reference a file
> and... yeah.
>
> This is not usual :)
>
> >
> > >
> > > I mean already setting it anon and setting vm_file non-NULL is really strange.
> > >
> > > > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > > > and /proc/<PID>/maps.
> > > >
> > > > Before the change:
> > > > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> > > >
> > > > After the change:
> > > > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> > > >
> > > Yeah this seems like it might break somebody to be honest, it's really
> > > really really strange to map a file then for it not to be mapped.
> >
> > Yes, it is possible if someone really care whether the anonymous-like
> > mapping is mapped by /dev/zero or just created by malloc(). But I don't know
> > who really do...
> >
> > >
> > > But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> > > file but for it to be marked anonymous.
> > >
> > > God what a mess.
> > >
> > > > [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> > > I kind of hate that we have to mitigate like this for a case that should
> > > never ever happen so I'm inclined towards your solution but a lot more
> > > inclined towards us totally rethinking this.
> > >
> > > Do we _have_ to make this anonymous?? Why can't we just reference the zero
> > > page as if it were in the page cache (Willy - feel free to correct naive
> > > misapprehension here).
> >
> > TBH, I don't see why page cache has to be involved. When reading, 0 is
> > returned by zero page. When writing a CoW is triggered if page cache is
> > involved, but the content of the page cache should be just 0, so we copy 0
> > to the new folio then write to it. It doesn't make too much sense. I think
> > this is why private /dev/zero mapping is treated as anonymous mapping in the
> > first place.
>
> I'm obviously not suggesting allocating a bunch of extra folios, I was
> thinking there would be some means of handing back the actual zero
> page. But I am not sure this is workable.
>
> >
> > >
> > > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > > > ---
> > > >   drivers/char/mem.c | 4 ++++
> > > >   1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > > index 169eed162a7f..dae113f7fc1b 100644
> > > > --- a/drivers/char/mem.c
> > > > +++ b/drivers/char/mem.c
> > > > @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
> > > >   	if (vma->vm_flags & VM_SHARED)
> > > >   		return shmem_zero_setup(vma);
> > > >   	vma_set_anonymous(vma);
> > > > +	fput(vma->vm_file);
> > > > +	vma->vm_file = NULL;
> > > > +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
>
> This is just not permitted. We maintain mmap state which contains the file
> and pgoff state which gets threaded through the mapping operation, and
> simply do not expect you to change these fields.
>
> In future we will assert on this or preferably, restrict users to only
> changing VMA flags, the private field and vm_ops.
>
> > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > this will be wrong? But then we'd have no way of tracking it correctly...
> >
> > I'm not quite familiar with the subtle details and corner cases of
> > meremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > been visible to user yet at this point IIUC. How come mremap() could move
> > it?
>
> Ah OK, in that case fine on that front.
>
> But you are not permitted to touch this field (we need to enforce this...)
>
> >
> > >
> > > I've not checked the function but do we mark this as a special mapping of
> > > some kind?
> > >
> > > > +
> > > >   	return 0;
> > > >   }
> > > >
> > > > --
> > > > 2.47.0
> > > >
> >



* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:19       ` Lorenzo Stoakes
@ 2025-01-14 18:21         ` Lorenzo Stoakes
  2025-01-14 18:22         ` Matthew Wilcox
  1 sibling, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 18:21 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 06:19:32PM +0000, Lorenzo Stoakes wrote:
> On Tue, Jan 14, 2025 at 06:14:57PM +0000, Lorenzo Stoakes wrote:
> > This is getting into realms of discussion so to risk sounding rude - to be
> > clear - NACK.
> >
> > The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
> > is regardless of any other discussed issue.
> >
> > But more importantly, I hadn't realise mmap_zero() was on the .mmap()
> > callback (sorry my mistake) - you're simply not permitted to change
> > vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
> > it's broken.
>
> I see shmem_zero_page() does change vma->vm_page, this is broken... ugh. I
> will audit this code (and have a look through _all_ mmap() callbacks I
> guess). Duly added to TODO. But definitely can't have _another_ case of
> doing this.

* vma->vm_file... it is late here :)

>
> >
> > To me the alternative would be to have a custom fault handler that hands
> > back the zero page, but I"m not sure that's workable, you'd have to install
> > a special mapping etc. and huge pages are weird and...
> >
> > I do appreciate you raising this especially as I was blissfully unaware,
> > but I don't see how this patch can possibly work, sorry :(
> >
> > On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> > >
> > >
> > >
> > > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > > + Willy for the fs/weirdness elements of this.
> > > >
> > > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > > When creating private mapping for /dev/zero, the driver makes it an
> > > > > anonymous mapping by calling set_vma_anonymous().  But it just sets
> > > > > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> > > > Hm yikes.
> > > >
> > > > > This is a special case and the VMA doesn't look like either anonymous VMA
> > > > > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> > > > >
> > > > > It seems pointless to keep such special case.  Making private /dev/zero
> > > > > mapping a full anonymous mapping doesn't change the semantic of
> > > > > /dev/zero either.
> > > > My concern is that ostensibly there _is_ a file right? Are we certain that by
> > > > not setting this we are not breaking something somewhere else?
> > > >
> > > > Are we not creating a sort of other type of 'non-such-beast' here?
> > >
> > > But the file is /dev/zero. I don't see how this could break the semantics of
> > > /dev/zero. The shared mapping of /dev/zero is not affected by this change,
> > > kernel already treated private mapping of /dev/zero as anonymous mapping,
> > > but with some weird settings in VMA. When reading the mapping, it returns 0
> > > with zero page, when writing the mapping, a new anonymous folio is
> > > allocated.
> >
> > You're creating a new concept of an anon but not anon but also now with
> > anon vm_pgoff and missing vm_file even though it does reference a file
> > and... yeah.
> >
> > This is not usual :)
> >
> > >
> > > >
> > > > I mean already setting it anon and setting vm_file non-NULL is really strange.
> > > >
> > > > > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > > > > and /proc/<PID>/maps.
> > > > >
> > > > > Before the change:
> > > > > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> > > > >
> > > > > After the change:
> > > > > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> > > > >
> > > > Yeah this seems like it might break somebody to be honest, it's really
> > > > really really strange to map a file then for it not to be mapped.
> > >
> > > Yes, it is possible if someone really cares whether the anonymous-like
> > > mapping is mapped by /dev/zero or just created by malloc(). But I don't know
> > > who really does...
> > >
> > > >
> > > > But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> > > > file but for it to be marked anonymous.
> > > >
> > > > God what a mess.
> > > >
> > > > > [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> > > > I kind of hate that we have to mitigate like this for a case that should
> > > > never ever happen so I'm inclined towards your solution but a lot more
> > > > inclined towards us totally rethinking this.
> > > >
> > > > Do we _have_ to make this anonymous?? Why can't we just reference the zero
> > > > page as if it were in the page cache (Willy - feel free to correct naive
> > > > misapprehension here).
> > >
> > > TBH, I don't see why page cache has to be involved. When reading, 0 is
> > > returned by zero page. When writing a CoW is triggered if page cache is
> > > involved, but the content of the page cache should be just 0, so we copy 0
> > > to the new folio then write to it. It doesn't make too much sense. I think
> > > this is why private /dev/zero mapping is treated as anonymous mapping in the
> > > first place.
> >
> > I'm obviously not suggesting allocating a bunch of extra folios, I was
> > thinking there would be some means of handing back the actual zero
> > page. But I am not sure this is workable.
> >
> > >
> > > >
> > > > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > > > > ---
> > > > >   drivers/char/mem.c | 4 ++++
> > > > >   1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > > > index 169eed162a7f..dae113f7fc1b 100644
> > > > > --- a/drivers/char/mem.c
> > > > > +++ b/drivers/char/mem.c
> > > > > @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
> > > > >   	if (vma->vm_flags & VM_SHARED)
> > > > >   		return shmem_zero_setup(vma);
> > > > >   	vma_set_anonymous(vma);
> > > > > +	fput(vma->vm_file);
> > > > > +	vma->vm_file = NULL;
> > > > > +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> >
> > This is just not permitted. We maintain mmap state which contains the file
> > and pgoff state which gets threaded through the mapping operation, and
> > simply do not expect you to change these fields.
> >
> > In future we will assert on this or preferably, restrict users to only
> > changing VMA flags, the private field and vm_ops.
> >
> > > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > > this will be wrong? But then we'd have no way of tracking it correctly...
> > >
> > > I'm not quite familiar with the subtle details and corner cases of
> > > mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > > been visible to user yet at this point IIUC. How come mremap() could move
> > > it?
> >
> > Ah OK, in that case fine on that front.
> >
> > But you are not permitted to touch this field (we need to enforce this...)
> >
> > >
> > > >
> > > > I've not checked the function but do we mark this as a special mapping of
> > > > some kind?
> > > >
> > > > > +
> > > > >   	return 0;
> > > > >   }
> > > > >
> > > > > --
> > > > > 2.47.0
> > > > >
> > >


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:19       ` Lorenzo Stoakes
  2025-01-14 18:21         ` Lorenzo Stoakes
@ 2025-01-14 18:22         ` Matthew Wilcox
  2025-01-14 18:26           ` Lorenzo Stoakes
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2025-01-14 18:22 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, jannh, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 06:19:32PM +0000, Lorenzo Stoakes wrote:
> I see shmem_zero_page() does change vma->vm_page, this is broken... ugh. I

I think you mean shmem_zero_setup() and vma->vm_file, right?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:22         ` Matthew Wilcox
@ 2025-01-14 18:26           ` Lorenzo Stoakes
  0 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 18:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, jannh, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 06:22:14PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 14, 2025 at 06:19:32PM +0000, Lorenzo Stoakes wrote:
> > I see shmem_zero_page() does change vma->vm_page, this is broken... ugh. I
>
> I think you mean shmem_zero_setup() and vma->vm_file, right?

Yes, correct. Sorry it's late here and it's showing haha!

The reason I am concerned about this is that we thread mmap state
through the operation, which carries a separate file pointer, and this
change turns that pointer into a potential UAF.
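
To make the hazard concrete, here is a minimal userspace sketch of the
pattern (stand-in types invented for illustration; the real mmap_state
plumbing is of course more involved):

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures. */
struct file { int refcount; };
struct vm_area_struct { struct file *vm_file; };

static void fput(struct file *f)
{
	if (--f->refcount == 0)
		free(f);		/* last reference gone */
}

/* What the patch's .mmap() handler does behind the caller's back. */
static int bad_mmap_handler(struct vm_area_struct *vma)
{
	fput(vma->vm_file);
	vma->vm_file = NULL;
	return 0;
}

int main(void)
{
	struct file *file = calloc(1, sizeof(*file));
	struct vm_area_struct vma = { .vm_file = file };

	file->refcount = 1;		/* the mapping's only reference */
	bad_mmap_handler(&vma);

	/* The caller's copy now dangles - this mirrors the kernel UAF. */
	printf("caller still holds stale pointer %p\n", (void *)file);
	return 0;
}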

Will audit all of this, and look for any other problematic .mmap()
callback behaviour.

My view is ideally this should be a callback with a const pointer to the
VMA (or some other mechanism, perhaps) which accepts a change in
_permitted_ fields only.

The 'anything could happen and anybody could manipulate any field of the
VMA' in this callback is highly problematic.

But we definitely shouldn't be adding a _new_ case here.
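
As a thought experiment, a restricted interface might look something like
the sketch below (every name here is invented; nothing like this exists in
the tree today):

/* Forward declarations standing in for the real kernel types. */
struct file;
struct vm_area_struct;
struct vm_operations_struct;

/* The only things a driver would be allowed to change at mmap time. */
struct mmap_changes {
	unsigned long set_vm_flags;	/* VM_* bits to add */
	unsigned long clear_vm_flags;	/* VM_* bits to drop */
	const struct vm_operations_struct *vm_ops;
	void *private_data;		/* ends up in vm_private_data */
};

/* The VMA itself is read-only from the driver's point of view. */
typedef int (*restricted_mmap_t)(struct file *file,
				 const struct vm_area_struct *vma,
				 struct mmap_changes *changes);

The core would then apply (and validate) the requested changes itself,
instead of trusting each handler to leave vm_file and vm_pgoff alone.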


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:14     ` Lorenzo Stoakes
  2025-01-14 18:19       ` Lorenzo Stoakes
@ 2025-01-14 18:32       ` Jann Horn
  2025-01-14 18:38         ` Lorenzo Stoakes
  2025-01-14 19:03       ` Yang Shi
  2 siblings, 1 reply; 35+ messages in thread
From: Jann Horn @ 2025-01-14 18:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 7:15 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > + fput(vma->vm_file);
> > > > + vma->vm_file = NULL;
> > > > + vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
>
> This is just not permitted. We maintain mmap state which contains the file
> and pgoff state which gets threaded through the mapping operation, and
> simply do not expect you to change these fields.
>
> In future we will assert on this or preferably, restrict users to only
> changing VMA flags, the private field and vm_ops.
>
> > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > this will be wrong? But then we'd have no way of tracking it correctly...
> >
> > I'm not quite familiar with the subtle details and corner cases of
> > mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > been visible to user yet at this point IIUC. How come mremap() could move
> > it?
>
> Ah OK, in that case fine on that front.
>
> But you are not permitted to touch this field (we need to enforce this...)

Sidenote: I think the GPU DRM subsystem relies on changing pgoff in
some of their mmap handlers; maybe talk to them about this if you
haven't already. See for example drm_gem_prime_mmap() and
dma_buf_mmap().


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:32       ` Jann Horn
@ 2025-01-14 18:38         ` Lorenzo Stoakes
  0 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 18:38 UTC (permalink / raw)
  To: Jann Horn
  Cc: Yang Shi, arnd, gregkh, Liam.Howlett, vbabka, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 07:32:51PM +0100, Jann Horn wrote:
> On Tue, Jan 14, 2025 at 7:15 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> > > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > > + fput(vma->vm_file);
> > > > > + vma->vm_file = NULL;
> > > > > + vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> >
> > This is just not permitted. We maintain mmap state which contains the file
> > and pgoff state which gets threaded through the mapping operation, and
> > simply do not expect you to change these fields.
> >
> > In future we will assert on this or preferably, restrict users to only
> > changing VMA flags, the private field and vm_ops.
> >
> > > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > > this will be wrong? But then we'd have no way of tracking it correctly...
> > >
> > > I'm not quite familiar with the subtle details and corner cases of
> > > mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > > been visible to user yet at this point IIUC. How come mremap() could move
> > > it?
> >
> > Ah OK, in that case fine on that front.
> >
> > But you are not permitted to touch this field (we need to enforce this...)
>
> Sidenote: I think the GPU DRM subsystem relies on changing pgoff in
> some of their mmap handlers; maybe talk to them about this if you
> haven't already. See for example drm_gem_prime_mmap() and
> dma_buf_mmap().

Thanks Jann, I feel like I've opened up a can of worms with this :) I will
note these as things to prioritise in the audit.

It might be worth both auditing and then actually doing the change to
restrict what can be done here too.

The problem is it requires changing a trillion callers, but hey I'm
Mr. Churn after all... ;)

Sorry Yang - I realise this is a pain and not at all obvious. It is
something we in mm need to sort out (by which I mean _me_ :). Your
contribution and ideas here are very valued!


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 18:14     ` Lorenzo Stoakes
  2025-01-14 18:19       ` Lorenzo Stoakes
  2025-01-14 18:32       ` Jann Horn
@ 2025-01-14 19:03       ` Yang Shi
  2025-01-14 19:13         ` Lorenzo Stoakes
  2 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 19:03 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 10:14 AM, Lorenzo Stoakes wrote:
> This is getting into realms of discussion so, at the risk of sounding rude - to be
> clear - NACK.
>
> The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
> is regardless of any other discussed issue.

I admit this is a concern, but I don't think it is really bad enough to 
kill this patch. Could this change result in a userspace regression? 
Maybe, most likely in some debugging and monitoring scripts, which we 
typically don't worry about that much. Of course, I can't completely 
guarantee no regression for real-life applications, but it should be 
unlikely IMHO.

>
> But more importantly, I hadn't realised mmap_zero() was on the .mmap()
> callback (sorry my mistake) - you're simply not permitted to change
> vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
> it's broken.
>
> To me the alternative would be to have a custom fault handler that hands
> back the zero page, but I"m not sure that's workable, you'd have to install
> a special mapping etc. and huge pages are weird and...

TBH, I don't think we need to make the fault handler more complicated, 
it is just handled as an anonymous fault.

I understand your concern about changing those vma fields outside core 
mm. An alternative is to move such a change to vma.c. For example:

diff --git a/mm/vma.c b/mm/vma.c
index bb2119e5a0d0..2a7ea9901f57 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2358,6 +2358,12 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
         else
                 vma_set_anonymous(vma);

+       if (vma_is_anonymous(vma) && vma->vm_file) {
+               fput(vma->vm_file);
+               vma->vm_file = NULL;
+               vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
+       }
+
         if (error)
                 goto free_iter_vma;


>
> I do appreciate you raising this especially as I was blissfully unaware,
> but I don't see how this patch can possibly work, sorry :(
>
> On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
>>
>>
>> On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
>>> + Willy for the fs/weirdness elements of this.
>>>
>>> On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>> anonymous mapping by calling vma_set_anonymous().  But it just sets
>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
>>> Hm yikes.
>>>
>>>> This is a special case and the VMA doesn't look like either anonymous VMA
>>>> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>>>>
>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>> /dev/zero either.
>>> My concern is that ostensibly there _is_ a file right? Are we certain that by
>>> not setting this we are not breaking something somewhere else?
>>>
>>> Are we not creating a sort of other type of 'non-such-beast' here?
>> But the file is /dev/zero. I don't see how this could break the semantics of
>> /dev/zero. The shared mapping of /dev/zero is not affected by this change,
>> kernel already treated private mapping of /dev/zero as anonymous mapping,
>> but with some weird settings in VMA. When reading the mapping, it returns 0
>> with zero page, when writing the mapping, a new anonymous folio is
>> allocated.
> You're creating a new concept of an anon but not anon but also now with
> anon vm_pgoff and missing vm_file even though it does reference a file
> and... yeah.
>
> This is not usual :)

It does reference a file, but the file is /dev/zero... And if the kernel 
already treats it as an anonymous mapping, it sounds like the file may 
not matter that much, so why not make it a real anonymous mapping? Then 
we end up having only anonymous VMAs and file VMAs, instead of anonymous 
VMAs, file VMAs, and a hybrid special VMA. So we have fewer things to 
worry about. If a VMA is an anonymous VMA, it is guaranteed that vm_file 
is NULL, vm_ops is NULL and vm_pgoff is the linear pgoff. But that is 
not true now.

>
>>> I mean already setting it anon and setting vm_file non-NULL is really strange.
>>>
>>>> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
>>>> and /proc/<PID>/maps.
>>>>
>>>> Before the change:
>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>>>>
>>>> After the change:
>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>
>>> Yeah this seems like it might break somebody to be honest, it's really
>>> really really strange to map a file then for it not to be mapped.
>> Yes, it is possible if someone really cares whether the anonymous-like
>> mapping is mapped by /dev/zero or just created by malloc(). But I don't know
>> who really does...
>>
>>> But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
>>> file but for it to be marked anonymous.
>>>
>>> God what a mess.
>>>
>>>> [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
>>> I kind of hate that we have to mitigate like this for a case that should
>>> never ever happen so I'm inclined towards your solution but a lot more
>>> inclined towards us totally rethinking this.
>>>
>>> Do we _have_ to make this anonymous?? Why can't we just reference the zero
>>> page as if it were in the page cache (Willy - feel free to correct naive
>>> misapprehension here).
>> TBH, I don't see why page cache has to be involved. When reading, 0 is
>> returned by zero page. When writing a CoW is triggered if page cache is
>> involved, but the content of the page cache should be just 0, so we copy 0
>> to the new folio then write to it. It doesn't make too much sense. I think
>> this is why private /dev/zero mapping is treated as anonymous mapping in the
>> first place.
> I'm obviously not suggesting allocating a bunch of extra folios, I was
> thinking there would be some means of handing back the actual zero
> page. But I am not sure this is workable.

As I mentioned above, even handing back the zero page should not be needed.

>
>>>> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>>>> ---
>>>>    drivers/char/mem.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>>>> index 169eed162a7f..dae113f7fc1b 100644
>>>> --- a/drivers/char/mem.c
>>>> +++ b/drivers/char/mem.c
>>>> @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
>>>>    	if (vma->vm_flags & VM_SHARED)
>>>>    		return shmem_zero_setup(vma);
>>>>    	vma_set_anonymous(vma);
>>>> +	fput(vma->vm_file);
>>>> +	vma->vm_file = NULL;
>>>> +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> This is just not permitted. We maintain mmap state which contains the file
> and pgoff state which gets threaded through the mapping operation, and
> simply do not expect you to change these fields.
>
> In future we will assert on this or preferably, restrict users to only
> changing VMA flags, the private field and vm_ops.

Sure, hardening the VMA initialization code and making the corner cases 
less surprising is definitely helpful.

>
>>> Hmm, this might have been mremap()'d _potentially_ though? And then now
>>> this will be wrong? But then we'd have no way of tracking it correctly...
>> I'm not quite familiar with the subtle details and corner cases of
>> mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
>> been visible to user yet at this point IIUC. How come mremap() could move
>> it?
> Ah OK, in that case fine on that front.
>
> But you are not permitted to touch this field (we need to enforce this...)
>
>>> I've not checked the function but do we mark this as a special mapping of
>>> some kind?
>>>
>>>> +
>>>>    	return 0;
>>>>    }
>>>>
>>>> --
>>>> 2.47.0
>>>>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 19:03       ` Yang Shi
@ 2025-01-14 19:13         ` Lorenzo Stoakes
  2025-01-14 21:24           ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-14 19:13 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 11:03:48AM -0800, Yang Shi wrote:
>
>
>
> On 1/14/25 10:14 AM, Lorenzo Stoakes wrote:
> > This is getting into realms of discussion so, at the risk of sounding rude - to be
> > clear - NACK.
> >
> > The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
> > is regardless of any other discussed issue.
>
> I admit this is a concern, but I don't think it is really bad enough to kill
> this patch. Could this change result in a userspace regression? Maybe, most
> likely in some debugging and monitoring scripts, which we typically don't
> worry about that much. Of course, I can't completely guarantee no regression
> for real-life applications, but it should be unlikely IMHO.

Yeah, I don't think we can accept this unfortunately.

This patch is SUPER important though even if rejected, because you've made
me realise we really need to audit all of these mmap handlers... so it's
all super appreciated regardless :)

>
> >
> > But more importantly, I hadn't realised mmap_zero() was on the .mmap()
> > callback (sorry my mistake) - you're simply not permitted to change
> > vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
> > it's broken.
> >
> > To me the alternative would be to have a custom fault handler that hands
> > back the zero page, but I'm not sure that's workable, you'd have to install
> > a special mapping etc. and huge pages are weird and...
>
> TBH, I don't think we need to make the fault handler more complicated, it is
> just handled as an anonymous fault.
>
> I understand your concern about changing those vma fields outside core mm. An
> alternative is to move such a change to vma.c. For example:
>
> diff --git a/mm/vma.c b/mm/vma.c
> index bb2119e5a0d0..2a7ea9901f57 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2358,6 +2358,12 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>         else
>                 vma_set_anonymous(vma);
>
> +       if (vma_is_anonymous(vma) && vma->vm_file) {
> +               fput(vma->vm_file);
> +               vma->vm_file = NULL;
> +               vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> +       }
> +

OK that's more interesting. Though the user-facing thing remains...

It's possible we could detect that the underlying thing is a zero page and
manually print out /dev/zero, but can somebody create a zero page file
elsewhere? In which case they might find this confusing.

It's actually a nice idea to have this _explicitly_ covered off as we could
then also add a comment explaining 'hey there's this weird type of VMA' and
have it in a place where it's actually obvious to mm folk anyway.

But this maps thing is just a killer. Somebody somewhere will be
confused. And it is not for us to judge whether that's silly or not...

>         if (error)
>                 goto free_iter_vma;
>
>
> >
> > I do appreciate you raising this especially as I was blissfully unaware,
> > but I don't see how this patch can possibly work, sorry :(
> >
> > On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> > >
> > >
> > > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > > + Willy for the fs/weirdness elements of this.
> > > >
> > > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > > When creating private mapping for /dev/zero, the driver makes it an
> > > > > anonymous mapping by calling vma_set_anonymous().  But it just sets
> > > > > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> > > > Hm yikes.
> > > >
> > > > > This is a special case and the VMA doesn't look like either anonymous VMA
> > > > > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> > > > >
> > > > > It seems pointless to keep such special case.  Making private /dev/zero
> > > > > mapping a full anonymous mapping doesn't change the semantic of
> > > > > /dev/zero either.
> > > > My concern is that ostensibly there _is_ a file right? Are we certain that by
> > > > not setting this we are not breaking something somewhere else?
> > > >
> > > > Are we not creating a sort of other type of 'non-such-beast' here?
> > > But the file is /dev/zero. I don't see how this could break the semantics of
> > > /dev/zero. The shared mapping of /dev/zero is not affected by this change,
> > > kernel already treated private mapping of /dev/zero as anonymous mapping,
> > > but with some weird settings in VMA. When reading the mapping, it returns 0
> > > with zero page, when writing the mapping, a new anonymous folio is
> > > allocated.
> > You're creating a new concept of an anon but not anon but also now with
> > anon vm_pgoff and missing vm_file even though it does reference a file
> > and... yeah.
> >
> > This is not usual :)
>
> It does reference a file, but the file is /dev/zero... And if the kernel
> already treats it as an anonymous mapping, it sounds like the file may not
> matter that much, so why not make it a real anonymous mapping? Then we end up
> having only anonymous VMAs and file VMAs, instead of anonymous VMAs, file
> VMAs, and a hybrid special VMA. So we have fewer things to worry about. If a
> VMA is an anonymous VMA, it is guaranteed that vm_file is NULL, vm_ops is
> NULL and vm_pgoff is the linear pgoff. But that is not true now.

It's about user confusion for me really.

>
> >
> > > > I mean already setting it anon and setting vm_file non-NULL is really strange.
> > > >
> > > > > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > > > > and /proc/<PID>/maps.
> > > > >
> > > > > Before the change:
> > > > > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> > > > >
> > > > > After the change:
> > > > > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> > > > >
> > > > Yeah this seems like it might break somebody to be honest, it's really
> > > > really really strange to map a file then for it not to be mapped.
> > > Yes, it is possible if someone really cares whether the anonymous-like
> > > mapping is mapped by /dev/zero or just created by malloc(). But I don't know
> > > who really does...
> > >
> > > > But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> > > > file but for it to be marked anonymous.
> > > >
> > > > God what a mess.
> > > >
> > > > > [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> > > > I kind of hate that we have to mitigate like this for a case that should
> > > > never ever happen so I'm inclined towards your solution but a lot more
> > > > inclined towards us totally rethinking this.
> > > >
> > > > Do we _have_ to make this anonymous?? Why can't we just reference the zero
> > > > page as if it were in the page cache (Willy - feel free to correct naive
> > > > misapprehension here).
> > > TBH, I don't see why page cache has to be involved. When reading, 0 is
> > > returned by zero page. When writing a CoW is triggered if page cache is
> > > involved, but the content of the page cache should be just 0, so we copy 0
> > > to the new folio then write to it. It doesn't make too much sense. I think
> > > this is why private /dev/zero mapping is treated as anonymous mapping in the
> > > first place.
> > I'm obviously not suggesting allocating a bunch of extra folios, I was
> > thinking there would be some means of handing back the actual zero
> > page. But I am not sure this is workable.
>
> As I mentioned above, even handing back the zero page should not be needed.

Ack.

>
> >
> > > > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > > > > ---
> > > > >    drivers/char/mem.c | 4 ++++
> > > > >    1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > > > index 169eed162a7f..dae113f7fc1b 100644
> > > > > --- a/drivers/char/mem.c
> > > > > +++ b/drivers/char/mem.c
> > > > > @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
> > > > >    	if (vma->vm_flags & VM_SHARED)
> > > > >    		return shmem_zero_setup(vma);
> > > > >    	vma_set_anonymous(vma);
> > > > > +	fput(vma->vm_file);
> > > > > +	vma->vm_file = NULL;
> > > > > +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> > This is just not permitted. We maintain mmap state which contains the file
> > and pgoff state which gets threaded through the mapping operation, and
> > simply do not expect you to change these fields.
> >
> > In future we will assert on this or preferably, restrict users to only
> > changing VMA flags, the private field and vm_ops.
>
> Sure, hardening the VMA initialization code and making the corner cases
> less surprising is definitely helpful.

Yes and I've opened a can of worms and the worms have jumped out and on to
my face and were not worms but in fact an alien facehugger :P

In other words, I am going to be looking into this very seriously and
auditing this whole thing... yay for making work for myself... :>)

>
> >
> > > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > > this will be wrong? But then we'd have no way of tracking it correctly...
> > > I'm not quite familiar with the subtle details and corner cases of
> > > mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > > been visible to user yet at this point IIUC. How come mremap() could move
> > > it?
> > Ah OK, in that case fine on that front.
> >
> > But you are not permitted to touch this field (we need to enforce this...)
> >
> > > > I've not checked the function but do we mark this as a special mapping of
> > > > some kind?
> > > >
> > > > > +
> > > > >    	return 0;
> > > > >    }
> > > > >
> > > > > --
> > > > > 2.47.0
> > > > >
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 19:13         ` Lorenzo Stoakes
@ 2025-01-14 21:24           ` Yang Shi
  2025-01-15 12:10             ` Lorenzo Stoakes
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-14 21:24 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel




On 1/14/25 11:13 AM, Lorenzo Stoakes wrote:
> On Tue, Jan 14, 2025 at 11:03:48AM -0800, Yang Shi wrote:
>>
>>
>> On 1/14/25 10:14 AM, Lorenzo Stoakes wrote:
>>> This is getting into realms of discussion so, at the risk of sounding rude - to be
>>> clear - NACK.
>>>
>>> The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
>>> is regardless of any other discussed issue.
>> I admit this is a concern, but I don't think it is really bad enough to kill
>> this patch. Could this change result in a userspace regression? Maybe, most
>> likely in some debugging and monitoring scripts, which we typically don't
>> worry about that much. Of course, I can't completely guarantee no regression
>> for real-life applications, but it should be unlikely IMHO.
> Yeah, I don't think we can accept this unfortunately.
>
> This patch is SUPER important though even if rejected, because you've made
> me realise we really need to audit all of these mmap handlers... so it's
> all super appreciated regardless :)

:-)

>
>>> But more importantly, I hadn't realised mmap_zero() was on the .mmap()
>>> callback (sorry my mistake) - you're simply not permitted to change
>>> vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
>>> it's broken.
>>>
>>> To me the alternative would be to have a custom fault handler that hands
>>> back the zero page, but I'm not sure that's workable, you'd have to install
>>> a special mapping etc. and huge pages are weird and...
>> TBH, I don't think we need to make the fault handler more complicated, it is
>> just handled as an anonymous fault.
>>
>> I understand your concern about changing those vma filed outside core mm. An
>> alternative is to move such change to vma.c. For example:
>>
>> diff --git a/mm/vma.c b/mm/vma.c
>> index bb2119e5a0d0..2a7ea9901f57 100644
>> --- a/mm/vma.c
>> +++ b/mm/vma.c
>> @@ -2358,6 +2358,12 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>>          else
>>                  vma_set_anonymous(vma);
>>
>> +       if (vma_is_anonymous(vma) && vma->vm_file) {
>> +               fput(vma->vm_file);
>> +               vma->vm_file = NULL;
>> +               vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
>> +       }
>> +
> OK that's more interesting. Though the user-facing thing remains...
>
> It's possible we could detect that the underlying thing is a zero page and
> manually print out /dev/zero, but can somebody create a zero page file
> elsewhere? In which case they might find this confusing.

I'm not sure about file mappings. However, reading an anonymous mapping 
will instantiate the zero page. It should not be marked as a /dev/zero 
mapping.
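
For what it's worth, that can be observed from userspace; a rough sketch
(needs root, since /proc/self/pagemap hides PFNs otherwise, and it uses
KPF_ZERO_PAGE = bit 24 from Documentation/admin-guide/mm/pagemap.rst):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	volatile char *p = mmap(NULL, psz, PROT_READ,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	uint64_t ent = 0, flags = 0;
	int pm, kf;

	if (p == MAP_FAILED)
		return 1;
	(void)p[0];		/* read fault instantiates the zero page */

	pm = open("/proc/self/pagemap", O_RDONLY);
	kf = open("/proc/kpageflags", O_RDONLY);
	if (pm < 0 || kf < 0)
		return 1;

	/* One 64-bit pagemap entry per page; the PFN is in bits 0-54. */
	pread(pm, &ent, sizeof(ent), (uintptr_t)p / psz * sizeof(ent));
	pread(kf, &flags, sizeof(flags),
	      (off_t)(ent & ((1ULL << 55) - 1)) * sizeof(flags));

	printf("KPF_ZERO_PAGE set: %s\n", (flags >> 24) & 1 ? "yes" : "no");
	return 0;
}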

>
> It's actually a nice idea to have this _explicitly_ covered off as we could
> then also add a comment explaining 'hey there's this weird type of VMA' and
> have it in a place where it's actually obvious to mm folk anyway.
>
> But this maps thing is just a killer. Somebody somewhere will be
> confused. And it is not for us to judge whether that's silly or not...

I just thought that a named anonymous VMA may help. We can give the 
private /dev/zero mapping a name, for example, just "/dev/zero". However, 
"[anon:/dev/zero]" will show up in smaps/maps. We can't keep the device 
numbers and inode number either, but it seems it can tell the user this 
mapping comes from /dev/zero, and it also explicitly tells us it is 
treated specially by the kernel. Hopefully setting anon_name is permitted.
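
For reference, this is roughly what the userspace side of anon VMA naming
looks like today when CONFIG_ANON_VMA_NAME is enabled (a kernel-internal
name for private /dev/zero mappings would presumably reuse the same
anon_name machinery, though that is an assumption on my part):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA		0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME	0
#endif

int main(void)
{
	size_t len = 4 * 4096;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* EINVAL here typically means CONFIG_ANON_VMA_NAME is not set. */
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		  (unsigned long)p, len, (unsigned long)"zero.demo"))
		perror("prctl");

	/* The region now shows up as "[anon:zero.demo]" in maps. */
	return system("grep zero.demo /proc/self/maps");
}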

>
>>          if (error)
>>                  goto free_iter_vma;
>>
>>
>>> I do appreciate you raising this especially as I was blissfully unaware,
>>> but I don't see how this patch can possibly work, sorry :(
>>>
>>> On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
>>>>
>>>> On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
>>>>> + Willy for the fs/weirdness elements of this.
>>>>>
>>>>> On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
>>>>>> When creating private mapping for /dev/zero, the driver makes it an
>>>>>> anonymous mapping by calling vma_set_anonymous().  But it just sets
>>>>>> vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
>>>>> Hm yikes.
>>>>>
>>>>>> This is a special case and the VMA doesn't look like either anonymous VMA
>>>>>> or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
>>>>>>
>>>>>> It seems pointless to keep such special case.  Making private /dev/zero
>>>>>> mapping a full anonymous mapping doesn't change the semantic of
>>>>>> /dev/zero either.
>>>>> My concern is that ostensibly there _is_ a file right? Are we certain that by
>>>>> not setting this we are not breaking something somewhere else?
>>>>>
>>>>> Are we not creating a sort of other type of 'non-such-beast' here?
>>>> But the file is /dev/zero. I don't see how this could break the semantics of
>>>> /dev/zero. The shared mapping of /dev/zero is not affected by this change,
>>>> kernel already treated private mapping of /dev/zero as anonymous mapping,
>>>> but with some weird settings in VMA. When reading the mapping, it returns 0
>>>> with zero page, when writing the mapping, a new anonymous folio is
>>>> allocated.
>>> You're creating a new concept of an anon but not anon but also now with
>>> anon vm_pgoff and missing vm_file even though it does reference a file
>>> and... yeah.
>>>
>>> This is not usual :)
>> It does reference a file, but the file is /dev/zero... And if the kernel
>> already treats it as an anonymous mapping, it sounds like the file may not
>> matter that much, so why not make it a real anonymous mapping? Then we end up
>> having only anonymous VMAs and file VMAs, instead of anonymous VMAs, file
>> VMAs, and a hybrid special VMA. So we have fewer things to worry about. If a
>> VMA is an anonymous VMA, it is guaranteed that vm_file is NULL, vm_ops is
>> NULL and vm_pgoff is the linear pgoff. But that is not true now.
> It's about user confusion for me really.
>
>>>>> I mean already setting it anon and setting vm_file non-NULL is really strange.
>>>>>
>>>>>> The user visible effect is the mapping entry shown in /proc/<PID>/smaps
>>>>>> and /proc/<PID>/maps.
>>>>>>
>>>>>> Before the change:
>>>>>> ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
>>>>>>
>>>>>> After the change:
>>>>>> ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
>>>>>>
>>>>> Yeah this seems like it might break somebody to be honest, it's really
>>>>> really really strange to map a file then for it not to be mapped.
>>>> Yes, it is possible if someone really cares whether the anonymous-like
>>>> mapping is mapped by /dev/zero or just created by malloc(). But I don't know
>>>> who really does...
>>>>
>>>>> But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
>>>>> file but for it to be marked anonymous.
>>>>>
>>>>> God what a mess.
>>>>>
>>>>>> [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
>>>>> I kind of hate that we have to mitigate like this for a case that should
>>>>> never ever happen so I'm inclined towards your solution but a lot more
>>>>> inclined towards us totally rethinking this.
>>>>>
>>>>> Do we _have_ to make this anonymous?? Why can't we just reference the zero
>>>>> page as if it were in the page cache (Willy - feel free to correct naive
>>>>> misapprehension here).
>>>> TBH, I don't see why page cache has to be involved. When reading, 0 is
>>>> returned by zero page. When writing a CoW is triggered if page cache is
>>>> involved, but the content of the page cache should be just 0, so we copy 0
>>>> to the new folio then write to it. It doesn't make too much sense. I think
>>>> this is why private /dev/zero mapping is treated as anonymous mapping in the
>>>> first place.
>>> I'm obviously not suggesting allocating a bunch of extra folios, I was
>>> thinking there would be some means of handing back the actual zero
>>> page. But I am not sure this is workable.
>> As I mentioned above, even handing back the zero page should not be needed.
> Ack.
>
>>>>>> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>>>>>> ---
>>>>>>     drivers/char/mem.c | 4 ++++
>>>>>>     1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
>>>>>> index 169eed162a7f..dae113f7fc1b 100644
>>>>>> --- a/drivers/char/mem.c
>>>>>> +++ b/drivers/char/mem.c
>>>>>> @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
>>>>>>     	if (vma->vm_flags & VM_SHARED)
>>>>>>     		return shmem_zero_setup(vma);
>>>>>>     	vma_set_anonymous(vma);
>>>>>> +	fput(vma->vm_file);
>>>>>> +	vma->vm_file = NULL;
>>>>>> +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
>>> This is just not permitted. We maintain mmap state which contains the file
>>> and pgoff state which gets threaded through the mapping operation, and
>>> simply do not expect you to change these fields.
>>>
>>> In future we will assert on this or preferably, restrict users to only
>>> changing VMA flags, the private field and vm_ops.
>> Sure, hardening the VMA initialization code and making the corner cases
>> less surprising is definitely helpful.
> Yes and I've opened a can of worms and the worms have jumped out and on to
> my face and were not worms but in fact an alien facehugger :P
>
> In other words, I am going to be looking into this very seriously and
> auditing this whole thing... yay for making work for myself... :>)

Thank you for taking the action to kill the alien facehugger :-)

>
>>>>> Hmm, this might have been mremap()'d _potentially_ though? And then now
>>>>> this will be wrong? But then we'd have no way of tracking it correctly...
>>>> I'm not quite familiar with the subtle details and corner cases of
>>>> mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
>>>> been visible to user yet at this point IIUC. How come mremap() could move
>>>> it?
>>> Ah OK, in that case fine on that front.
>>>
>>> But you are not permitted to touch this field (we need to enforce this...)
>>>
>>>>> I've not checked the function but do we mark this as a special mapping of
>>>>> some kind?
>>>>>
>>>>>> +
>>>>>>     	return 0;
>>>>>>     }
>>>>>>
>>>>>> --
>>>>>> 2.47.0
>>>>>>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-14 21:24           ` Yang Shi
@ 2025-01-15 12:10             ` Lorenzo Stoakes
  2025-01-15 21:29               ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes @ 2025-01-15 12:10 UTC (permalink / raw)
  To: Yang Shi
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel

On Tue, Jan 14, 2025 at 01:24:25PM -0800, Yang Shi wrote:
>
>
>
> On 1/14/25 11:13 AM, Lorenzo Stoakes wrote:
> > On Tue, Jan 14, 2025 at 11:03:48AM -0800, Yang Shi wrote:
> > >
> > >
> > > On 1/14/25 10:14 AM, Lorenzo Stoakes wrote:
> > > > This is getting into realms of discussion so, at the risk of sounding rude - to be
> > > > clear - NACK.
> > > >
> > > > The user-visible change to /proc/$pid/[s]maps kills this patch dead. This
> > > > is regardless of any other discussed issue.
> > > I admit this is a concern, but I don't think it is really bad enough to kill
> > > this patch. Could this change result in a userspace regression? Maybe, most
> > > likely in some debugging and monitoring scripts, which we typically don't
> > > worry about that much. Of course, I can't completely guarantee no regression
> > > for real-life applications, but it should be unlikely IMHO.
> > Yeah, I don't think we can accept this unfortunately.
> >
> > This patch is SUPER important though even if rejected, because you've made
> > me realise we really need to audit all of these mmap handlers... so it's
> > all super appreciated regardless :)
>
> :-)
>
> >
> > > > But more importantly, I hadn't realised mmap_zero() was on the .mmap()
> > > > callback (sorry my mistake) - you're simply not permitted to change
> > > > vm_pgoff and vm_file fields here, the mapping logic doesn't expect it, and
> > > > it's broken.
> > > >
> > > > To me the alternative would be to have a custom fault handler that hands
> > > > back the zero page, but I'm not sure that's workable, you'd have to install
> > > > a special mapping etc. and huge pages are weird and...
> > > TBH, I don't think we need to make the fault handler more complicated, it is
> > > just handled as an anonymous fault.
> > >
> > > I understand your concern about changing those vma fields outside core mm. An
> > > alternative is to move such a change to vma.c. For example:
> > >
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index bb2119e5a0d0..2a7ea9901f57 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -2358,6 +2358,12 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> > >          else
> > >                  vma_set_anonymous(vma);
> > >
> > > +       if (vma_is_anonymous(vma) && vma->vm_file) {
> > > +               fput(vma->vm_file);
> > > +               vma->vm_file = NULL;
> > > +               vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> > > +       }
> > > +
> > OK that's more interesting. Though the user-facing thing remains...
> >
> > It's possible we could detect that the underlying thing is a zero page and
> > manually print out /dev/zero, but can somebody create a zero page file
> > elsewhere? In which case they might find this confusing.
>
> I'm not sure about file mappings. However, reading an anonymous mapping will
> instantiate the zero page. It should not be marked as a /dev/zero mapping.
>
> >
> > It's actually a nice idea to have this _explicitly_ covered off as we could
> > then also add a comment explaining 'hey there's this weird type of VMA' and
> > have it in a place where it's actually obvious to mm folk anyway.
> >
> > But this maps thing is just a killer. Somebody somewhere will be
> > confused. And it is not for us to judge whether that's silly or not...
>
> I just thought that a named anonymous VMA may help. We can give the private
> /dev/zero mapping a name, for example, just "/dev/zero". However,
> "[anon:/dev/zero]" will show up in smaps/maps. We can't keep the device
> numbers and inode number either, but it seems it can tell the user this
> mapping comes from /dev/zero, and it also explicitly tells us it is
> treated specially by the kernel. Hopefully setting anon_name is permitted.

But then that'd require CONFIG_ANON_VMA_NAME unfortunately :(

I think this maps thing is the killer here really.

It'd be nice to -specifically- have a means of expressing this kind of VMA.
We have a means of setting a VMA anon, so maybe we can 'set a VMA to
/dev/zero' and somehow explicitly know that we've done this and identify
this special case.

I'm not sure that the .mmap callback is the right place to do this, and I'm
not sure how exactly this would work, but this could be workable.

I agree the actual offset into the zero page is of no relevance and no
_sane_ user will care, but this way we could put /dev/zero in [s]maps,
treat this VMA as anon, but also add semantic information about the
existence of this weird corner case.

>
> >
> > >          if (error)
> > >                  goto free_iter_vma;
> > >
> > >
> > > > I do appreciate you raising this especially as I was blissfully unaware,
> > > > but I don't see how this patch can possibly work, sorry :(
> > > >
> > > > On Tue, Jan 14, 2025 at 08:53:01AM -0800, Yang Shi wrote:
> > > > >
> > > > > On 1/14/25 4:05 AM, Lorenzo Stoakes wrote:
> > > > > > + Willy for the fs/weirdness elements of this.
> > > > > >
> > > > > > On Mon, Jan 13, 2025 at 02:30:33PM -0800, Yang Shi wrote:
> > > > > > > When creating private mapping for /dev/zero, the driver makes it an
> > > > > > > anonymous mapping by calling vma_set_anonymous().  But it just sets
> > > > > > > vm_ops to NULL, vm_file is still valid and vm_pgoff is also file offset.
> > > > > > Hm yikes.
> > > > > >
> > > > > > > This is a special case and the VMA doesn't look like either anonymous VMA
> > > > > > > or file VMA.  It confused other kernel subsystem, for example, khugepaged [1].
> > > > > > >
> > > > > > > It seems pointless to keep such special case.  Making private /dev/zero
> > > > > > > mapping a full anonymous mapping doesn't change the semantic of
> > > > > > > /dev/zero either.
> > > > > > My concern is that ostensibly there _is_ a file right? Are we certain that by
> > > > > > not setting this we are not breaking something somewhere else?
> > > > > >
> > > > > > Are we not creating a sort of other type of 'non-such-beast' here?
> > > > > But the file is /dev/zero. I don't see how this could break the semantics of
> > > > > /dev/zero. The shared mapping of /dev/zero is not affected by this change,
> > > > > kernel already treated private mapping of /dev/zero as anonymous mapping,
> > > > > but with some weird settings in VMA. When reading the mapping, it returns 0
> > > > > with zero page, when writing the mapping, a new anonymous folio is
> > > > > allocated.
> > > > You're creating a new concept of an anon but not anon but also now with
> > > > anon vm_pgoff and missing vm_file even though it does reference a file
> > > > and... yeah.
> > > >
> > > > This is not usual :)
> > > It does reference a file, but the file is /dev/zero... And if the kernel
> > > already treats it as an anonymous mapping, it sounds like the file may not
> > > matter that much, so why not make it a real anonymous mapping? Then we end up
> > > having only anonymous VMAs and file VMAs, instead of anonymous VMAs, file
> > > VMAs, and a hybrid special VMA. So we have fewer things to worry about. If a
> > > VMA is an anonymous VMA, it is guaranteed that vm_file is NULL, vm_ops is
> > > NULL and vm_pgoff is the linear pgoff. But that is not true now.
> > It's about user confusion for me really.
> >
> > > > > > I mean already setting it anon and setting vm_file non-NULL is really strange.
> > > > > >
> > > > > > > The user visible effect is the mapping entry shown in /proc/<PID>/smaps
> > > > > > > and /proc/<PID>/maps.
> > > > > > >
> > > > > > > Before the change:
> > > > > > > ffffb7190000-ffffb7590000 rw-p 00001000 00:06 8                          /dev/zero
> > > > > > >
> > > > > > > After the change:
> > > > > > > ffffb6130000-ffffb6530000 rw-p 00000000 00:00 0
> > > > > > >
> > > > > > Yeah this seems like it might break somebody to be honest, it's really
> > > > > > really really strange to map a file then for it not to be mapped.
> > > > > Yes, it is possible if someone really cares whether the anonymous-like
> > > > > mapping is mapped by /dev/zero or just created by malloc(). But I don't know
> > > > > who really does...
> > > > >
> > > > > > But it's possibly EVEN WEIRDER to map a file and for it to seem mapped as a
> > > > > > file but for it to be marked anonymous.
> > > > > >
> > > > > > God what a mess.
> > > > > >
> > > > > > > [1]: https://lore.kernel.org/linux-mm/20250111034511.2223353-1-liushixin2@huawei.com/
> > > > > > I kind of hate that we have to mitigate like this for a case that should
> > > > > > never ever happen so I'm inclined towards your solution but a lot more
> > > > > > inclined towards us totally rethinking this.
> > > > > >
> > > > > > Do we _have_ to make this anonymous?? Why can't we just reference the zero
> > > > > > page as if it were in the page cache (Willy - feel free to correct naive
> > > > > > misapprehension here).
> > > > > TBH, I don't see why page cache has to be involved. When reading, 0 is
> > > > > returned by zero page. When writing a CoW is triggered if page cache is
> > > > > involved, but the content of the page cache should be just 0, so we copy 0
> > > > > to the new folio then write to it. It doesn't make too much sense. I think
> > > > > this is why private /dev/zero mapping is treated as anonymous mapping in the
> > > > > first place.
> > > > I'm obviously not suggesting allocating a bunch of extra folios, I was
> > > > thinking there would be some means of handing back the actual zero
> > > > page. But I am not sure this is workable.
> > > As I mentioned above, even handing back the zero page should not be needed.
> > Ack.
> >
> > > > > > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > > > > > > ---
> > > > > > >     drivers/char/mem.c | 4 ++++
> > > > > > >     1 file changed, 4 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > > > > > > index 169eed162a7f..dae113f7fc1b 100644
> > > > > > > --- a/drivers/char/mem.c
> > > > > > > +++ b/drivers/char/mem.c
> > > > > > > @@ -527,6 +527,10 @@ static int mmap_zero(struct file *file, struct vm_area_struct *vma)
> > > > > > >     	if (vma->vm_flags & VM_SHARED)
> > > > > > >     		return shmem_zero_setup(vma);
> > > > > > >     	vma_set_anonymous(vma);
> > > > > > > +	fput(vma->vm_file);
> > > > > > > +	vma->vm_file = NULL;
> > > > > > > +	vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
> > > > This is just not permitted. We maintain mmap state which contains the file
> > > > and pgoff state which gets threaded through the mapping operation, and
> > > > simply do not expect you to change these fields.
> > > >
> > > > In future we will assert on this or preferably, restrict users to only
> > > > changing VMA flags, the private field and vm_ops.
> > > Sure, hardening the VMA initialization code and making the corner cases
> > > less surprising is definitely helpful.
> > Yes and I've opened a can of worms and the worms have jumped out and on to
> > my face and were not worms but in fact an alien facehugger :P
> >
> > In other words, I am going to be looking into this very seriously and
> > auditing this whole thing... yay for making work for myself... :>)
>
> Thank you for taking the action to kill the alien facehugger :-)

Haha thanks I'll do my best :))

>
> >
> > > > > > Hmm, this might have been mremap()'d _potentially_ though? And then now
> > > > > > this will be wrong? But then we'd have no way of tracking it correctly...
> > > > > I'm not quite familiar with the subtle details and corner cases of
> > > > > mremap(). But mmap_zero() should be called by mmap(), so the VMA has not
> > > > > been visible to user yet at this point IIUC. How come mremap() could move
> > > > > it?
> > > > Ah OK, in that case fine on that front.
> > > >
> > > > But you are not permitted to touch this field (we need to enforce this...)
> > > >
> > > > > > I've not checked the function but do we mark this as a special mapping of
> > > > > > some kind?
> > > > > >
> > > > > > > +
> > > > > > >     	return 0;
> > > > > > >     }
> > > > > > >
> > > > > > > --
> > > > > > > 2.47.0
> > > > > > >
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-15 12:10             ` Lorenzo Stoakes
@ 2025-01-15 21:29               ` Yang Shi
  2025-01-15 22:05                 ` Christoph Lameter (Ampere)
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-15 21:29 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: arnd, gregkh, Liam.Howlett, vbabka, jannh, willy, liushixin2,
	akpm, linux-mm, linux-kernel


>> I just thought that a named anonymous VMA may help. We can give the private
>> /dev/zero mapping a name, for example, just "/dev/zero". However,
>> "[anon:/dev/zero]" will show up in smaps/maps. We can't keep the device
>> numbers and inode number either, but it seems it can tell the user this
>> mapping comes from /dev/zero, and it also explicitly tells us it is
>> treated specially by the kernel. Hopefully setting anon_name is permitted.
> But then that'd require CONFIG_ANON_VMA_NAME unfortunately :(

Yes.

>
> I think this maps thing is the killer here really.
>
> It'd be nice to -specifically- have a means of expressing this kind of VMA.
> We have a means of setting a VMA anon, so maybe we can 'set a VMA to
> /dev/zero' and somehow explicitly know that we've done this and identify
> this special case.
>
> I'm not sure that the .mmap callback is the right place to do this, and I'm
> not sure how exactly this would work, but this could be workable.

A couple of potential approaches off the top of my head:
   - A new vm flag
   - Use vm_private_data

Both of them have pros and cons. The vm flag is simple enough, but it 
needs to consume one bit for just one use case. vm_private_data is a 
void pointer and a lot of drivers use it to store driver-specific data 
structures, so using the pointer in a generic path (for example, smaps) 
to tell us whether it is /dev/zero is not easy. We may be able to apply 
a special encoding to it, for example, set the last bit (this trick is 
not unusual in core mm code).
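
A quick sketch of that last-bit encoding (names invented here; any real
version would need to audit existing vm_private_data users first):

#include <stdint.h>
#include <stdio.h>

#define ZERO_VMA_TAG	0x1UL	/* low bit is free in aligned pointers */

static inline void *tag_zero_vma(void *priv)
{
	return (void *)((uintptr_t)priv | ZERO_VMA_TAG);
}

static inline int vma_is_dev_zero(const void *priv)
{
	return ((uintptr_t)priv & ZERO_VMA_TAG) != 0;
}

static inline void *vma_real_priv(void *priv)
{
	return (void *)((uintptr_t)priv & ~ZERO_VMA_TAG);
}

int main(void)
{
	/* mmap_zero() would store a tagged value in vm_private_data... */
	void *priv = tag_zero_vma(NULL);

	/* ...and a generic path like smaps could then test for it. */
	printf("is /dev/zero: %d, real pointer: %p\n",
	       vma_is_dev_zero(priv), vma_real_priv(priv));
	return 0;
}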

>
> I agree the actual offset into the zero page is of no relevance and no
> _sane_ user will care, but this way we could put /dev/zero in [s]maps,
> treat this VMA as anon, but also add semantic information about the
> existence of this weird corner case.
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-15 21:29               ` Yang Shi
@ 2025-01-15 22:05                 ` Christoph Lameter (Ampere)
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-01-15 22:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: Lorenzo Stoakes, arnd, gregkh, Liam.Howlett, vbabka, jannh,
	willy, liushixin2, akpm, linux-mm, linux-kernel

On Wed, 15 Jan 2025, Yang Shi wrote:

>
> > > I just thought that a named anonymous VMA may help. We can give the private
> > > /dev/zero mapping a name, for example, just "/dev/zero". However,
> > > "[anon:/dev/zero]" will show up in smaps/maps. We can't keep the device
> > > numbers and inode number either, but it seems it can tell the user this
> > > mapping comes from /dev/zero, and it also explicitly tells us it is
> > > treated specially by the kernel. Hopefully setting anon_name is permitted.
> > But then that'd require CONFIG_ANON_VMA_NAME unfortunately :(
>
> Yes.

Add a counter for NULL pages in smaps?

I.e.

Null:	  4 kB

Both anonymous and file mappings could have NULL page references, right?
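
(If it helps, a rough sketch of where such a counter could live, in
fs/proc/task_mmu.c -- "zero" would be a new field in struct
mem_size_stats and "Null:" a new smaps row; untested:)

	/* in smaps_pte_entry(), for a present pte mapping the zero page */
	if (pte_present(ptent) && is_zero_pfn(pte_pfn(ptent)))
		mss->zero += PAGE_SIZE;

	/* and in __show_smap() */
	SEQ_PUT_DEC(" kB\nNull:           ", mss->zero);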





^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-13 22:30 [PATCH] /dev/zero: make private mapping full anonymous mapping Yang Shi
  2025-01-14 12:05 ` Lorenzo Stoakes
  2025-01-14 13:01 ` David Hildenbrand
@ 2025-01-28  3:14 ` kernel test robot
  2025-01-31 18:38   ` Yang Shi
  2 siblings, 1 reply; 35+ messages in thread
From: kernel test robot @ 2025-01-28  3:14 UTC (permalink / raw)
  To: Yang Shi
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm, yang,
	linux-mm, oliver.sang


Hi all,

we don't have enough knowledge to fully understand the discussion around
this patch; we saw a "NACK", but there was more discussion later.
So the report below is just an FYI on what we observed in our tests. Thanks
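
(For context, the mapping pattern we believe small-allocs stresses is
many tasks doing small private mmaps of /dev/zero in parallel -- this is
our guess from the profile below, not taken from the vm-scalability
source. Before the patch each such VMA was linked into the file's rmap
tree, so the tasks serialized on the rwsem taken in vma_link_file();
after it, the VMAs are fully anonymous and never take that lock. A
minimal sketch of that pattern:)

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/zero", O_RDWR);

		if (fd < 0)
			return 1;
		for (int i = 0; i < 100000; i++) {
			char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE, fd, 0);
			if (p == MAP_FAILED)
				return 1;
			*(volatile char *)p = 0;	/* fault one page in */
		}
		close(fd);
		return 0;
	}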

Hello,

kernel test robot noticed an 858.5% improvement of vm-scalability.throughput on:


commit: 7143ee2391f1ea15e6791e129870473543634de2 ("[PATCH] /dev/zero: make private mapping full anonymous mapping")
url: https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/dev-zero-make-private-mapping-full-anonymous-mapping/20250114-063339
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git a68d3cbfade64392507302f3a920113b60dc811f
patch link: https://lore.kernel.org/all/20250113223033.4054534-1-yang@os.amperecomputing.com/
patch subject: [PATCH] /dev/zero: make private mapping full anonymous mapping

testcase: vm-scalability
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	runtime: 300s
	test: small-allocs
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250128/202501281038.617c6b60-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit: 
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  7143ee2391 ("/dev/zero: make private mapping full anonymous mapping")

a68d3cbfade64392 7143ee2391f1ea15e6791e12987 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 5.262e+09 ±  3%     -67.6%  1.705e+09 ±  3%  cpuidle..time
   7924008 ±  3%     -88.9%     875849 ±  3%  cpuidle..usage
   1585617 ±  5%     +13.5%    1799302 ±  2%  numa-numastat.node1.local_node
   1667793 ±  4%     +13.2%    1887467 ±  2%  numa-numastat.node1.numa_hit
    399.52           -78.0%      87.79        uptime.boot
     14507           -24.4%      10963        uptime.idle
      3408 ±  5%     -99.6%      13.00 ± 40%  perf-c2c.DRAM.local
     18076 ±  3%     -99.8%      38.67 ± 36%  perf-c2c.DRAM.remote
      8082 ±  5%     -99.8%      19.33 ± 52%  perf-c2c.HITM.local
      6544 ±  6%     -99.8%      14.17 ± 35%  perf-c2c.HITM.remote
     14627 ±  4%     -99.8%      33.50 ± 34%  perf-c2c.HITM.total
      6.49 ±  3%     +10.5       17.04 ±  7%  mpstat.cpu.all.idle%
      0.63            -0.3        0.35 ±  2%  mpstat.cpu.all.irq%
      0.03 ±  2%      +0.2        0.18 ±  6%  mpstat.cpu.all.soft%
     91.17           -29.6       61.57 ±  2%  mpstat.cpu.all.sys%
      1.68 ±  2%     +19.2       20.86 ±  2%  mpstat.cpu.all.usr%
    337.33           -95.3%      15.83 ± 35%  mpstat.max_utilization.seconds
      6.99 ±  3%    +190.2%      20.30 ±  5%  vmstat.cpu.id
     91.35           -34.8%      59.59 ±  2%  vmstat.cpu.sy
      1.71         +1073.6%      20.04 ±  2%  vmstat.cpu.us
    210.36           -12.7%     183.65        vmstat.procs.r
     34204 ±  5%     -41.8%      19899 ±  6%  vmstat.system.cs
    266575           -23.1%     205001        vmstat.system.in
   1609925           -50.9%     790974        meminfo.Active
   1609925           -50.9%     790974        meminfo.Active(anon)
    160837 ± 33%     -77.3%      36534 ± 11%  meminfo.AnonHugePages
   4435665           -18.7%    3606310        meminfo.Cached
   1775547           -44.6%     983546        meminfo.Committed_AS
    148539           -47.7%      77658 ±  2%  meminfo.Mapped
  25332110 ±  3%      -7.7%   23373667        meminfo.Memused
   4245538 ±  4%     -26.2%    3134309        meminfo.PageTables
  14166291 ±  4%     -11.9%   12484042        meminfo.SUnreclaim
    929777           -89.1%     100886        meminfo.Shmem
  14315492 ±  4%     -11.8%   12624243        meminfo.Slab
   1063552 ±  4%     -27.8%     767817 ± 12%  numa-meminfo.node0.PageTables
    125455 ±106%     -83.3%      20992 ±155%  numa-meminfo.node0.Shmem
     48482 ± 67%     -44.8%      26748 ±127%  numa-meminfo.node1.Mapped
   1062709 ±  4%     -21.9%     829672        numa-meminfo.node1.PageTables
   1058901 ±  4%     -27.5%     767469 ± 14%  numa-meminfo.node2.PageTables
    770405 ± 30%     -74.0%     200464 ± 77%  numa-meminfo.node3.Active
    770405 ± 30%     -74.0%     200464 ± 77%  numa-meminfo.node3.Active(anon)
   1146977 ±108%     -94.5%      63226 ±114%  numa-meminfo.node3.FilePages
     52663 ± 47%     -97.8%       1141 ± 55%  numa-meminfo.node3.Mapped
   6368902 ± 20%     -23.5%    4869231 ± 12%  numa-meminfo.node3.MemUsed
   1058539 ±  4%     -27.8%     764243 ± 12%  numa-meminfo.node3.PageTables
    558943 ± 14%     -97.0%      16946 ±195%  numa-meminfo.node3.Shmem
     64129 ±  4%    +885.2%     631788 ±  3%  vm-scalability.median
     45.40 ±  5%   +1368.7        1414 ±  5%  vm-scalability.stddev%
  14364828 ±  4%    +858.5%  1.377e+08 ±  3%  vm-scalability.throughput
    352.76           -88.2%      41.52 ±  3%  vm-scalability.time.elapsed_time
    352.76           -88.2%      41.52 ±  3%  vm-scalability.time.elapsed_time.max
    225965 ±  7%     +62.0%     365969 ±  2%  vm-scalability.time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
     20852            -9.7%      18831        vm-scalability.time.percent_of_cpu_this_job_got
     72302           -91.9%       5866 ±  4%  vm-scalability.time.system_time
      1260 ±  3%     +54.9%       1953        vm-scalability.time.user_time
   5393707 ±  5%     -99.6%      21840 ± 49%  vm-scalability.time.voluntary_context_switches
 4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
    265763 ±  4%     -27.8%     191828 ± 11%  numa-vmstat.node0.nr_page_table_pages
     31364 ±106%     -83.0%       5332 ±156%  numa-vmstat.node0.nr_shmem
     12205 ± 67%     -44.4%       6791 ±127%  numa-vmstat.node1.nr_mapped
    265546 ±  4%     -21.8%     207663        numa-vmstat.node1.nr_page_table_pages
   1667048 ±  4%     +13.2%    1886422 ±  2%  numa-vmstat.node1.numa_hit
   1584872 ±  5%     +13.5%    1798258 ±  2%  numa-vmstat.node1.numa_local
    264589 ±  4%     -27.1%     192920 ± 14%  numa-vmstat.node2.nr_page_table_pages
    192683 ± 30%     -73.9%      50195 ± 76%  numa-vmstat.node3.nr_active_anon
    286819 ±108%     -94.5%      15799 ±114%  numa-vmstat.node3.nr_file_pages
     13124 ± 49%     -97.8%     285.03 ± 55%  numa-vmstat.node3.nr_mapped
    264499 ±  4%     -27.4%     192027 ± 12%  numa-vmstat.node3.nr_page_table_pages
    139810 ± 14%     -97.0%       4229 ±195%  numa-vmstat.node3.nr_shmem
    192683 ± 30%     -73.9%      50195 ± 76%  numa-vmstat.node3.nr_zone_active_anon
    402515           -50.8%     197849        proc-vmstat.nr_active_anon
    170568            +1.8%     173597        proc-vmstat.nr_anon_pages
     78.63 ± 33%     -77.4%      17.80 ± 11%  proc-vmstat.nr_anon_transparent_hugepages
   4257257            +1.1%    4305540        proc-vmstat.nr_dirty_background_threshold
   8524925            +1.1%    8621607        proc-vmstat.nr_dirty_threshold
   1109246           -18.7%     901907        proc-vmstat.nr_file_pages
  42815276            +1.1%   43299295        proc-vmstat.nr_free_pages
     37525           -47.6%      19653 ±  2%  proc-vmstat.nr_mapped
   1059932 ±  4%     -26.0%     784175        proc-vmstat.nr_page_table_pages
    232507           -89.1%      25298        proc-vmstat.nr_shmem
     37297            -6.0%      35048        proc-vmstat.nr_slab_reclaimable
   3537843 ±  4%     -11.8%    3120130        proc-vmstat.nr_slab_unreclaimable
    402515           -50.8%     197849        proc-vmstat.nr_zone_active_anon
     61931 ±  8%     -73.8%      16233 ± 34%  proc-vmstat.numa_hint_faults
     15755 ± 21%     -89.8%       1609 ±117%  proc-vmstat.numa_hint_faults_local
    293942 ±  3%     -66.1%      99500 ± 20%  proc-vmstat.numa_pte_updates
 9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
     55981 ±  2%     -69.0%      17375 ±  8%  proc-vmstat.pgreuse
      0.82 ±  4%     -60.7%       0.32 ±  3%  perf-stat.i.MPKI
 2.714e+10 ±  2%    +413.1%  1.393e+11 ±  3%  perf-stat.i.branch-instructions
      0.11 ±  3%      +0.1        0.19 ±  2%  perf-stat.i.branch-miss-rate%
  24932893          +321.8%  1.052e+08 ±  3%  perf-stat.i.branch-misses
     64.93            -7.4       57.53        perf-stat.i.cache-miss-rate%
  88563288 ±  3%     +50.5%  1.333e+08 ±  3%  perf-stat.i.cache-misses
 1.369e+08 ±  3%     +55.8%  2.134e+08 ±  3%  perf-stat.i.cache-references
     34508 ±  4%     -39.5%      20864 ±  6%  perf-stat.i.context-switches
      7.67           -79.6%       1.57 ±  2%  perf-stat.i.cpi
 7.989e+11            -7.6%  7.383e+11 ±  2%  perf-stat.i.cpu-cycles
    696.35 ±  2%     -52.8%     328.76 ±  2%  perf-stat.i.cpu-migrations
     10834 ±  4%     -32.9%       7272 ±  4%  perf-stat.i.cycles-between-cache-misses
 1.102e+11          +310.6%  4.525e+11 ±  3%  perf-stat.i.instructions
      0.14          +426.9%       0.75 ±  2%  perf-stat.i.ipc
     24.25 ±  3%    +855.3%     231.63 ±  3%  perf-stat.i.metric.K/sec
   2722043 ±  3%    +867.7%   26340617 ±  3%  perf-stat.i.minor-faults
   2722043 ±  3%    +867.7%   26340616 ±  3%  perf-stat.i.page-faults
      0.81 ±  3%     -63.3%       0.30 ±  2%  perf-stat.overall.MPKI
      0.09            -0.0        0.07 ±  2%  perf-stat.overall.branch-miss-rate%
     64.81            -2.1       62.72        perf-stat.overall.cache-miss-rate%
      7.24           -77.5%       1.63 ±  3%  perf-stat.overall.cpi
      8933 ±  4%     -38.7%       5479 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.14          +344.4%       0.61 ±  3%  perf-stat.overall.ipc
      9012 ±  2%     -57.9%       3797        perf-stat.overall.path-length
 2.701e+10 ±  2%    +396.9%  1.342e+11 ±  3%  perf-stat.ps.branch-instructions
  24708939          +305.5%  1.002e+08 ±  4%  perf-stat.ps.branch-misses
  89032538 ±  3%     +45.9%  1.299e+08 ±  3%  perf-stat.ps.cache-misses
 1.374e+08 ±  3%     +50.8%  2.071e+08 ±  3%  perf-stat.ps.cache-references
     34266 ±  5%     -41.1%      20179 ±  7%  perf-stat.ps.context-switches
    223334            -2.2%     218529        perf-stat.ps.cpu-clock
 7.941e+11           -10.5%   7.11e+11        perf-stat.ps.cpu-cycles
    693.54 ±  2%     -54.7%     314.08 ±  2%  perf-stat.ps.cpu-migrations
 1.097e+11          +297.8%  4.362e+11 ±  3%  perf-stat.ps.instructions
   2710577 ±  3%    +836.2%   25375552 ±  3%  perf-stat.ps.minor-faults
   2710577 ±  3%    +836.2%   25375552 ±  3%  perf-stat.ps.page-faults
    223334            -2.2%     218529        perf-stat.ps.task-clock
 3.886e+13 ±  2%     -52.8%  1.835e+13        perf-stat.total.instructions
  64052898 ±  5%     -99.8%     124999 ± 22%  sched_debug.cfs_rq:/.avg_vruntime.avg
  95701822 ±  7%     -96.4%    3453252 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
  43098762 ±  6%    -100.0%     148.27 ± 21%  sched_debug.cfs_rq:/.avg_vruntime.min
   9223270 ±  9%     -94.6%     495929 ± 17%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.78 ±  2%     -94.6%       0.04 ± 22%  sched_debug.cfs_rq:/.h_nr_running.avg
      0.28 ±  7%     -28.9%       0.20 ± 10%  sched_debug.cfs_rq:/.h_nr_running.stddev
    411536 ± 58%    -100.0%       3.77 ±141%  sched_debug.cfs_rq:/.left_deadline.avg
  43049468 ± 22%    -100.0%     844.45 ±141%  sched_debug.cfs_rq:/.left_deadline.max
   3836405 ± 37%    -100.0%      56.30 ±141%  sched_debug.cfs_rq:/.left_deadline.stddev
    411536 ± 58%    -100.0%       3.62 ±141%  sched_debug.cfs_rq:/.left_vruntime.avg
  43049467 ± 22%    -100.0%     809.82 ±141%  sched_debug.cfs_rq:/.left_vruntime.max
   3836405 ± 37%    -100.0%      53.99 ±141%  sched_debug.cfs_rq:/.left_vruntime.stddev
      8792 ± 28%     -81.8%       1600 ±106%  sched_debug.cfs_rq:/.load.avg
  64052901 ±  5%     -99.8%     124999 ± 22%  sched_debug.cfs_rq:/.min_vruntime.avg
  95701822 ±  7%     -96.4%    3453252 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
  43098762 ±  6%    -100.0%     148.27 ± 21%  sched_debug.cfs_rq:/.min_vruntime.min
   9223270 ±  9%     -94.6%     495929 ± 17%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ±  2%     -94.6%       0.04 ± 22%  sched_debug.cfs_rq:/.nr_running.avg
      0.26 ± 10%     -22.4%       0.20 ± 10%  sched_debug.cfs_rq:/.nr_running.stddev
    411536 ± 58%    -100.0%       3.62 ±141%  sched_debug.cfs_rq:/.right_vruntime.avg
  43049467 ± 22%    -100.0%     809.82 ±141%  sched_debug.cfs_rq:/.right_vruntime.max
   3836405 ± 37%    -100.0%      53.99 ±141%  sched_debug.cfs_rq:/.right_vruntime.stddev
    286633 ± 43%    +421.0%    1493420 ± 42%  sched_debug.cfs_rq:/.runnable_avg.avg
  34728895 ± 30%    +380.1%  1.667e+08 ± 27%  sched_debug.cfs_rq:/.runnable_avg.max
   2845573 ± 30%    +406.5%   14411856 ± 30%  sched_debug.cfs_rq:/.runnable_avg.stddev
    769.03           -85.4%     112.18 ±  6%  sched_debug.cfs_rq:/.util_avg.avg
      1621 ±  5%     -39.3%     983.67 ±  9%  sched_debug.cfs_rq:/.util_avg.max
    159.12 ±  8%     +26.6%     201.45 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
    724.17 ±  2%     -98.8%       8.91 ± 43%  sched_debug.cfs_rq:/.util_est.avg
      1360 ± 15%     -52.9%     640.17 ± 13%  sched_debug.cfs_rq:/.util_est.max
    234.34 ±  9%     -71.0%      67.88 ± 27%  sched_debug.cfs_rq:/.util_est.stddev
    766944 ±  3%     +18.9%     911838        sched_debug.cpu.avg_idle.avg
   1067639 ±  5%     +31.7%    1406047 ± 12%  sched_debug.cpu.avg_idle.max
    321459 ±  2%     -37.0%     202531 ±  7%  sched_debug.cpu.avg_idle.stddev
    195573           -76.7%      45494        sched_debug.cpu.clock.avg
    195596           -76.7%      45510        sched_debug.cpu.clock.max
    195548           -76.7%      45471        sched_debug.cpu.clock.min
     13.79 ±  3%     -36.2%       8.80 ±  2%  sched_debug.cpu.clock.stddev
    194424           -76.7%      45370        sched_debug.cpu.clock_task.avg
    194608           -76.6%      45496        sched_debug.cpu.clock_task.max
    181834           -81.8%      33106        sched_debug.cpu.clock_task.min
      4241 ±  2%     -96.8%     134.16 ± 27%  sched_debug.cpu.curr->pid.avg
      9799 ±  2%     -59.8%       3941        sched_debug.cpu.curr->pid.max
      1365 ± 10%     -49.6%     688.63 ± 13%  sched_debug.cpu.curr->pid.stddev
    537665 ±  4%     +31.3%     705893 ±  9%  sched_debug.cpu.max_idle_balance_cost.max
      3119 ± 56%    +590.3%      21534 ± 34%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 12%     -70.8%       0.00 ± 12%  sched_debug.cpu.next_balance.stddev
      0.78 ±  2%     -95.2%       0.04 ± 25%  sched_debug.cpu.nr_running.avg
      2.17 ±  8%     -46.2%       1.17 ± 31%  sched_debug.cpu.nr_running.max
      0.29 ±  8%     -34.0%       0.19 ± 12%  sched_debug.cpu.nr_running.stddev
     25773 ±  5%     -97.0%     783.41 ±  5%  sched_debug.cpu.nr_switches.avg
     48669 ± 10%     -76.8%      11301 ± 18%  sched_debug.cpu.nr_switches.max
     19006 ±  7%     -99.2%     156.50 ± 11%  sched_debug.cpu.nr_switches.min
      4142 ±  8%     -68.9%       1290 ± 12%  sched_debug.cpu.nr_switches.stddev
      0.07 ± 23%     -94.0%       0.00 ± 57%  sched_debug.cpu.nr_uninterruptible.avg
    240.19 ± 16%     -81.7%      44.00 ± 19%  sched_debug.cpu.nr_uninterruptible.max
    -77.92           -84.6%     -12.00        sched_debug.cpu.nr_uninterruptible.min
     37.87 ±  5%     -85.2%       5.60 ± 12%  sched_debug.cpu.nr_uninterruptible.stddev
    195549           -76.7%      45480        sched_debug.cpu_clk
    194699           -77.1%      44630        sched_debug.ktime
      0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
      0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
      0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
    196368           -76.4%      46311        sched_debug.sched_clk
     95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.46           -94.1        0.31 ±101%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.14           -93.8        0.37 ±105%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     93.79           -93.6        0.16 ±223%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
     93.44           -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
     93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
     94.25           -93.3        0.98 ± 82%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     94.45           -93.0        1.40 ± 51%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
      0.00            +1.7        1.73 ± 34%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exec_mmap.begin_new_exec.load_elf_binary
      0.00            +1.8        1.82 ± 56%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.__mmput.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler
      0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve
      0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm
      0.00            +2.3        2.28 ± 38%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.5        2.48 ± 25%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.5        2.48 ± 25%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      0.00            +2.5        2.50 ± 48%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +2.5        2.52 ± 31%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.5        2.52 ± 31%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.7        2.68 ± 27%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      0.00            +2.7        2.71 ± 40%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      0.00            +2.7        2.71 ± 40%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.8        2.76 ± 59%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.00            +2.8        2.85 ± 54%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
      0.00            +2.8        2.85 ± 54%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +3.0        2.96 ± 53%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
      0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
      0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
      0.00            +3.0        3.03 ± 52%  perf-profile.calltrace.cycles-pp._Fork
      0.00            +3.3        3.31 ± 26%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.execve
      0.00            +3.5        3.54 ± 41%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
      0.00            +3.5        3.54 ± 41%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
      0.00            +3.7        3.69 ± 37%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.7        3.69 ± 37%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.00            +3.9        3.89 ± 50%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
      0.00            +3.9        3.94 ± 44%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.00            +5.5        5.54 ± 38%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.00            +5.8        5.85 ± 27%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
      0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.main
      0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
      0.00            +9.1        9.05 ± 54%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +9.1        9.05 ± 54%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +9.4        9.38 ± 52%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +9.5        9.48 ± 52%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      0.00            +9.9        9.92 ± 57%  perf-profile.calltrace.cycles-pp.read
      0.00           +12.0       11.98 ± 50%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.00           +18.8       18.83 ± 38%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +18.8       18.83 ± 38%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.21 ±  3%     +34.3       35.50 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.21 ±  3%     +34.8       35.97 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.calltrace.cycles-pp.common_startup_64
      2.19 ±  3%     +49.9       52.08 ± 18%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
     95.60           -95.2        0.42 ±113%  perf-profile.children.cycles-pp.__mmap
     94.14           -93.6        0.54 ±106%  perf-profile.children.cycles-pp.__mmap_new_vma
     93.79           -93.6        0.21 ±171%  perf-profile.children.cycles-pp.vma_link_file
     93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     93.44           -93.2        0.24 ±178%  perf-profile.children.cycles-pp.down_write
     94.55           -93.1        1.40 ± 51%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
     94.25           -93.0        1.30 ± 59%  perf-profile.children.cycles-pp.__mmap_region
     92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
     94.45           -92.7        1.72 ± 34%  perf-profile.children.cycles-pp.do_mmap
     94.46           -92.6        1.83 ± 31%  perf-profile.children.cycles-pp.vm_mmap_pgoff
     95.58           -45.3       50.30 ±  6%  perf-profile.children.cycles-pp.do_syscall_64
     95.58           -45.2       50.40 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +1.2        1.22 ± 40%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +1.3        1.26 ± 34%  perf-profile.children.cycles-pp.seq_printf
      0.00            +1.3        1.32 ± 78%  perf-profile.children.cycles-pp.kmem_cache_free
      0.00            +1.6        1.60 ± 42%  perf-profile.children.cycles-pp.sched_balance_rq
      0.00            +1.7        1.73 ± 41%  perf-profile.children.cycles-pp.open_last_lookups
      0.00            +1.9        1.85 ± 31%  perf-profile.children.cycles-pp.begin_new_exec
      0.00            +1.9        1.85 ± 31%  perf-profile.children.cycles-pp.exec_mmap
      0.00            +2.1        2.09 ± 40%  perf-profile.children.cycles-pp.do_pte_missing
      0.46            +2.4        2.85 ± 54%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.53            +2.4        2.94 ± 49%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.53            +2.4        2.94 ± 49%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.00            +2.4        2.44 ±101%  perf-profile.children.cycles-pp.__evlist__enable
      0.00            +2.5        2.54 ± 45%  perf-profile.children.cycles-pp.zap_present_ptes
      0.00            +2.6        2.58 ± 54%  perf-profile.children.cycles-pp.mutex_unlock
      0.00            +2.7        2.68 ± 67%  perf-profile.children.cycles-pp.evlist_cpu_iterator__next
      0.00            +2.7        2.71 ± 40%  perf-profile.children.cycles-pp.__x64_sys_exit_group
      0.00            +2.7        2.71 ± 40%  perf-profile.children.cycles-pp.x64_sys_call
      0.00            +3.0        2.99 ± 53%  perf-profile.children.cycles-pp.__do_sys_clone
      0.00            +3.0        2.99 ± 53%  perf-profile.children.cycles-pp.kernel_clone
      0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.exec_binprm
      0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.load_elf_binary
      0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.search_binary_handler
      0.00            +3.0        3.03 ± 52%  perf-profile.children.cycles-pp._Fork
      0.00            +3.3        3.31 ± 26%  perf-profile.children.cycles-pp.bprm_execve
      0.58 ±  2%      +3.4        3.98 ± 47%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +3.5        3.52 ± 20%  perf-profile.children.cycles-pp.execve
      0.04 ± 44%      +3.7        3.72 ± 18%  perf-profile.children.cycles-pp.__schedule
      0.00            +3.7        3.72 ± 14%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.00            +3.7        3.72 ± 14%  perf-profile.children.cycles-pp.do_execveat_common
      0.51 ±  6%      +3.7        4.25 ± 31%  perf-profile.children.cycles-pp.handle_mm_fault
      0.00            +3.8        3.79 ± 40%  perf-profile.children.cycles-pp.zap_pte_range
      0.00            +3.9        3.90 ± 26%  perf-profile.children.cycles-pp.do_filp_open
      0.00            +3.9        3.90 ± 26%  perf-profile.children.cycles-pp.path_openat
      0.00            +3.9        3.91 ± 43%  perf-profile.children.cycles-pp.unmap_page_range
      0.00            +3.9        3.91 ± 43%  perf-profile.children.cycles-pp.zap_pmd_range
      1.18            +4.0        5.20 ± 19%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.19 ± 23%      +4.0        4.21 ± 32%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.77 ±  3%      +4.0        4.79 ± 27%  perf-profile.children.cycles-pp.exc_page_fault
      0.76 ±  3%      +4.0        4.79 ± 27%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.00            +4.1        4.13 ± 38%  perf-profile.children.cycles-pp.do_sys_openat2
      0.00            +4.2        4.15 ± 35%  perf-profile.children.cycles-pp.unmap_vmas
      0.00            +4.2        4.18 ± 91%  perf-profile.children.cycles-pp.kthread
      0.00            +4.2        4.22 ± 91%  perf-profile.children.cycles-pp.ret_from_fork
      0.00            +4.2        4.22 ± 91%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.00            +4.3        4.25 ± 37%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.00            +5.5        5.54 ± 38%  perf-profile.children.cycles-pp.exit_mm
      0.00            +6.1        6.09 ± 48%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.__cmd_record
      0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.cmd_record
      0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.handle_internal_command
      0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.main
      0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.run_builtin
      0.00            +7.3        7.28 ± 26%  perf-profile.children.cycles-pp.exit_mmap
      0.00            +7.4        7.40 ± 27%  perf-profile.children.cycles-pp.__mmput
      0.00            +8.5        8.52 ± 58%  perf-profile.children.cycles-pp.seq_read_iter
      0.00            +8.6        8.56 ± 52%  perf-profile.children.cycles-pp.__fput
      0.00            +9.1        9.05 ± 54%  perf-profile.children.cycles-pp.ksys_read
      0.00            +9.1        9.05 ± 54%  perf-profile.children.cycles-pp.vfs_read
      0.00            +9.7        9.72 ± 54%  perf-profile.children.cycles-pp.read
      0.00           +16.0       16.03 ± 41%  perf-profile.children.cycles-pp.do_exit
      0.00           +16.0       16.03 ± 41%  perf-profile.children.cycles-pp.do_group_exit
      1.70 ±  2%     +26.7       28.38 ± 16%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.children.cycles-pp.acpi_safe_halt
      1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.children.cycles-pp.acpi_idle_enter
      1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.21 ±  3%     +35.2       36.40 ± 15%  perf-profile.children.cycles-pp.cpuidle_enter
      1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.children.cycles-pp.start_secondary
      1.22 ±  3%     +35.7       36.87 ± 15%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.common_startup_64
      1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.do_idle
     92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
      1.19 ±  3%     +29.6       30.75 ± 22%  perf-profile.self.cycles-pp.acpi_safe_halt
      0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      0.19 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.14 ± 73%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      0.10 ± 66%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.07 ±101%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.02 ±143%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      0.10 ± 44%     -99.5%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.12 ±145%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.04 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.25 ± 41%     -95.8%       0.01 ±144%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.11 ± 59%     -99.1%       0.00 ±115%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.40 ± 50%     -99.6%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.32 ±104%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ± 12%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.08 ± 28%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.01 ± 42%     -90.6%       0.00 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.18 ± 57%     -99.8%       0.00 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.03 ± 83%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.32 ± 47%     -97.1%       0.01 ± 55%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      0.26 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.02 ± 60%     -83.3%       0.00 ±141%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.01 ±128%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.06 ± 31%   +1806.3%       1.16 ±127%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     25.45 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      3.55 ± 97%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      2.13 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.61 ±100%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.20 ±182%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      3.51 ± 21%    -100.0%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.83 ±160%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.09 ± 31%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.59 ± 11%     -99.6%       0.01 ±158%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1.60 ± 69%     -99.9%       0.00 ±104%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.81 ± 43%     -99.8%       0.00 ±223%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.02 ± 88%    -100.0%       0.00        perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±  7%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      9.68 ± 32%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.01 ± 49%     -92.3%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     12.26 ±109%    -100.0%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      5.60 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      2.11 ± 61%     -99.6%       0.01 ±160%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    171.77 ±217%     -99.7%       0.54 ±195%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      3.67 ± 25%     -99.7%       0.01 ± 47%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      4.68 ± 36%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.21 ±169%     -98.4%       0.00 ±145%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      7.92 ±131%     -99.6%       0.03 ± 75%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.36 ±186%    -100.0%       0.00        perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     33.45 ±  3%     -88.6%       3.82 ± 80%  perf-sched.total_wait_and_delay.average.ms
     97903 ±  4%     -98.0%       1998 ± 22%  perf-sched.total_wait_and_delay.count.ms
      2942 ± 23%     -96.3%     109.30 ± 43%  perf-sched.total_wait_and_delay.max.ms
     33.37 ±  3%     -88.9%       3.71 ± 83%  perf-sched.total_wait_time.average.ms
      2942 ± 23%     -97.2%      81.62 ± 52%  perf-sched.total_wait_time.max.ms
      3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      3.08 ±  4%     -96.4%       0.11 ± 94%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    119.91 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    433.73 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    302.41 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.48 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     23.24 ± 25%     -95.7%       1.01 ± 23%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    327.16 ±  9%     -97.5%       8.12 ±202%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.37 ±  2%     -96.6%      12.56 ± 89%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    453.60          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    187.66           -95.3%       8.75 ± 90%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    750.07           -99.0%       7.40 ± 73%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1269 ±  8%     -43.3%     719.33 ± 26%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      6.17 ± 45%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      5.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     14.33 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    810.00 ± 10%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      3112 ± 24%     -96.8%     100.67 ± 72%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
     40.50 ±  8%     -97.5%       1.00 ±100%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     13.17 ±  2%     -44.3%       7.33 ± 28%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     40.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
      1122           -98.5%      16.33 ± 78%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     11323 ±  3%     -93.3%     756.17 ± 25%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1887 ± 45%     -99.9%       2.33 ±117%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1238           -93.4%      81.50 ± 64%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1002           -96.9%      31.26 ± 97%  perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    318.48 ± 65%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    966.90 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     20.79 ± 19%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1043           -97.6%      24.88 ±123%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1240 ± 20%     -98.7%      16.23 ±202%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.34           -90.4%      47.79 ± 94%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    505.17          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     19.77 ± 55%     -68.0%       6.33 ± 54%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1237 ± 34%     -93.3%      83.40 ± 33%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1001           -97.3%      27.51 ±141%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2794 ± 24%     -97.4%      73.62 ± 55%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     49.27 ±119%    -100.0%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
     58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      3.78 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      2.99 ±  4%     -98.1%       0.06 ± 95%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      4.71 ±  8%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      1.67 ± 20%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.67 ± 21%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     67.14 ± 73%     -96.0%       2.67 ±208%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      1.65 ± 67%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      2.30 ± 14%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
     42.44 ±200%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    119.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.80 ± 18%     -99.7%       0.01 ±144%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
    433.32 ± 41%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    250.23 ±107%    -100.0%       0.00        perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     29.19 ±  5%     -99.0%       0.30 ± 28%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    302.40 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.40 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      4.03 ±  8%     -96.6%       0.14 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     35.38 ±192%     -99.9%       0.05 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.05 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      1.00 ±120%     -98.0%       0.02 ±193%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     23.07 ± 24%     -95.7%       1.00 ± 23%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    326.84 ±  9%     -97.5%       8.14 ±201%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.18 ±  2%     -98.0%       7.39 ±103%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1.17 ± 16%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    453.58          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      4.42           -27.8%       3.19 ± 26%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    187.58           -95.4%       8.69 ± 91%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±156%    -100.0%       0.00        perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    750.01           -99.2%       6.24 ± 99%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    340.69 ±135%    -100.0%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     22.04 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1001           -98.4%      15.63 ± 97%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
     13.54 ± 10%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
     10.17 ± 19%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
     11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
     10.62 ±  9%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      1559 ± 64%     -99.8%       2.67 ±208%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.93 ± 53%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
     14.42 ± 22%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
    159.10 ±148%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    391.02 ±171%     -99.3%       2.80 ±223%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
    318.43 ± 65%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     13.14 ± 21%     -99.9%       0.01 ±158%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1000          -100.0%       0.00        perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    500.84 ± 99%    -100.0%       0.00        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
    641.50 ± 23%     -99.0%       6.41 ± 48%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     10.75 ± 98%     -93.5%       0.70 ±  9%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    966.89 ±  7%    -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     15.80 ±  8%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     16.69 ± 10%     -99.2%       0.14 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     41.71 ±158%     -99.9%       0.05 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     11.64 ± 61%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    175.70 ±210%    -100.0%       0.06 ±213%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      1043           -97.6%      24.88 ±123%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1240 ± 20%     -98.7%      16.28 ±201%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.11           -94.3%      28.64 ±118%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     22.94 ± 56%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    505.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     12.20 ± 43%     -60.5%       4.82 ±  7%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1237 ± 34%     -94.0%      74.19 ± 53%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1000           -97.2%      27.51 ±141%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.36 ±190%    -100.0%       0.00        perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      2794 ± 24%     -98.0%      56.88 ± 94%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-28  3:14 ` kernel test robot
@ 2025-01-31 18:38   ` Yang Shi
  2025-02-06  8:02     ` Oliver Sang
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-01-31 18:38 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm




On 1/27/25 7:14 PM, kernel test robot wrote:
> hi, All,
>
> we don't have enough knowledge to understand fully the discussion for this
> patch, we saw "NACK" but there were more discussions later.
> so below report is just FYI what we observed in our tests. thanks

Thanks for the report. The patch was NACK'ed because of the user-visible
change to the smaps/maps files in proc.

>
> Hello,
>
> kernel test robot noticed a 858.5% improvement of vm-scalability.throughput on:
>
>
> commit: 7143ee2391f1ea15e6791e129870473543634de2 ("[PATCH] /dev/zero: make private mapping full anonymous mapping")
> url: https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/dev-zero-make-private-mapping-full-anonymous-mapping/20250114-063339
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git a68d3cbfade64392507302f3a920113b60dc811f
> patch link: https://lore.kernel.org/all/20250113223033.4054534-1-yang@os.amperecomputing.com/
> patch subject: [PATCH] /dev/zero: make private mapping full anonymous mapping
>
> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> 	runtime: 300s
> 	test: small-allocs

It seems this benchmark allocates a huge number of small areas (each area
is just 40K) by mmap'ing /dev/zero.
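
For reference, the allocation pattern is presumably something like the
minimal sketch below (the 40K area size comes from the benchmark; the
loop count and everything else here is illustrative, not the actual
vm-scalability source):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/zero", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (int i = 0; i < 100000; i++) {
                /* one small private mapping per iteration, 40K each */
                char *p = mmap(NULL, 40 * 1024, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE, fd, 0);
                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                p[0] = 1;       /* touch it so a page is faulted in */
        }

        close(fd);
        return 0;
}

Each MAP_PRIVATE mapping of /dev/zero behaves like anonymous memory,
but every such mmap() still goes through the file rmap path.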

This patch makes the private /dev/zero mapping a full anonymous mapping,
so the later vma_link_file() is skipped entirely; that function needs to
acquire the file's rmap lock and then insert the mapping into the file's
rmap tree. The profiling below also shows this, quoted here so that we
don't have to scroll down:

>       95.60           -95.2        0.42 ±113%  perf-profile.children.cycles-pp.__mmap
>       94.14           -93.6        0.54 ±106%  perf-profile.children.cycles-pp.__mmap_new_vma
>       93.79           -93.6        0.21 ±171%  perf-profile.children.cycles-pp.vma_link_file
>       93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>       93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>       93.44           -93.2        0.24 ±178%  perf-profile.children.cycles-pp.down_write
>       94.55           -93.1        1.40 ± 51%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       94.25           -93.0        1.30 ± 59%  perf-profile.children.cycles-pp.__mmap_region
>       92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>       94.45           -92.7        1.72 ± 34%  perf-profile.children.cycles-pp.do_mmap
>       94.46           -92.6        1.83 ± 31%  perf-profile.children.cycles-pp.vm_mmap_pgoff

It speeds up mmap significantly for this benchmark, and the rmap lock
contention drops sharply for both multi-process and multi-thread
workloads.
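
To see why, here is a simplified sketch of the link path (abridged from
mm/vma.c; exact details may differ across kernel versions):

/*
 * Every file-backed VMA created by mmap() is inserted into the file's
 * rmap interval tree under the per-address_space i_mmap rwsem.  All
 * mappings of /dev/zero share the device inode's address_space, so
 * they all serialize on this one lock.
 */
void vma_link_file(struct vm_area_struct *vma)
{
        struct file *file = vma->vm_file;
        struct address_space *mapping;

        if (file) {
                mapping = file->f_mapping;
                i_mmap_lock_write(mapping);     /* the contended rwsem */
                __vma_link_file(vma, mapping);  /* interval tree insert */
                i_mmap_unlock_write(mapping);
        }
}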

The benchmark itself may exaggerate the improvement, but the change may
really speed up some real-life workloads. For example, multiple
applications may allocate anonymous memory by mmap'ing /dev/zero, and
they would then contend on /dev/zero's rmap lock.

It doesn't make much sense to link /dev/zero's anonymous VMAs into the
file rmap tree, so the patch below should be able to speed up the
benchmark too.

Oliver, can you please give this patch a try?


diff --git a/mm/vma.c b/mm/vma.c
index bb2119e5a0d0..1092222c40ae 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1633,6 +1633,9 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
 void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
                               struct vm_area_struct *vma)
 {
+       if (vma_is_anonymous(vma))
+               return;
+
        if (vma->vm_file == NULL)
                return;

@@ -1658,6 +1661,9 @@ void unlink_file_vma(struct vm_area_struct *vma)
 {
        struct file *file = vma->vm_file;

+       if (vma_is_anonymous(vma))
+               return;
+
        if (file) {
                struct address_space *mapping = file->f_mapping;

@@ -1672,6 +1678,9 @@ void vma_link_file(struct vm_area_struct *vma)
        struct file *file = vma->vm_file;
        struct address_space *mapping;

+       if (vma_is_anonymous(vma))
+               return;
+
        if (file) {
                mapping = file->f_mapping;
                i_mmap_lock_write(mapping);


Because /dev/zero's private mapping is an anonymous mapping with a valid
vm_file, we need to bail out early when the VMA is anonymous even though
it has a vm_file. IMHO, making /dev/zero's private mapping a full
anonymous mapping looks cleaner.
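
For context, vma_is_anonymous() keys off vm_ops alone (simplified from
include/linux/mm.h), which is why such a hybrid VMA is possible:

/*
 * Anonymity is judged only by the absence of vm_ops; vm_file is never
 * consulted.  mmap_zero() clears vm_ops via vma_set_anonymous() but
 * leaves vm_file pointing at /dev/zero, giving a VMA that passes this
 * test yet is still linked into the file rmap tree.
 */
static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
        return !vma->vm_ops;
}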

> 	cpufreq_governor: performance
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250128/202501281038.617c6b60-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    7143ee2391 ("/dev/zero: make private mapping full anonymous mapping")
>
> a68d3cbfade64392 7143ee2391f1ea15e6791e12987
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>   5.262e+09 ±  3%     -67.6%  1.705e+09 ±  3%  cpuidle..time
>     7924008 ±  3%     -88.9%     875849 ±  3%  cpuidle..usage
>     1585617 ±  5%     +13.5%    1799302 ±  2%  numa-numastat.node1.local_node
>     1667793 ±  4%     +13.2%    1887467 ±  2%  numa-numastat.node1.numa_hit
>      399.52           -78.0%      87.79        uptime.boot
>       14507           -24.4%      10963        uptime.idle
>        3408 ±  5%     -99.6%      13.00 ± 40%  perf-c2c.DRAM.local
>       18076 ±  3%     -99.8%      38.67 ± 36%  perf-c2c.DRAM.remote
>        8082 ±  5%     -99.8%      19.33 ± 52%  perf-c2c.HITM.local
>        6544 ±  6%     -99.8%      14.17 ± 35%  perf-c2c.HITM.remote
>       14627 ±  4%     -99.8%      33.50 ± 34%  perf-c2c.HITM.total
>        6.49 ±  3%     +10.5       17.04 ±  7%  mpstat.cpu.all.idle%
>        0.63            -0.3        0.35 ±  2%  mpstat.cpu.all.irq%
>        0.03 ±  2%      +0.2        0.18 ±  6%  mpstat.cpu.all.soft%
>       91.17           -29.6       61.57 ±  2%  mpstat.cpu.all.sys%
>        1.68 ±  2%     +19.2       20.86 ±  2%  mpstat.cpu.all.usr%
>      337.33           -95.3%      15.83 ± 35%  mpstat.max_utilization.seconds
>        6.99 ±  3%    +190.2%      20.30 ±  5%  vmstat.cpu.id
>       91.35           -34.8%      59.59 ±  2%  vmstat.cpu.sy
>        1.71         +1073.6%      20.04 ±  2%  vmstat.cpu.us
>      210.36           -12.7%     183.65        vmstat.procs.r
>       34204 ±  5%     -41.8%      19899 ±  6%  vmstat.system.cs
>      266575           -23.1%     205001        vmstat.system.in
>     1609925           -50.9%     790974        meminfo.Active
>     1609925           -50.9%     790974        meminfo.Active(anon)
>      160837 ± 33%     -77.3%      36534 ± 11%  meminfo.AnonHugePages
>     4435665           -18.7%    3606310        meminfo.Cached
>     1775547           -44.6%     983546        meminfo.Committed_AS
>      148539           -47.7%      77658 ±  2%  meminfo.Mapped
>    25332110 ±  3%      -7.7%   23373667        meminfo.Memused
>     4245538 ±  4%     -26.2%    3134309        meminfo.PageTables
>    14166291 ±  4%     -11.9%   12484042        meminfo.SUnreclaim
>      929777           -89.1%     100886        meminfo.Shmem
>    14315492 ±  4%     -11.8%   12624243        meminfo.Slab
>     1063552 ±  4%     -27.8%     767817 ± 12%  numa-meminfo.node0.PageTables
>      125455 ±106%     -83.3%      20992 ±155%  numa-meminfo.node0.Shmem
>       48482 ± 67%     -44.8%      26748 ±127%  numa-meminfo.node1.Mapped
>     1062709 ±  4%     -21.9%     829672        numa-meminfo.node1.PageTables
>     1058901 ±  4%     -27.5%     767469 ± 14%  numa-meminfo.node2.PageTables
>      770405 ± 30%     -74.0%     200464 ± 77%  numa-meminfo.node3.Active
>      770405 ± 30%     -74.0%     200464 ± 77%  numa-meminfo.node3.Active(anon)
>     1146977 ±108%     -94.5%      63226 ±114%  numa-meminfo.node3.FilePages
>       52663 ± 47%     -97.8%       1141 ± 55%  numa-meminfo.node3.Mapped
>     6368902 ± 20%     -23.5%    4869231 ± 12%  numa-meminfo.node3.MemUsed
>     1058539 ±  4%     -27.8%     764243 ± 12%  numa-meminfo.node3.PageTables
>      558943 ± 14%     -97.0%      16946 ±195%  numa-meminfo.node3.Shmem
>       64129 ±  4%    +885.2%     631788 ±  3%  vm-scalability.median
>       45.40 ±  5%   +1368.7        1414 ±  5%  vm-scalability.stddev%
>    14364828 ±  4%    +858.5%  1.377e+08 ±  3%  vm-scalability.throughput
>      352.76           -88.2%      41.52 ±  3%  vm-scalability.time.elapsed_time
>      352.76           -88.2%      41.52 ±  3%  vm-scalability.time.elapsed_time.max
>      225965 ±  7%     +62.0%     365969 ±  2%  vm-scalability.time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
>       20852            -9.7%      18831        vm-scalability.time.percent_of_cpu_this_job_got
>       72302           -91.9%       5866 ±  4%  vm-scalability.time.system_time
>        1260 ±  3%     +54.9%       1953        vm-scalability.time.user_time
>     5393707 ±  5%     -99.6%      21840 ± 49%  vm-scalability.time.voluntary_context_switches
>   4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
>      265763 ±  4%     -27.8%     191828 ± 11%  numa-vmstat.node0.nr_page_table_pages
>       31364 ±106%     -83.0%       5332 ±156%  numa-vmstat.node0.nr_shmem
>       12205 ± 67%     -44.4%       6791 ±127%  numa-vmstat.node1.nr_mapped
>      265546 ±  4%     -21.8%     207663        numa-vmstat.node1.nr_page_table_pages
>     1667048 ±  4%     +13.2%    1886422 ±  2%  numa-vmstat.node1.numa_hit
>     1584872 ±  5%     +13.5%    1798258 ±  2%  numa-vmstat.node1.numa_local
>      264589 ±  4%     -27.1%     192920 ± 14%  numa-vmstat.node2.nr_page_table_pages
>      192683 ± 30%     -73.9%      50195 ± 76%  numa-vmstat.node3.nr_active_anon
>      286819 ±108%     -94.5%      15799 ±114%  numa-vmstat.node3.nr_file_pages
>       13124 ± 49%     -97.8%     285.03 ± 55%  numa-vmstat.node3.nr_mapped
>      264499 ±  4%     -27.4%     192027 ± 12%  numa-vmstat.node3.nr_page_table_pages
>      139810 ± 14%     -97.0%       4229 ±195%  numa-vmstat.node3.nr_shmem
>      192683 ± 30%     -73.9%      50195 ± 76%  numa-vmstat.node3.nr_zone_active_anon
>      402515           -50.8%     197849        proc-vmstat.nr_active_anon
>      170568            +1.8%     173597        proc-vmstat.nr_anon_pages
>       78.63 ± 33%     -77.4%      17.80 ± 11%  proc-vmstat.nr_anon_transparent_hugepages
>     4257257            +1.1%    4305540        proc-vmstat.nr_dirty_background_threshold
>     8524925            +1.1%    8621607        proc-vmstat.nr_dirty_threshold
>     1109246           -18.7%     901907        proc-vmstat.nr_file_pages
>    42815276            +1.1%   43299295        proc-vmstat.nr_free_pages
>       37525           -47.6%      19653 ±  2%  proc-vmstat.nr_mapped
>     1059932 ±  4%     -26.0%     784175        proc-vmstat.nr_page_table_pages
>      232507           -89.1%      25298        proc-vmstat.nr_shmem
>       37297            -6.0%      35048        proc-vmstat.nr_slab_reclaimable
>     3537843 ±  4%     -11.8%    3120130        proc-vmstat.nr_slab_unreclaimable
>      402515           -50.8%     197849        proc-vmstat.nr_zone_active_anon
>       61931 ±  8%     -73.8%      16233 ± 34%  proc-vmstat.numa_hint_faults
>       15755 ± 21%     -89.8%       1609 ±117%  proc-vmstat.numa_hint_faults_local
>      293942 ±  3%     -66.1%      99500 ± 20%  proc-vmstat.numa_pte_updates
>   9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
>       55981 ±  2%     -69.0%      17375 ±  8%  proc-vmstat.pgreuse
>        0.82 ±  4%     -60.7%       0.32 ±  3%  perf-stat.i.MPKI
>   2.714e+10 ±  2%    +413.1%  1.393e+11 ±  3%  perf-stat.i.branch-instructions
>        0.11 ±  3%      +0.1        0.19 ±  2%  perf-stat.i.branch-miss-rate%
>    24932893          +321.8%  1.052e+08 ±  3%  perf-stat.i.branch-misses
>       64.93            -7.4       57.53        perf-stat.i.cache-miss-rate%
>    88563288 ±  3%     +50.5%  1.333e+08 ±  3%  perf-stat.i.cache-misses
>   1.369e+08 ±  3%     +55.8%  2.134e+08 ±  3%  perf-stat.i.cache-references
>       34508 ±  4%     -39.5%      20864 ±  6%  perf-stat.i.context-switches
>        7.67           -79.6%       1.57 ±  2%  perf-stat.i.cpi
>   7.989e+11            -7.6%  7.383e+11 ±  2%  perf-stat.i.cpu-cycles
>      696.35 ±  2%     -52.8%     328.76 ±  2%  perf-stat.i.cpu-migrations
>       10834 ±  4%     -32.9%       7272 ±  4%  perf-stat.i.cycles-between-cache-misses
>   1.102e+11          +310.6%  4.525e+11 ±  3%  perf-stat.i.instructions
>        0.14          +426.9%       0.75 ±  2%  perf-stat.i.ipc
>       24.25 ±  3%    +855.3%     231.63 ±  3%  perf-stat.i.metric.K/sec
>     2722043 ±  3%    +867.7%   26340617 ±  3%  perf-stat.i.minor-faults
>     2722043 ±  3%    +867.7%   26340616 ±  3%  perf-stat.i.page-faults
>        0.81 ±  3%     -63.3%       0.30 ±  2%  perf-stat.overall.MPKI
>        0.09            -0.0        0.07 ±  2%  perf-stat.overall.branch-miss-rate%
>       64.81            -2.1       62.72        perf-stat.overall.cache-miss-rate%
>        7.24           -77.5%       1.63 ±  3%  perf-stat.overall.cpi
>        8933 ±  4%     -38.7%       5479 ±  4%  perf-stat.overall.cycles-between-cache-misses
>        0.14          +344.4%       0.61 ±  3%  perf-stat.overall.ipc
>        9012 ±  2%     -57.9%       3797        perf-stat.overall.path-length
>   2.701e+10 ±  2%    +396.9%  1.342e+11 ±  3%  perf-stat.ps.branch-instructions
>    24708939          +305.5%  1.002e+08 ±  4%  perf-stat.ps.branch-misses
>    89032538 ±  3%     +45.9%  1.299e+08 ±  3%  perf-stat.ps.cache-misses
>   1.374e+08 ±  3%     +50.8%  2.071e+08 ±  3%  perf-stat.ps.cache-references
>       34266 ±  5%     -41.1%      20179 ±  7%  perf-stat.ps.context-switches
>      223334            -2.2%     218529        perf-stat.ps.cpu-clock
>   7.941e+11           -10.5%   7.11e+11        perf-stat.ps.cpu-cycles
>      693.54 ±  2%     -54.7%     314.08 ±  2%  perf-stat.ps.cpu-migrations
>   1.097e+11          +297.8%  4.362e+11 ±  3%  perf-stat.ps.instructions
>     2710577 ±  3%    +836.2%   25375552 ±  3%  perf-stat.ps.minor-faults
>     2710577 ±  3%    +836.2%   25375552 ±  3%  perf-stat.ps.page-faults
>      223334            -2.2%     218529        perf-stat.ps.task-clock
>   3.886e+13 ±  2%     -52.8%  1.835e+13        perf-stat.total.instructions
>    64052898 ±  5%     -99.8%     124999 ± 22%  sched_debug.cfs_rq:/.avg_vruntime.avg
>    95701822 ±  7%     -96.4%    3453252 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
>    43098762 ±  6%    -100.0%     148.27 ± 21%  sched_debug.cfs_rq:/.avg_vruntime.min
>     9223270 ±  9%     -94.6%     495929 ± 17%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>        0.78 ±  2%     -94.6%       0.04 ± 22%  sched_debug.cfs_rq:/.h_nr_running.avg
>        0.28 ±  7%     -28.9%       0.20 ± 10%  sched_debug.cfs_rq:/.h_nr_running.stddev
>      411536 ± 58%    -100.0%       3.77 ±141%  sched_debug.cfs_rq:/.left_deadline.avg
>    43049468 ± 22%    -100.0%     844.45 ±141%  sched_debug.cfs_rq:/.left_deadline.max
>     3836405 ± 37%    -100.0%      56.30 ±141%  sched_debug.cfs_rq:/.left_deadline.stddev
>      411536 ± 58%    -100.0%       3.62 ±141%  sched_debug.cfs_rq:/.left_vruntime.avg
>    43049467 ± 22%    -100.0%     809.82 ±141%  sched_debug.cfs_rq:/.left_vruntime.max
>     3836405 ± 37%    -100.0%      53.99 ±141%  sched_debug.cfs_rq:/.left_vruntime.stddev
>        8792 ± 28%     -81.8%       1600 ±106%  sched_debug.cfs_rq:/.load.avg
>    64052901 ±  5%     -99.8%     124999 ± 22%  sched_debug.cfs_rq:/.min_vruntime.avg
>    95701822 ±  7%     -96.4%    3453252 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
>    43098762 ±  6%    -100.0%     148.27 ± 21%  sched_debug.cfs_rq:/.min_vruntime.min
>     9223270 ±  9%     -94.6%     495929 ± 17%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.77 ±  2%     -94.6%       0.04 ± 22%  sched_debug.cfs_rq:/.nr_running.avg
>        0.26 ± 10%     -22.4%       0.20 ± 10%  sched_debug.cfs_rq:/.nr_running.stddev
>      411536 ± 58%    -100.0%       3.62 ±141%  sched_debug.cfs_rq:/.right_vruntime.avg
>    43049467 ± 22%    -100.0%     809.82 ±141%  sched_debug.cfs_rq:/.right_vruntime.max
>     3836405 ± 37%    -100.0%      53.99 ±141%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      286633 ± 43%    +421.0%    1493420 ± 42%  sched_debug.cfs_rq:/.runnable_avg.avg
>    34728895 ± 30%    +380.1%  1.667e+08 ± 27%  sched_debug.cfs_rq:/.runnable_avg.max
>     2845573 ± 30%    +406.5%   14411856 ± 30%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      769.03           -85.4%     112.18 ±  6%  sched_debug.cfs_rq:/.util_avg.avg
>        1621 ±  5%     -39.3%     983.67 ±  9%  sched_debug.cfs_rq:/.util_avg.max
>      159.12 ±  8%     +26.6%     201.45 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
>      724.17 ±  2%     -98.8%       8.91 ± 43%  sched_debug.cfs_rq:/.util_est.avg
>        1360 ± 15%     -52.9%     640.17 ± 13%  sched_debug.cfs_rq:/.util_est.max
>      234.34 ±  9%     -71.0%      67.88 ± 27%  sched_debug.cfs_rq:/.util_est.stddev
>      766944 ±  3%     +18.9%     911838        sched_debug.cpu.avg_idle.avg
>     1067639 ±  5%     +31.7%    1406047 ± 12%  sched_debug.cpu.avg_idle.max
>      321459 ±  2%     -37.0%     202531 ±  7%  sched_debug.cpu.avg_idle.stddev
>      195573           -76.7%      45494        sched_debug.cpu.clock.avg
>      195596           -76.7%      45510        sched_debug.cpu.clock.max
>      195548           -76.7%      45471        sched_debug.cpu.clock.min
>       13.79 ±  3%     -36.2%       8.80 ±  2%  sched_debug.cpu.clock.stddev
>      194424           -76.7%      45370        sched_debug.cpu.clock_task.avg
>      194608           -76.6%      45496        sched_debug.cpu.clock_task.max
>      181834           -81.8%      33106        sched_debug.cpu.clock_task.min
>        4241 ±  2%     -96.8%     134.16 ± 27%  sched_debug.cpu.curr->pid.avg
>        9799 ±  2%     -59.8%       3941        sched_debug.cpu.curr->pid.max
>        1365 ± 10%     -49.6%     688.63 ± 13%  sched_debug.cpu.curr->pid.stddev
>      537665 ±  4%     +31.3%     705893 ±  9%  sched_debug.cpu.max_idle_balance_cost.max
>        3119 ± 56%    +590.3%      21534 ± 34%  sched_debug.cpu.max_idle_balance_cost.stddev
>        0.00 ± 12%     -70.8%       0.00 ± 12%  sched_debug.cpu.next_balance.stddev
>        0.78 ±  2%     -95.2%       0.04 ± 25%  sched_debug.cpu.nr_running.avg
>        2.17 ±  8%     -46.2%       1.17 ± 31%  sched_debug.cpu.nr_running.max
>        0.29 ±  8%     -34.0%       0.19 ± 12%  sched_debug.cpu.nr_running.stddev
>       25773 ±  5%     -97.0%     783.41 ±  5%  sched_debug.cpu.nr_switches.avg
>       48669 ± 10%     -76.8%      11301 ± 18%  sched_debug.cpu.nr_switches.max
>       19006 ±  7%     -99.2%     156.50 ± 11%  sched_debug.cpu.nr_switches.min
>        4142 ±  8%     -68.9%       1290 ± 12%  sched_debug.cpu.nr_switches.stddev
>        0.07 ± 23%     -94.0%       0.00 ± 57%  sched_debug.cpu.nr_uninterruptible.avg
>      240.19 ± 16%     -81.7%      44.00 ± 19%  sched_debug.cpu.nr_uninterruptible.max
>      -77.92           -84.6%     -12.00        sched_debug.cpu.nr_uninterruptible.min
>       37.87 ±  5%     -85.2%       5.60 ± 12%  sched_debug.cpu.nr_uninterruptible.stddev
>      195549           -76.7%      45480        sched_debug.cpu_clk
>      194699           -77.1%      44630        sched_debug.ktime
>        0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
>        0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
>        0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
>      196368           -76.4%      46311        sched_debug.sched_clk
>       95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
>       94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.46           -94.1        0.31 ±101%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.14           -93.8        0.37 ±105%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       93.79           -93.6        0.16 ±223%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       93.44           -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
>       93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>       93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
>       94.25           -93.3        0.98 ± 82%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       94.45           -93.0        1.40 ± 51%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.00            +1.7        1.73 ± 34%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exec_mmap.begin_new_exec.load_elf_binary
>        0.00            +1.8        1.82 ± 56%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>        0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.__mmput.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler
>        0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve
>        0.00            +1.9        1.85 ± 31%  perf-profile.calltrace.cycles-pp.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm
>        0.00            +2.3        2.28 ± 38%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.5        2.48 ± 25%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.5        2.48 ± 25%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
>        0.00            +2.5        2.50 ± 48%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
>        0.00            +2.5        2.52 ± 31%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.5        2.52 ± 31%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.7        2.68 ± 27%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
>        0.00            +2.7        2.71 ± 40%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        0.00            +2.7        2.71 ± 40%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.8        2.76 ± 59%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>        0.00            +2.8        2.85 ± 54%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
>        0.00            +2.8        2.85 ± 54%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
>        0.00            +3.0        2.96 ± 53%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
>        0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +3.0        2.99 ± 53%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
>        0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
>        0.00            +3.0        3.02 ± 31%  perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
>        0.00            +3.0        3.03 ± 52%  perf-profile.calltrace.cycles-pp._Fork
>        0.00            +3.3        3.31 ± 26%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.5        3.52 ± 20%  perf-profile.calltrace.cycles-pp.execve
>        0.00            +3.5        3.54 ± 41%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
>        0.00            +3.5        3.54 ± 41%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>        0.00            +3.7        3.69 ± 37%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.7        3.69 ± 37%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
>        0.00            +3.9        3.89 ± 50%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
>        0.00            +3.9        3.94 ± 44%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>        0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>        0.00            +4.2        4.18 ± 91%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>        0.00            +5.5        5.54 ± 38%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        0.00            +5.8        5.85 ± 27%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>        0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
>        0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.main
>        0.00            +6.5        6.50 ± 62%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
>        0.00            +9.1        9.05 ± 54%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +9.1        9.05 ± 54%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +9.4        9.38 ± 52%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +9.5        9.48 ± 52%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
>        0.00            +9.9        9.92 ± 57%  perf-profile.calltrace.cycles-pp.read
>        0.00           +12.0       11.98 ± 50%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>        0.00           +18.8       18.83 ± 38%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00           +18.8       18.83 ± 38%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        1.21 ±  3%     +34.3       35.50 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>        1.21 ±  3%     +34.8       35.97 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>        1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>        1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>        1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.calltrace.cycles-pp.common_startup_64
>        2.19 ±  3%     +49.9       52.08 ± 18%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>       95.60           -95.2        0.42 ±113%  perf-profile.children.cycles-pp.__mmap
>       94.14           -93.6        0.54 ±106%  perf-profile.children.cycles-pp.__mmap_new_vma
>       93.79           -93.6        0.21 ±171%  perf-profile.children.cycles-pp.vma_link_file
>       93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>       93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>       93.44           -93.2        0.24 ±178%  perf-profile.children.cycles-pp.down_write
>       94.55           -93.1        1.40 ± 51%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       94.25           -93.0        1.30 ± 59%  perf-profile.children.cycles-pp.__mmap_region
>       92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>       94.45           -92.7        1.72 ± 34%  perf-profile.children.cycles-pp.do_mmap
>       94.46           -92.6        1.83 ± 31%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       95.58           -45.3       50.30 ±  6%  perf-profile.children.cycles-pp.do_syscall_64
>       95.58           -45.2       50.40 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>        0.00            +1.2        1.22 ± 40%  perf-profile.children.cycles-pp._raw_spin_lock
>        0.00            +1.3        1.26 ± 34%  perf-profile.children.cycles-pp.seq_printf
>        0.00            +1.3        1.32 ± 78%  perf-profile.children.cycles-pp.kmem_cache_free
>        0.00            +1.6        1.60 ± 42%  perf-profile.children.cycles-pp.sched_balance_rq
>        0.00            +1.7        1.73 ± 41%  perf-profile.children.cycles-pp.open_last_lookups
>        0.00            +1.9        1.85 ± 31%  perf-profile.children.cycles-pp.begin_new_exec
>        0.00            +1.9        1.85 ± 31%  perf-profile.children.cycles-pp.exec_mmap
>        0.00            +2.1        2.09 ± 40%  perf-profile.children.cycles-pp.do_pte_missing
>        0.46            +2.4        2.85 ± 54%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>        0.53            +2.4        2.94 ± 49%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>        0.53            +2.4        2.94 ± 49%  perf-profile.children.cycles-pp.hrtimer_interrupt
>        0.00            +2.4        2.44 ±101%  perf-profile.children.cycles-pp.__evlist__enable
>        0.00            +2.5        2.54 ± 45%  perf-profile.children.cycles-pp.zap_present_ptes
>        0.00            +2.6        2.58 ± 54%  perf-profile.children.cycles-pp.mutex_unlock
>        0.00            +2.7        2.68 ± 67%  perf-profile.children.cycles-pp.evlist_cpu_iterator__next
>        0.00            +2.7        2.71 ± 40%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>        0.00            +2.7        2.71 ± 40%  perf-profile.children.cycles-pp.x64_sys_call
>        0.00            +3.0        2.99 ± 53%  perf-profile.children.cycles-pp.__do_sys_clone
>        0.00            +3.0        2.99 ± 53%  perf-profile.children.cycles-pp.kernel_clone
>        0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.exec_binprm
>        0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.load_elf_binary
>        0.00            +3.0        3.02 ± 31%  perf-profile.children.cycles-pp.search_binary_handler
>        0.00            +3.0        3.03 ± 52%  perf-profile.children.cycles-pp._Fork
>        0.00            +3.3        3.31 ± 26%  perf-profile.children.cycles-pp.bprm_execve
>        0.58 ±  2%      +3.4        3.98 ± 47%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>        0.00            +3.5        3.52 ± 20%  perf-profile.children.cycles-pp.execve
>        0.04 ± 44%      +3.7        3.72 ± 18%  perf-profile.children.cycles-pp.__schedule
>        0.00            +3.7        3.72 ± 14%  perf-profile.children.cycles-pp.__x64_sys_execve
>        0.00            +3.7        3.72 ± 14%  perf-profile.children.cycles-pp.do_execveat_common
>        0.51 ±  6%      +3.7        4.25 ± 31%  perf-profile.children.cycles-pp.handle_mm_fault
>        0.00            +3.8        3.79 ± 40%  perf-profile.children.cycles-pp.zap_pte_range
>        0.00            +3.9        3.90 ± 26%  perf-profile.children.cycles-pp.do_filp_open
>        0.00            +3.9        3.90 ± 26%  perf-profile.children.cycles-pp.path_openat
>        0.00            +3.9        3.91 ± 43%  perf-profile.children.cycles-pp.unmap_page_range
>        0.00            +3.9        3.91 ± 43%  perf-profile.children.cycles-pp.zap_pmd_range
>        1.18            +4.0        5.20 ± 19%  perf-profile.children.cycles-pp.asm_exc_page_fault
>        0.19 ± 23%      +4.0        4.21 ± 32%  perf-profile.children.cycles-pp.__handle_mm_fault
>        0.77 ±  3%      +4.0        4.79 ± 27%  perf-profile.children.cycles-pp.exc_page_fault
>        0.76 ±  3%      +4.0        4.79 ± 27%  perf-profile.children.cycles-pp.do_user_addr_fault
>        0.00            +4.1        4.13 ± 38%  perf-profile.children.cycles-pp.do_sys_openat2
>        0.00            +4.2        4.15 ± 35%  perf-profile.children.cycles-pp.unmap_vmas
>        0.00            +4.2        4.18 ± 91%  perf-profile.children.cycles-pp.kthread
>        0.00            +4.2        4.22 ± 91%  perf-profile.children.cycles-pp.ret_from_fork
>        0.00            +4.2        4.22 ± 91%  perf-profile.children.cycles-pp.ret_from_fork_asm
>        0.00            +4.3        4.25 ± 37%  perf-profile.children.cycles-pp.__x64_sys_openat
>        0.00            +5.5        5.54 ± 38%  perf-profile.children.cycles-pp.exit_mm
>        0.00            +6.1        6.09 ± 48%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>        0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.__cmd_record
>        0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.cmd_record
>        0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.handle_internal_command
>        0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.main
>        0.02 ±141%      +6.5        6.50 ± 62%  perf-profile.children.cycles-pp.run_builtin
>        0.00            +7.3        7.28 ± 26%  perf-profile.children.cycles-pp.exit_mmap
>        0.00            +7.4        7.40 ± 27%  perf-profile.children.cycles-pp.__mmput
>        0.00            +8.5        8.52 ± 58%  perf-profile.children.cycles-pp.seq_read_iter
>        0.00            +8.6        8.56 ± 52%  perf-profile.children.cycles-pp.__fput
>        0.00            +9.1        9.05 ± 54%  perf-profile.children.cycles-pp.ksys_read
>        0.00            +9.1        9.05 ± 54%  perf-profile.children.cycles-pp.vfs_read
>        0.00            +9.7        9.72 ± 54%  perf-profile.children.cycles-pp.read
>        0.00           +16.0       16.03 ± 41%  perf-profile.children.cycles-pp.do_exit
>        0.00           +16.0       16.03 ± 41%  perf-profile.children.cycles-pp.do_group_exit
>        1.70 ±  2%     +26.7       28.38 ± 16%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>        1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.children.cycles-pp.acpi_idle_do_entry
>        1.21 ±  3%     +35.0       36.19 ± 16%  perf-profile.children.cycles-pp.acpi_safe_halt
>        1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.children.cycles-pp.acpi_idle_enter
>        1.21 ±  3%     +35.1       36.30 ± 16%  perf-profile.children.cycles-pp.cpuidle_enter_state
>        1.21 ±  3%     +35.2       36.40 ± 15%  perf-profile.children.cycles-pp.cpuidle_enter
>        1.22 ±  3%     +35.5       36.71 ± 18%  perf-profile.children.cycles-pp.start_secondary
>        1.22 ±  3%     +35.7       36.87 ± 15%  perf-profile.children.cycles-pp.cpuidle_idle_call
>        1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.common_startup_64
>        1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.cpu_startup_entry
>        1.22 ±  3%     +36.4       37.61 ± 15%  perf-profile.children.cycles-pp.do_idle
>       92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
>        1.19 ±  3%     +29.6       30.75 ± 22%  perf-profile.self.cycles-pp.acpi_safe_halt
>        0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        0.19 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        0.14 ± 73%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        0.10 ± 66%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        0.07 ±101%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.02 ±143%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        0.10 ± 44%     -99.5%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        0.12 ±145%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.04 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        0.25 ± 41%     -95.8%       0.01 ±144%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        0.11 ± 59%     -99.1%       0.00 ±115%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.40 ± 50%     -99.6%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.32 ±104%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.01 ± 12%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.08 ± 28%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.01 ± 42%     -90.6%       0.00 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>        0.18 ± 57%     -99.8%       0.00 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        0.03 ± 83%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        0.32 ± 47%     -97.1%       0.01 ± 55%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>        0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.26 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.02 ± 60%     -83.3%       0.00 ±141%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        0.01 ±128%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        0.06 ± 31%   +1806.3%       1.16 ±127%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       25.45 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        3.55 ± 97%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        2.13 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.61 ±100%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.20 ±182%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        3.51 ± 21%    -100.0%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        0.83 ±160%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.09 ± 31%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        3.59 ± 11%     -99.6%       0.01 ±158%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1.60 ± 69%     -99.9%       0.00 ±104%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.81 ± 43%     -99.8%       0.00 ±223%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1.02 ± 88%    -100.0%       0.00        perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.02 ±  7%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        9.68 ± 32%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.01 ± 49%     -92.3%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>       12.26 ±109%    -100.0%       0.00 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        5.60 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        2.11 ± 61%     -99.6%       0.01 ±160%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>      171.77 ±217%     -99.7%       0.54 ±195%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>        3.67 ± 25%     -99.7%       0.01 ± 47%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>       37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        4.68 ± 36%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.21 ±169%     -98.4%       0.00 ±145%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        7.92 ±131%     -99.6%       0.03 ± 75%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.36 ±186%    -100.0%       0.00        perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       33.45 ±  3%     -88.6%       3.82 ± 80%  perf-sched.total_wait_and_delay.average.ms
>       97903 ±  4%     -98.0%       1998 ± 22%  perf-sched.total_wait_and_delay.count.ms
>        2942 ± 23%     -96.3%     109.30 ± 43%  perf-sched.total_wait_and_delay.max.ms
>       33.37 ±  3%     -88.9%       3.71 ± 83%  perf-sched.total_wait_time.average.ms
>        2942 ± 23%     -97.2%      81.62 ± 52%  perf-sched.total_wait_time.max.ms
>        3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        3.08 ±  4%     -96.4%       0.11 ± 94%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>      119.91 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      433.73 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      302.41 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.48 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       23.24 ± 25%     -95.7%       1.01 ± 23%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>      327.16 ±  9%     -97.5%       8.12 ±202%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.37 ±  2%     -96.6%      12.56 ± 89%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      453.60          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>      187.66           -95.3%       8.75 ± 90%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      750.07           -99.0%       7.40 ± 73%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1269 ±  8%     -43.3%     719.33 ± 26%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        6.17 ± 45%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        5.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       14.33 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>      810.00 ± 10%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        3112 ± 24%     -96.8%     100.67 ± 72%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
>       40.50 ±  8%     -97.5%       1.00 ±100%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>       13.17 ±  2%     -44.3%       7.33 ± 28%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       40.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
>        1122           -98.5%      16.33 ± 78%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       11323 ±  3%     -93.3%     756.17 ± 25%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1887 ± 45%     -99.9%       2.33 ±117%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1238           -93.4%      81.50 ± 64%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1002           -96.9%      31.26 ± 97%  perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>      318.48 ± 65%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      966.90 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       20.79 ± 19%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        1043           -97.6%      24.88 ±123%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>        1240 ± 20%     -98.7%      16.23 ±202%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.34           -90.4%      47.79 ± 94%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      505.17          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       19.77 ± 55%     -68.0%       6.33 ± 54%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        1237 ± 34%     -93.3%      83.40 ± 33%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1001           -97.3%      27.51 ±141%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        2794 ± 24%     -97.4%      73.62 ± 55%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       49.27 ±119%    -100.0%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>       58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        3.78 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        2.99 ±  4%     -98.1%       0.06 ± 95%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        4.71 ±  8%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        1.67 ± 20%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.67 ± 21%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       67.14 ± 73%     -96.0%       2.67 ±208%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        1.65 ± 67%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        2.30 ± 14%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>       42.44 ±200%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      119.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        3.80 ± 18%     -99.7%       0.01 ±144%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>      433.32 ± 41%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      250.23 ±107%    -100.0%       0.00        perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       29.19 ±  5%     -99.0%       0.30 ± 28%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>      302.40 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.40 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        4.03 ±  8%     -96.6%       0.14 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       35.38 ±192%     -99.9%       0.05 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        0.05 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        1.00 ±120%     -98.0%       0.02 ±193%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>       23.07 ± 24%     -95.7%       1.00 ± 23%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>      326.84 ±  9%     -97.5%       8.14 ±201%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.18 ±  2%     -98.0%       7.39 ±103%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1.17 ± 16%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      453.58          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        4.42           -27.8%       3.19 ± 26%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      187.58           -95.4%       8.69 ± 91%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        0.01 ±156%    -100.0%       0.00        perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>      750.01           -99.2%       6.24 ± 99%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      340.69 ±135%    -100.0%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>      535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       22.04 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1001           -98.4%      15.63 ± 97%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>       13.54 ± 10%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>       10.17 ± 19%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>       11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>       10.62 ±  9%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        1559 ± 64%     -99.8%       2.67 ±208%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        6.93 ± 53%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>       14.42 ± 22%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>      159.10 ±148%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      391.02 ±171%     -99.3%       2.80 ±223%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      318.43 ± 65%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       13.14 ± 21%     -99.9%       0.01 ±158%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1000          -100.0%       0.00        perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      500.84 ± 99%    -100.0%       0.00        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>      641.50 ± 23%     -99.0%       6.41 ± 48%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>       10.75 ± 98%     -93.5%       0.70 ±  9%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      966.89 ±  7%    -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       15.80 ±  8%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       16.69 ± 10%     -99.2%       0.14 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       41.71 ±158%     -99.9%       0.05 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       11.64 ± 61%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      175.70 ±210%    -100.0%       0.06 ±213%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        1043           -97.6%      24.88 ±123%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>        1240 ± 20%     -98.7%      16.28 ±201%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.11           -94.3%      28.64 ±118%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       22.94 ± 56%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      505.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       12.20 ± 43%     -60.5%       4.82 ±  7%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        1237 ± 34%     -94.0%      74.19 ± 53%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1000           -97.2%      27.51 ±141%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.36 ±190%    -100.0%       0.00        perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        2794 ± 24%     -98.0%      56.88 ± 94%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-01-31 18:38   ` Yang Shi
@ 2025-02-06  8:02     ` Oliver Sang
  2025-02-07 18:10       ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Sang @ 2025-02-06  8:02 UTC (permalink / raw)
  To: Yang Shi
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm, oliver.sang

hi, Yang Shi,

On Fri, Jan 31, 2025 at 10:38:03AM -0800, Yang Shi wrote:
> 
> 
> 
> On 1/27/25 7:14 PM, kernel test robot wrote:
> > hi, All,
> > 
> > we don't have enough knowledge to fully understand the discussion for this
> > patch; we saw "NACK" but there were more discussions later.
> > so the below report is just an FYI on what we observed in our tests. thanks
> 
> Thanks for the report. It was nack'ed because of the user-visible change to
> the smaps/maps files in /proc.
> 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 858.5% improvement of vm-scalability.throughput on:
> > 
> > 
> > commit: 7143ee2391f1ea15e6791e129870473543634de2 ("[PATCH] /dev/zero: make private mapping full anonymous mapping")
> > url: https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/dev-zero-make-private-mapping-full-anonymous-mapping/20250114-063339
> > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git a68d3cbfade64392507302f3a920113b60dc811f
> > patch link: https://lore.kernel.org/all/20250113223033.4054534-1-yang@os.amperecomputing.com/
> > patch subject: [PATCH] /dev/zero: make private mapping full anonymous mapping
> > 
> > testcase: vm-scalability
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > parameters:
> > 
> > 	runtime: 300s
> > 	test: small-allocs
> 
> It seems this benchmark allocates a huge number of small areas (each area
> is 40K) by mmap'ing /dev/zero.
> 
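> A minimal sketch of that allocation pattern (for illustration only, not
> the actual vm-scalability source; the 40K area size comes from the
> report, the iteration count is made up):
> 
> 	#include <fcntl.h>
> 	#include <unistd.h>
> 	#include <sys/mman.h>
> 
> 	int main(void)
> 	{
> 		int fd = open("/dev/zero", O_RDWR);
> 
> 		if (fd < 0)
> 			return 1;
> 		for (long i = 0; i < 100000; i++) {
> 			/*
> 			 * Each MAP_PRIVATE mmap of /dev/zero creates a
> 			 * separate VMA; before the patch every one of them
> 			 * was also linked into /dev/zero's file rmap tree,
> 			 * so all mappers serialized on the same rwsem.
> 			 */
> 			char *p = mmap(NULL, 40 * 1024,
> 				       PROT_READ | PROT_WRITE,
> 				       MAP_PRIVATE, fd, 0);
> 
> 			if (p == MAP_FAILED)
> 				break;
> 			p[0] = 1;	/* fault the first page in */
> 		}
> 		close(fd);
> 		return 0;
> 	}
> 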
> This patch makes the /dev/zero mapping a full anonymous mapping, so the
> later vma_link_file() is skipped entirely; that function has to acquire the
> file rmap lock and then insert the mapping into the file rmap tree. The
> profiling below also shows this.
> Quoted here so that we don't have to scroll down:
> 
> >       95.60           -95.2        0.42 ±113%  perf-profile.children.cycles-pp.__mmap
> >       94.14           -93.6        0.54 ±106%  perf-profile.children.cycles-pp.__mmap_new_vma
> >       93.79           -93.6        0.21 ±171%  perf-profile.children.cycles-pp.vma_link_file
> >       93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
> >       93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
> >       93.44           -93.2        0.24 ±178%  perf-profile.children.cycles-pp.down_write
> >       94.55           -93.1        1.40 ± 51%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
> >       94.25           -93.0        1.30 ± 59%  perf-profile.children.cycles-pp.__mmap_region
> >       92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
> >       94.45           -92.7        1.72 ± 34%  perf-profile.children.cycles-pp.do_mmap
> >       94.46           -92.6        1.83 ± 31%  perf-profile.children.cycles-pp.vm_mmap_pgoff
> 
> It significantly speeds up mmap for this benchmark, and the rmap lock
> contention is reduced significantly for both multi-process and
> multi-threaded workloads.
> 
> The benchmark itself may exaggerate the improvement, but it may really
> speed up some real-life workloads. For example, when multiple applications
> allocate anonymous memory by mmap'ing /dev/zero, they may contend on
> /dev/zero's rmap lock.
> 
> It doesn't make much sense to link /dev/zero's anonymous VMAs into the
> file rmap tree, so the below patch should be able to speed up the benchmark
> too.

sorry for the late reply, and thanks a lot for the information!

> 
> Oliver, can you please give this patch a try?

it seems this is an alternative patch?
since we applied your "/dev/zero: make private mapping full anonymous mapping"
patch on top of a68d3cbfad, like below:

* 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
* a68d3cbfade64 memstick: core: fix kernel-doc notation

so I applied the below patch on top of a68d3cbfad as well.

we saw a big improvement, though not as big (+410.6% vs. +858.5% throughput).

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit: 
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  52ec85cb99  <--- your patch


a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput

full comparison as below [1] just FYI.

> 
> 
> diff --git a/mm/vma.c b/mm/vma.c
> index bb2119e5a0d0..1092222c40ae 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -1633,6 +1633,9 @@ static void unlink_file_vma_batch_process(struct
> unlink_vma_file_batch *vb)
>  void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
>                                struct vm_area_struct *vma)
>  {
> +       if (vma_is_anonymous(vma))
> +               return;
> +
>         if (vma->vm_file == NULL)
>                 return;
> 
> @@ -1658,6 +1661,9 @@ void unlink_file_vma(struct vm_area_struct *vma)
>  {
>         struct file *file = vma->vm_file;
> 
> +       if (vma_is_anonymous(vma))
> +               return;
> +
>         if (file) {
>                 struct address_space *mapping = file->f_mapping;
> 
> @@ -1672,6 +1678,9 @@ void vma_link_file(struct vm_area_struct *vma)
>         struct file *file = vma->vm_file;
>         struct address_space *mapping;
> 
> +       if (vma_is_anonymous(vma))
> +               return;
> +
>         if (file) {
>                 mapping = file->f_mapping;
>                 i_mmap_lock_write(mapping);
> 
> 
> Because /dev/zero's private mapping is an anonymous mapping with a valid
> vm_file, we need to bail out early if the VMA is anonymous even though it
> has a vm_file. IMHO, making the /dev/zero private mapping a full anonymous
> mapping looks cleaner.
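> 
> For reference, vma_is_anonymous() only checks vm_ops (from
> include/linux/mm.h), which is why the private /dev/zero VMA already
> counts as anonymous despite carrying a valid vm_file:
> 
> 	static inline bool vma_is_anonymous(struct vm_area_struct *vma)
> 	{
> 		return !vma->vm_ops;
> 	}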
> 

[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit: 
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  52ec85cb99  <--- your patch


a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 5.262e+09 ±  3%     -45.0%  2.896e+09 ±  6%  cpuidle..time
   7924008 ±  3%     -79.3%    1643339 ± 11%  cpuidle..usage
   1871164 ±  4%     -22.4%    1452554 ± 12%  numa-numastat.node3.local_node
   1952164 ±  3%     -20.1%    1560294 ± 12%  numa-numastat.node3.numa_hit
    399.52           -68.2%     126.86        uptime.boot
     14507           -15.7%      12232        uptime.idle
      6.99 ±  3%    +147.9%      17.34 ±  4%  vmstat.cpu.id
      1.71          +473.6%       9.79 ±  2%  vmstat.cpu.us
     34204 ±  5%     -72.9%       9272 ±  7%  vmstat.system.cs
    266575           -21.2%     210191        vmstat.system.in
      3408 ±  5%     -99.8%       8.38 ± 48%  perf-c2c.DRAM.local
     18076 ±  3%     -99.8%      32.25 ± 27%  perf-c2c.DRAM.remote
      8082 ±  5%     -99.8%      15.50 ± 64%  perf-c2c.HITM.local
      6544 ±  6%     -99.8%      13.62 ± 51%  perf-c2c.HITM.remote
     14627 ±  4%     -99.8%      29.12 ± 53%  perf-c2c.HITM.total
      6.49 ±  3%      +8.8       15.24 ±  5%  mpstat.cpu.all.idle%
      0.63            -0.3        0.32 ±  4%  mpstat.cpu.all.irq%
      0.03 ±  2%      +0.2        0.26 ±  2%  mpstat.cpu.all.soft%
     91.17           -17.0       74.15        mpstat.cpu.all.sys%
      1.68 ±  2%      +8.3       10.03 ±  2%  mpstat.cpu.all.usr%
    337.33           -97.4%       8.88 ± 75%  mpstat.max_utilization.seconds
    352.76           -77.3%      79.95 ±  2%  time.elapsed_time
    352.76           -77.3%      79.95 ±  2%  time.elapsed_time.max
    225965 ±  7%     -16.0%     189844 ±  6%  time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09        time.minor_page_faults
     20852            -8.8%      19012        time.percent_of_cpu_this_job_got
     72302           -81.4%      13425 ±  3%  time.system_time
      1260 ±  3%     +41.0%       1777        time.user_time
   5393707 ±  5%     -98.4%      86880 ± 17%  time.voluntary_context_switches
   1609925           -50.3%     800493        meminfo.Active
   1609925           -50.3%     800493        meminfo.Active(anon)
    160837 ± 33%     -63.9%      58119 ± 13%  meminfo.AnonHugePages
   4435665           -18.5%    3614714        meminfo.Cached
   1775547           -43.8%     998415        meminfo.Committed_AS
    148539           -43.7%      83699 ±  4%  meminfo.Mapped
   4245538 ±  4%     -20.9%    3356561        meminfo.PageTables
  14166291 ±  4%      -9.6%   12806082        meminfo.SUnreclaim
    929777           -88.2%     109274 ±  3%  meminfo.Shmem
  14315492 ±  4%      -9.6%   12947821        meminfo.Slab
     64129 ±  4%    +418.9%     332751 ±  3%  vm-scalability.median
     45.40 ±  5%   +1961.8        2007 ±  8%  vm-scalability.stddev%
  14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
    352.76           -77.3%      79.95 ±  2%  vm-scalability.time.elapsed_time
    352.76           -77.3%      79.95 ±  2%  vm-scalability.time.elapsed_time.max
    225965 ±  7%     -16.0%     189844 ±  6%  vm-scalability.time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
     20852            -8.8%      19012        vm-scalability.time.percent_of_cpu_this_job_got
     72302           -81.4%      13425 ±  3%  vm-scalability.time.system_time
      1260 ±  3%     +41.0%       1777        vm-scalability.time.user_time
   5393707 ±  5%     -98.4%      86880 ± 17%  vm-scalability.time.voluntary_context_switches
 4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
    265763 ±  4%     -20.5%     211398 ±  4%  numa-vmstat.node0.nr_page_table_pages
     31364 ±106%     -85.0%       4690 ±169%  numa-vmstat.node0.nr_shmem
     12205 ± 67%     -74.1%       3161 ±199%  numa-vmstat.node1.nr_mapped
    265546 ±  4%     -21.8%     207742 ±  4%  numa-vmstat.node1.nr_page_table_pages
     44052 ± 71%     -86.0%       6163 ±161%  numa-vmstat.node1.nr_shmem
    885590 ±  4%      -9.9%     797649 ±  4%  numa-vmstat.node1.nr_slab_unreclaimable
    264589 ±  4%     -21.2%     208598 ±  4%  numa-vmstat.node2.nr_page_table_pages
    881598 ±  4%     -10.0%     793829 ±  4%  numa-vmstat.node2.nr_slab_unreclaimable
    192683 ± 30%     -61.0%      75078 ± 70%  numa-vmstat.node3.nr_active_anon
    286819 ±108%     -93.0%      19993 ± 39%  numa-vmstat.node3.nr_file_pages
     13124 ± 49%     -92.3%       1006 ± 57%  numa-vmstat.node3.nr_mapped
    264499 ±  4%     -22.1%     206135 ±  2%  numa-vmstat.node3.nr_page_table_pages
    139810 ± 14%     -90.5%      13229 ± 89%  numa-vmstat.node3.nr_shmem
    880199 ±  4%     -11.8%     776210 ±  5%  numa-vmstat.node3.nr_slab_unreclaimable
    192683 ± 30%     -61.0%      75077 ± 70%  numa-vmstat.node3.nr_zone_active_anon
   1951359 ±  3%     -20.1%    1558936 ± 12%  numa-vmstat.node3.numa_hit
   1870359 ±  4%     -22.4%    1451195 ± 12%  numa-vmstat.node3.numa_local
    402515           -50.3%     200150        proc-vmstat.nr_active_anon
    170568            +1.9%     173746        proc-vmstat.nr_anon_pages
   4257257            +0.9%    4296664        proc-vmstat.nr_dirty_background_threshold
   8524925            +0.9%    8603835        proc-vmstat.nr_dirty_threshold
   1109246           -18.5%     903959        proc-vmstat.nr_file_pages
  42815276            +0.9%   43210344        proc-vmstat.nr_free_pages
     37525           -43.6%      21164 ±  4%  proc-vmstat.nr_mapped
   1059932 ±  4%     -21.1%     836810        proc-vmstat.nr_page_table_pages
    232507           -88.2%      27341 ±  3%  proc-vmstat.nr_shmem
     37297            -5.0%      35436        proc-vmstat.nr_slab_reclaimable
   3537843 ±  4%      -9.8%    3192506        proc-vmstat.nr_slab_unreclaimable
    402515           -50.3%     200150        proc-vmstat.nr_zone_active_anon
     61931 ±  8%     -83.8%      10023 ± 45%  proc-vmstat.numa_hint_faults
     15755 ± 21%     -87.1%       2039 ± 97%  proc-vmstat.numa_hint_faults_local
   6916516 ±  3%      -7.1%    6425430        proc-vmstat.numa_hit
   6568542 ±  3%      -7.5%    6077764        proc-vmstat.numa_local
    293942 ±  3%     -69.6%      89435 ± 49%  proc-vmstat.numa_pte_updates
 9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
     55981 ±  2%     -63.1%      20641 ±  2%  proc-vmstat.pgreuse
   1063552 ±  4%     -20.3%     847673 ±  4%  numa-meminfo.node0.PageTables
   3565610 ±  4%      -8.0%    3279375 ±  3%  numa-meminfo.node0.SUnreclaim
    125455 ±106%     -85.2%      18620 ±168%  numa-meminfo.node0.Shmem
   3592377 ±  4%      -7.1%    3336072 ±  4%  numa-meminfo.node0.Slab
     48482 ± 67%     -74.3%      12475 ±199%  numa-meminfo.node1.Mapped
   1062709 ±  4%     -21.7%     831966 ±  4%  numa-meminfo.node1.PageTables
   3543793 ±  4%     -10.0%    3189589 ±  4%  numa-meminfo.node1.SUnreclaim
    176171 ± 71%     -86.0%      24677 ±161%  numa-meminfo.node1.Shmem
   3593431 ±  4%     -10.4%    3220352 ±  4%  numa-meminfo.node1.Slab
   1058901 ±  4%     -21.3%     833124 ±  4%  numa-meminfo.node2.PageTables
   3527862 ±  4%     -10.2%    3168666 ±  5%  numa-meminfo.node2.SUnreclaim
   3565750 ±  4%     -10.3%    3200248 ±  5%  numa-meminfo.node2.Slab
    770405 ± 30%     -61.0%     300435 ± 70%  numa-meminfo.node3.Active
    770405 ± 30%     -61.0%     300435 ± 70%  numa-meminfo.node3.Active(anon)
   1146977 ±108%     -93.0%      80110 ± 40%  numa-meminfo.node3.FilePages
     52663 ± 47%     -91.6%       4397 ± 56%  numa-meminfo.node3.Mapped
   6368902 ± 20%     -21.2%    5021246 ±  2%  numa-meminfo.node3.MemUsed
   1058539 ±  4%     -22.2%     823061 ±  3%  numa-meminfo.node3.PageTables
   3522496 ±  4%     -12.1%    3096728 ±  6%  numa-meminfo.node3.SUnreclaim
    558943 ± 14%     -90.5%      53054 ± 89%  numa-meminfo.node3.Shmem
   3557392 ±  4%     -12.3%    3119454 ±  6%  numa-meminfo.node3.Slab
      0.82 ±  4%     -39.7%       0.50 ± 12%  perf-stat.i.MPKI
 2.714e+10 ±  2%    +185.7%  7.755e+10 ±  6%  perf-stat.i.branch-instructions
      0.11 ±  3%      +0.1        0.20 ±  5%  perf-stat.i.branch-miss-rate%
  24932893          +156.6%   63980942 ±  5%  perf-stat.i.branch-misses
     64.93           -10.1       54.87 ±  2%  perf-stat.i.cache-miss-rate%
     34508 ±  4%     -61.4%      13315 ± 10%  perf-stat.i.context-switches
      7.67           -63.7%       2.79 ±  6%  perf-stat.i.cpi
    224605           +10.8%     248972 ±  4%  perf-stat.i.cpu-clock
    696.35 ±  2%     -57.4%     296.79 ±  3%  perf-stat.i.cpu-migrations
 1.102e+11          +128.5%  2.518e+11 ±  6%  perf-stat.i.instructions
      0.14          +198.2%       0.42 ±  5%  perf-stat.i.ipc
     24.25 ±  3%    +375.8%     115.36 ±  3%  perf-stat.i.metric.K/sec
   2722043 ±  3%    +439.7%   14690226 ±  6%  perf-stat.i.minor-faults
   2722043 ±  3%    +439.7%   14690226 ±  6%  perf-stat.i.page-faults
    224605           +10.8%     248972 ±  4%  perf-stat.i.task-clock
      0.81 ±  3%     -52.5%       0.39 ± 14%  perf-stat.overall.MPKI
      0.09            -0.0        0.08 ±  2%  perf-stat.overall.branch-miss-rate%
     64.81            -6.4       58.40        perf-stat.overall.cache-miss-rate%
      7.24           -56.3%       3.17 ±  3%  perf-stat.overall.cpi
      0.14          +129.0%       0.32 ±  3%  perf-stat.overall.ipc
      9012 ±  2%     -57.5%       3827        perf-stat.overall.path-length
 2.701e+10 ±  2%    +159.6%  7.012e+10 ±  2%  perf-stat.ps.branch-instructions
  24708939          +119.2%   54173035        perf-stat.ps.branch-misses
     34266 ±  5%     -73.9%       8949 ±  7%  perf-stat.ps.context-switches
 7.941e+11            -9.1%  7.219e+11        perf-stat.ps.cpu-cycles
    693.54 ±  2%     -68.6%     217.73 ±  5%  perf-stat.ps.cpu-migrations
 1.097e+11          +108.1%  2.282e+11 ±  2%  perf-stat.ps.instructions
   2710577 ±  3%    +388.7%   13246535 ±  2%  perf-stat.ps.minor-faults
   2710577 ±  3%    +388.7%   13246536 ±  2%  perf-stat.ps.page-faults
 3.886e+13 ±  2%     -52.4%  1.849e+13        perf-stat.total.instructions
  64052898 ±  5%     -96.2%    2460331 ±166%  sched_debug.cfs_rq:/.avg_vruntime.avg
  95701822 ±  7%     -85.1%   14268127 ±116%  sched_debug.cfs_rq:/.avg_vruntime.max
  43098762 ±  6%     -96.0%    1715136 ±173%  sched_debug.cfs_rq:/.avg_vruntime.min
   9223270 ±  9%     -84.2%    1457904 ±122%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.78 ±  2%     -77.0%       0.18 ±130%  sched_debug.cfs_rq:/.h_nr_running.avg
  43049468 ± 22%     -89.3%    4590302 ±180%  sched_debug.cfs_rq:/.left_deadline.max
   3836405 ± 37%     -85.6%     550773 ±176%  sched_debug.cfs_rq:/.left_deadline.stddev
  43049467 ± 22%     -89.3%    4590279 ±180%  sched_debug.cfs_rq:/.left_vruntime.max
   3836405 ± 37%     -85.6%     550772 ±176%  sched_debug.cfs_rq:/.left_vruntime.stddev
  64052901 ±  5%     -96.2%    2460341 ±166%  sched_debug.cfs_rq:/.min_vruntime.avg
  95701822 ±  7%     -85.1%   14268127 ±116%  sched_debug.cfs_rq:/.min_vruntime.max
  43098762 ±  6%     -96.0%    1715136 ±173%  sched_debug.cfs_rq:/.min_vruntime.min
   9223270 ±  9%     -84.2%    1457902 ±122%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ±  2%     -77.4%       0.17 ±128%  sched_debug.cfs_rq:/.nr_running.avg
      1.61 ± 24%    +396.0%       7.96 ± 62%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     86.69          +424.4%     454.62 ± 24%  sched_debug.cfs_rq:/.removed.runnable_avg.max
     11.14 ± 13%    +409.8%      56.79 ± 35%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      1.61 ± 24%    +396.0%       7.96 ± 62%  sched_debug.cfs_rq:/.removed.util_avg.avg
     86.69          +424.4%     454.62 ± 24%  sched_debug.cfs_rq:/.removed.util_avg.max
     11.14 ± 13%    +409.8%      56.79 ± 35%  sched_debug.cfs_rq:/.removed.util_avg.stddev
  43049467 ± 22%     -89.3%    4590282 ±180%  sched_debug.cfs_rq:/.right_vruntime.max
   3836405 ± 37%     -85.6%     550772 ±176%  sched_debug.cfs_rq:/.right_vruntime.stddev
    286633 ± 43%    +262.3%    1038592 ± 36%  sched_debug.cfs_rq:/.runnable_avg.avg
  34728895 ± 30%    +349.2%   1.56e+08 ± 26%  sched_debug.cfs_rq:/.runnable_avg.max
   2845573 ± 30%    +325.9%   12119045 ± 26%  sched_debug.cfs_rq:/.runnable_avg.stddev
    769.03           -69.9%     231.86 ± 84%  sched_debug.cfs_rq:/.util_avg.avg
      1621 ±  5%     -31.5%       1111 ±  8%  sched_debug.cfs_rq:/.util_avg.max
    724.17 ±  2%     -89.6%      75.66 ±147%  sched_debug.cfs_rq:/.util_est.avg
      1360 ± 15%     -39.2%     826.88 ± 37%  sched_debug.cfs_rq:/.util_est.max
    766944 ±  3%     +18.1%     905901        sched_debug.cpu.avg_idle.avg
    321459 ±  2%     -35.6%     207172 ± 10%  sched_debug.cpu.avg_idle.stddev
    195573           -72.7%      53401 ± 24%  sched_debug.cpu.clock.avg
    195596           -72.7%      53442 ± 24%  sched_debug.cpu.clock.max
    195548           -72.7%      53352 ± 24%  sched_debug.cpu.clock.min
    194424           -72.6%      53229 ± 24%  sched_debug.cpu.clock_task.avg
    194608           -72.6%      53383 ± 24%  sched_debug.cpu.clock_task.max
    181834           -77.5%      40964 ± 31%  sched_debug.cpu.clock_task.min
      4241 ±  2%     -80.6%     821.65 ±142%  sched_debug.cpu.curr->pid.avg
      9799 ±  2%     -55.4%       4365 ± 17%  sched_debug.cpu.curr->pid.max
      1365 ± 10%     -48.0%     709.44 ±  5%  sched_debug.cpu.curr->pid.stddev
    537665 ±  4%     +31.2%     705318 ± 14%  sched_debug.cpu.max_idle_balance_cost.max
      3119 ± 56%    +579.1%      21184 ± 39%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.78 ±  2%     -76.3%       0.18 ±135%  sched_debug.cpu.nr_running.avg
     25773 ±  5%     -96.1%       1007 ± 41%  sched_debug.cpu.nr_switches.avg
     48669 ± 10%     -76.5%      11448 ± 13%  sched_debug.cpu.nr_switches.max
     19006 ±  7%     -98.6%     258.81 ± 64%  sched_debug.cpu.nr_switches.min
      4142 ±  8%     -66.3%       1396 ± 17%  sched_debug.cpu.nr_switches.stddev
      0.07 ± 23%     -92.9%       0.01 ± 41%  sched_debug.cpu.nr_uninterruptible.avg
    240.19 ± 16%     -82.1%      42.94 ± 41%  sched_debug.cpu.nr_uninterruptible.max
    -77.92           -88.1%      -9.25        sched_debug.cpu.nr_uninterruptible.min
     37.87 ±  5%     -85.8%       5.36 ± 13%  sched_debug.cpu.nr_uninterruptible.stddev
    195549           -72.7%      53356 ± 24%  sched_debug.cpu_clk
    194699           -73.0%      52506 ± 25%  sched_debug.ktime
      0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
      0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
      0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
    196368           -72.4%      54191 ± 24%  sched_debug.sched_clk
      0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      0.19 ± 34%     -51.3%       0.09 ± 37%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.14 ± 73%     -82.5%       0.03 ±168%  perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.00 ±223%  +51950.0%       0.26 ±212%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      0.25 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.12 ±145%     -99.1%       0.00 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.25 ± 41%     -81.6%       0.05 ± 69%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.11 ± 59%     -87.1%       0.01 ±198%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.40 ± 50%     -97.8%       0.01 ± 30%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.25 ±138%     -99.6%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.32 ±104%     -97.3%       0.01 ± 38%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ± 12%     -34.9%       0.01 ± 18%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.19 ±185%     -95.6%       0.01 ± 44%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      0.26 ± 17%     -98.8%       0.00 ± 10%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.03 ± 51%     -69.7%       0.01 ± 67%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 55%    +721.9%       0.10 ± 29%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±128%     -83.6%       0.00 ± 20%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.06 ± 31%   +1921.5%       1.23 ±165%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     25.45 ± 94%     -98.6%       0.36 ± 61%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      3.55 ± 97%     -98.9%       0.04 ±189%  perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.00 ±223%  +3.2e+06%      15.79 ±259%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      3.09 ± 45%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      3.51 ± 21%     -86.1%       0.49 ± 72%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      3.59 ± 11%     -92.0%       0.29 ±165%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1.60 ± 69%     -95.7%       0.07 ±243%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.81 ± 43%     -98.5%       0.01 ± 43%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.02 ± 88%     -98.1%       0.02 ± 47%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      9.68 ± 32%     -92.2%       0.76 ± 72%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     12.26 ±109%     -92.9%       0.87 ±101%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      4.68 ± 36%     -99.8%       0.01 ± 65%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.36 ±186%     -96.3%       0.01 ± 90%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     97903 ±  4%     -38.3%      60433 ± 29%  perf-sched.total_wait_and_delay.count.ms
      3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
    302.41 ±  5%     -27.4%     219.54 ± 14%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.48 ±  6%     -90.9%       0.14 ± 79%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
    327.16 ±  9%     -46.6%     174.81 ± 24%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.37 ±  2%     -75.3%      91.05 ± 35%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    187.66          +120.6%     413.97 ± 14%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      6.17 ± 45%     -79.7%       1.25 ±142%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     40.50 ±  8%    +245.7%     140.00 ± 23%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     13.17 ±  2%    +624.4%      95.38 ± 19%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     11323 ±  3%     -75.9%       2725 ± 28%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1887 ± 45%     -96.1%      73.88 ± 78%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1238           -34.5%     811.25 ± 13%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     20.79 ± 19%     -95.9%       0.84 ± 93%  perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1240 ± 20%     -14.4%       1062 ± 10%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.34           +31.2%     656.38 ± 39%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1237 ± 34%    +151.7%       3114 ± 25%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     49.27 ±119%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
     58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      3.78 ±  5%     -97.6%       0.09 ± 37%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      2.99 ±  4%     +15.4%       3.45 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      4.71 ±  8%     -99.5%       0.02 ±170%  perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      1.67 ± 20%     -92.7%       0.12 ± 30%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.67 ± 21%     -94.3%       0.10 ± 35%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      2.30 ± 14%     -95.5%       0.10 ± 42%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      2.00 ± 74%   +2917.4%      60.44 ± 33%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     29.19 ±  5%     -38.5%      17.96 ± 28%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.37 ± 30%   +5524.5%      20.95 ± 30%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    302.40 ±  5%     -27.4%     219.53 ± 14%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.40 ±  6%     -92.7%       0.10 ± 18%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    326.84 ±  9%     -46.6%     174.54 ± 24%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.18 ±  2%     -75.3%      91.04 ± 35%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    187.58          +120.6%     413.77 ± 14%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.36 ± 29%   +1759.6%      43.80 ± 33%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±156%     -97.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    340.69 ±135%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     22.04 ± 32%     -98.4%       0.36 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
     13.54 ± 10%     -99.7%       0.04 ±189%  perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
     10.17 ± 19%     -95.2%       0.49 ± 56%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
     11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
     10.62 ±  9%     -96.5%       0.38 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     14.42 ± 22%     -96.6%       0.49 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      4.00 ± 74%  +19182.5%     772.23 ± 40%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     10.75 ± 98%   +6512.2%     710.88 ± 56%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.80 ±  8%     -95.2%       0.76 ± 72%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     11.64 ± 61%     -98.9%       0.13 ±132%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      1240 ± 20%     -14.3%       1062 ± 10%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.11           +31.2%     656.37 ± 39%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1237 ± 34%    +151.6%       3113 ± 25%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.46           -94.0        0.41 ±138%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.14           -93.7        0.40 ±136%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     93.79           -93.5        0.31 ±134%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
     93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
     93.44           -93.3        0.14 ±264%  perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
     94.45           -93.0        1.42 ± 60%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     94.25           -92.9        1.33 ± 61%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
      0.00            +1.1        1.09 ± 33%  perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
      0.00            +1.4        1.37 ± 49%  perf-profile.calltrace.cycles-pp.setlocale
      0.00            +1.6        1.64 ± 47%  perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
      0.00            +1.6        1.64 ± 47%  perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +1.6        1.65 ± 43%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.8        1.76 ± 44%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.9        1.93 ± 26%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      0.00            +2.2        2.16 ± 44%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00            +2.2        2.23 ± 33%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.4        2.37 ± 36%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +2.5        2.48 ± 32%  perf-profile.calltrace.cycles-pp.get_cpu_sleep_time_us.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read
      0.00            +2.5        2.50 ± 45%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.5        2.54 ± 47%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      0.00            +2.5        2.54 ± 47%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.7        2.68 ± 35%  perf-profile.calltrace.cycles-pp.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read.ksys_read
      0.00            +2.8        2.77 ± 33%  perf-profile.calltrace.cycles-pp.uptime_proc_show.seq_read_iter.vfs_read.ksys_read.do_syscall_64
      0.00            +2.8        2.82 ± 32%  perf-profile.calltrace.cycles-pp._Fork
      0.00            +2.8        2.84 ± 45%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.8        2.84 ± 45%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      0.00            +2.9        2.89 ± 39%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
      0.00            +2.9        2.89 ± 39%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
      0.00            +3.1        3.10 ± 64%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.1        3.10 ± 64%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
      0.00            +3.1        3.13 ± 33%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      0.00            +3.2        3.18 ± 37%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.20 ± 28%  perf-profile.calltrace.cycles-pp.mutex_unlock.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
      0.00            +3.2        3.24 ± 39%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.execve
      0.00            +4.0        4.04 ± 43%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +4.0        4.04 ± 43%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.00            +4.1        4.10 ± 30%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
      0.00            +4.2        4.18 ± 31%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
      0.00            +4.2        4.18 ± 31%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
      0.00            +4.2        4.20 ± 28%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      0.00            +4.2        4.25 ± 65%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.00            +4.3        4.27 ± 26%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +4.3        4.30 ± 22%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
      0.00            +4.3        4.30 ± 22%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.00            +4.5        4.46 ± 59%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +4.6        4.57 ± 58%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
      0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
      0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
      0.00            +4.9        4.90 ± 57%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.00            +4.9        4.92 ± 26%  perf-profile.calltrace.cycles-pp.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release.__fput
      0.00            +5.0        4.99 ±100%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt
      0.00            +5.0        4.99 ±100%  perf-profile.calltrace.cycles-pp.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.00            +5.1        5.08 ±102%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.00            +5.1        5.14 ± 28%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
      0.00            +5.1        5.14 ± 28%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
      0.00            +5.4        5.43 ± 25%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
      0.00            +5.8        5.82 ± 94%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
      0.00            +5.8        5.82 ± 94%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +6.1        6.07 ± 90%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.00            +6.6        6.62 ± 24%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      0.00            +6.6        6.62 ± 24%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.00            +6.8        6.76 ± 18%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.00            +7.6        7.56 ± 76%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
      0.00            +8.0        8.03 ± 27%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.0        8.03 ± 27%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.0        8.05 ± 68%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.read
      0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
      0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.main
      0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
      0.00            +9.3        9.26 ± 30%  perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
      0.00            +9.3        9.26 ± 30%  perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
      0.00           +10.1       10.14 ± 28%  perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
      0.00           +10.2       10.23 ± 27%  perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.00           +11.0       10.98 ± 55%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.00           +20.6       20.64 ± 30%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +20.6       20.64 ± 30%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.21 ±  3%     +36.6       37.80 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.21 ±  3%     +36.6       37.80 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.8       38.00 ± 13%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.calltrace.cycles-pp.common_startup_64
      2.19 ±  3%     +53.9       56.10 ± 19%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
     95.60           -95.2        0.41 ±138%  perf-profile.children.cycles-pp.__mmap
     94.14           -93.7        0.49 ±130%  perf-profile.children.cycles-pp.__mmap_new_vma
     93.79           -93.5        0.31 ±134%  perf-profile.children.cycles-pp.vma_link_file
     93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     94.55           -93.1        1.42 ± 60%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
     92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
     93.44           -92.7        0.75 ±109%  perf-profile.children.cycles-pp.down_write
     94.46           -92.6        1.84 ± 34%  perf-profile.children.cycles-pp.vm_mmap_pgoff
     94.45           -92.6        1.84 ± 34%  perf-profile.children.cycles-pp.do_mmap
     94.25           -92.6        1.66 ± 37%  perf-profile.children.cycles-pp.__mmap_region
     95.58           -44.8       50.78 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     95.58           -44.8       50.78 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
      0.00            +1.1        1.09 ± 33%  perf-profile.children.cycles-pp.dup_mmap
      0.00            +1.4        1.37 ± 49%  perf-profile.children.cycles-pp.setlocale
      0.00            +1.9        1.93 ± 26%  perf-profile.children.cycles-pp.dup_mm
      0.03 ± 70%      +2.0        1.99 ± 36%  perf-profile.children.cycles-pp.handle_softirqs
      0.00            +2.0        1.99 ± 36%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +2.0        2.02 ± 38%  perf-profile.children.cycles-pp.folios_put_refs
      0.00            +2.1        2.06 ± 52%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +2.2        2.16 ± 44%  perf-profile.children.cycles-pp.do_pte_missing
      0.00            +2.2        2.21 ± 68%  perf-profile.children.cycles-pp.link_path_walk
      0.00            +2.2        2.23 ± 33%  perf-profile.children.cycles-pp.copy_process
      0.00            +2.3        2.30 ± 40%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
      0.00            +2.3        2.30 ± 40%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.00            +2.3        2.34 ± 46%  perf-profile.children.cycles-pp.walk_component
      0.00            +2.4        2.37 ± 36%  perf-profile.children.cycles-pp.zap_present_ptes
      0.00            +2.5        2.48 ± 32%  perf-profile.children.cycles-pp.get_cpu_sleep_time_us
      0.00            +2.6        2.62 ± 35%  perf-profile.children.cycles-pp.__do_sys_clone
      0.00            +2.6        2.62 ± 35%  perf-profile.children.cycles-pp.kernel_clone
      0.00            +2.7        2.68 ± 35%  perf-profile.children.cycles-pp.get_idle_time
      0.00            +2.8        2.77 ± 33%  perf-profile.children.cycles-pp.uptime_proc_show
      0.00            +2.9        2.91 ± 32%  perf-profile.children.cycles-pp._Fork
      0.00            +3.1        3.10 ± 64%  perf-profile.children.cycles-pp.proc_reg_read_iter
      0.00            +3.2        3.24 ± 39%  perf-profile.children.cycles-pp.bprm_execve
      0.00            +3.2        3.24 ± 36%  perf-profile.children.cycles-pp.__x64_sys_exit_group
      0.00            +3.2        3.24 ± 36%  perf-profile.children.cycles-pp.x64_sys_call
      0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.do_execveat_common
      0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.execve
      0.00            +4.0        3.99 ± 38%  perf-profile.children.cycles-pp.mutex_unlock
      0.00            +4.2        4.19 ± 31%  perf-profile.children.cycles-pp.zap_pte_range
      0.00            +4.2        4.25 ± 65%  perf-profile.children.cycles-pp.generic_perform_write
      0.00            +4.3        4.29 ± 29%  perf-profile.children.cycles-pp.unmap_page_range
      0.00            +4.3        4.29 ± 29%  perf-profile.children.cycles-pp.zap_pmd_range
      0.00            +4.3        4.31 ± 51%  perf-profile.children.cycles-pp.do_filp_open
      0.00            +4.3        4.31 ± 51%  perf-profile.children.cycles-pp.path_openat
      0.19 ± 23%      +4.4        4.60 ± 26%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.00            +4.5        4.46 ± 59%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.00            +4.5        4.55 ± 24%  perf-profile.children.cycles-pp.event_function_call
      0.00            +4.5        4.55 ± 24%  perf-profile.children.cycles-pp.smp_call_function_single
      0.00            +4.6        4.58 ± 30%  perf-profile.children.cycles-pp.unmap_vmas
      0.51 ±  6%      +4.6        5.14 ± 24%  perf-profile.children.cycles-pp.handle_mm_fault
      0.00            +4.7        4.68 ± 55%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +4.7        4.68 ± 55%  perf-profile.children.cycles-pp.writen
      0.00            +4.8        4.80 ± 48%  perf-profile.children.cycles-pp.do_sys_openat2
      0.77 ±  3%      +4.8        5.59 ± 21%  perf-profile.children.cycles-pp.exc_page_fault
      0.76 ±  3%      +4.8        5.59 ± 21%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.00            +4.9        4.90 ± 57%  perf-profile.children.cycles-pp.ksys_write
      0.00            +4.9        4.90 ± 57%  perf-profile.children.cycles-pp.vfs_write
      0.00            +4.9        4.90 ± 48%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.00            +4.9        4.92 ± 26%  perf-profile.children.cycles-pp.sw_perf_event_destroy
      0.00            +5.0        4.99 ±100%  perf-profile.children.cycles-pp.perf_rotate_context
      0.00            +5.0        5.01 ± 54%  perf-profile.children.cycles-pp.write
      0.00            +5.1        5.09 ±102%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.00            +5.4        5.43 ± 25%  perf-profile.children.cycles-pp._free_event
      1.18            +5.6        6.78 ± 20%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.46            +5.6        6.07 ± 90%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.00            +5.7        5.75 ± 39%  perf-profile.children.cycles-pp.perf_mmap__push
      0.00            +5.7        5.75 ± 39%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.53            +5.8        6.28 ± 89%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.53            +5.8        6.28 ± 89%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.00            +6.6        6.65 ± 77%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      0.00            +6.8        6.85 ± 20%  perf-profile.children.cycles-pp.exit_mm
      0.58 ±  2%      +7.6        8.14 ± 75%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +7.7        7.67 ± 23%  perf-profile.children.cycles-pp.exit_mmap
      0.00            +7.7        7.67 ± 30%  perf-profile.children.cycles-pp.seq_read_iter
      0.00            +7.7        7.72 ± 80%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.00            +7.8        7.75 ± 23%  perf-profile.children.cycles-pp.__mmput
      0.00            +8.0        8.03 ± 27%  perf-profile.children.cycles-pp.ksys_read
      0.00            +8.0        8.03 ± 27%  perf-profile.children.cycles-pp.vfs_read
      0.00            +8.1        8.13 ± 28%  perf-profile.children.cycles-pp.read
      0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.__cmd_record
      0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.cmd_record
      0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.handle_internal_command
      0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.main
      0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.run_builtin
      0.00            +9.3        9.26 ± 30%  perf-profile.children.cycles-pp.perf_event_release_kernel
      0.00            +9.3        9.26 ± 30%  perf-profile.children.cycles-pp.perf_release
      1.02 ±  4%      +9.3       10.33 ± 27%  perf-profile.children.cycles-pp.task_work_run
      0.00           +11.0       11.05 ± 28%  perf-profile.children.cycles-pp.__fput
      0.00           +15.8       15.85 ± 25%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
      0.00           +15.8       15.85 ± 25%  perf-profile.children.cycles-pp.get_signal
      0.00           +19.1       19.09 ± 19%  perf-profile.children.cycles-pp.do_exit
      0.00           +19.1       19.09 ± 19%  perf-profile.children.cycles-pp.do_group_exit
      1.70 ±  2%     +30.7       32.41 ± 21%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.children.cycles-pp.start_secondary
      1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_idle_enter
      1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_safe_halt
      1.22 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_enter
      1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.common_startup_64
      1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.do_idle
     92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
      0.00            +2.1        2.06 ± 52%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +2.6        2.61 ± 36%  perf-profile.self.cycles-pp.smp_call_function_single
      0.00            +3.7        3.68 ± 37%  perf-profile.self.cycles-pp.mutex_unlock
      0.00            +6.6        6.65 ± 77%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
      1.19 ±  3%     +29.2       30.38 ± 15%  perf-profile.self.cycles-pp.acpi_safe_halt




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-02-06  8:02     ` Oliver Sang
@ 2025-02-07 18:10       ` Yang Shi
  2025-02-13  2:04         ` Oliver Sang
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-02-07 18:10 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm




On 2/6/25 12:02 AM, Oliver Sang wrote:
> hi, Yang Shi,
>
> On Fri, Jan 31, 2025 at 10:38:03AM -0800, Yang Shi wrote:
>>
>>
>> On 1/27/25 7:14 PM, kernel test robot wrote:
>>> hi, All,
>>>
>>> we don't have enough knowledge to understand fully the discussion for this
>>> patch, we saw "NACK" but there were more discussions later.
>>> so below report is just FYI what we observed in our tests. thanks
>> Thanks for the report. It was nack'ed because of the change to smaps/maps
>> files in proc.
>>
>>> Hello,
>>>
>>> kernel test robot noticed a 858.5% improvement of vm-scalability.throughput on:
>>>
>>>
>>> commit: 7143ee2391f1ea15e6791e129870473543634de2 ("[PATCH] /dev/zero: make private mapping full anonymous mapping")
>>> url: https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/dev-zero-make-private-mapping-full-anonymous-mapping/20250114-063339
>>> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git a68d3cbfade64392507302f3a920113b60dc811f
>>> patch link: https://lore.kernel.org/all/20250113223033.4054534-1-yang@os.amperecomputing.com/
>>> patch subject: [PATCH] /dev/zero: make private mapping full anonymous mapping
>>>
>>> testcase: vm-scalability
>>> config: x86_64-rhel-9.4
>>> compiler: gcc-12
>>> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
>>> parameters:
>>>
>>> 	runtime: 300s
>>> 	test: small-allocs
>> It seems this benchmark allocates huge amount of small areas (each area is
>> as big as 40K) by mmap'ing /dev/zero.
>>
>> This patch makes /dev/zero mapping a full anonymous mapping, so the later
>> vma_link_file() is actually skipped, which needs acquire file rmap lock then
>> insert the mapping into file rmap tree. The below profiling also showed
>> this.
>> Quoted here so that we don't have to scroll down:
>>
>>>        95.60           -95.2        0.42 ±113%  perf-profile.children.cycles-pp.__mmap
>>>        94.14           -93.6        0.54 ±106%  perf-profile.children.cycles-pp.__mmap_new_vma
>>>        93.79           -93.6        0.21 ±171%  perf-profile.children.cycles-pp.vma_link_file
>>>        93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>>>        93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>>>        93.44           -93.2        0.24 ±178%  perf-profile.children.cycles-pp.down_write
>>>        94.55           -93.1        1.40 ± 51%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>>>        94.25           -93.0        1.30 ± 59%  perf-profile.children.cycles-pp.__mmap_region
>>>        92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>>>        94.45           -92.7        1.72 ± 34%  perf-profile.children.cycles-pp.do_mmap
>>>        94.46           -92.6        1.83 ± 31%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>> It significantly speed up mmap for this benchmark and the rmap lock
>> contention is reduced significantly for both multi-processes and
>> multi-threads.
>>
>> The benchmark itself may exaggerate the improvement, but it may really speed
>> up some real life workloads. For example, multiple applications which may
>> allocate anonymous mapping by mmap'ing /dev/zero, then they may have
>> contention on /dev/zero's rmap lock.
>>
>> It doesn't make too much sense to link /dev/zero anonymous vmas to the file
>> rmap tree. So the below patch should be able to speed up the benchmark too.
> sorry for late and thanks a lot for information!
>
>> Oliver, can you please give this patch a try?
> it seems this is an alternative patch?

Yes

> since we applied your "/dev/zero: make private mapping full anonymous mapping"
> patch upon a68d3cbfad like below:
>
> * 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
> * a68d3cbfade64 memstick: core: fix kernel-doc notation
>
> so I applied below patch also upon a68d3cbfad.
>
> we saw big improvement but not that big.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    52ec85cb99  <--- your patch
>
>
> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>    14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
>
> full comparison as below [1] just FYI.

Thanks for the update. I stared at the profiling report for a whole day, 
but I couldn't figure out where that 400% was lost. I just saw that the 
number of page faults was lower, and the reduction in page faults seems 
to match the 400% loss. So I did more tracing and profiling.

The test case does the following in a tight loop (a minimal sketch 
follows the list):
   mmap 40K memory from /dev/zero (read only)
   read the area
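
A minimal sketch of that loop (assumed, not the actual vm-scalability 
source; the 40K size and the read-only private mapping come from the 
description above, and keeping every area mapped rather than unmapping 
it is an assumption here):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define AREA_SIZE	(40 * 1024)	/* 40K per area, per the report */
#define NR_AREAS	100000		/* hypothetical iteration count */

int main(void)
{
	int fd = open("/dev/zero", O_RDONLY);
	volatile char sum = 0;
	size_t off;
	int i;

	if (fd < 0)
		return 1;

	for (i = 0; i < NR_AREAS; i++) {
		/* Each iteration takes the mmap path being profiled... */
		char *p = mmap(NULL, AREA_SIZE, PROT_READ, MAP_PRIVATE,
			       fd, 0);

		if (p == MAP_FAILED)
			return 1;
		/* ...then faults in every page by reading the area. */
		for (off = 0; off < AREA_SIZE; off += 4096)
			sum += p[off];
	}
	close(fd);
	return sum;
}

Every iteration hits both contended paths discussed in this thread: the 
vma_link_file() rmap insertion at mmap time, and vma_has_recency() in 
the page fault path.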

So there are two major factors for performance: mmap and page faults. The 
alternative patch did reduce the mmap overhead to the same level as the 
original patch.

Further perf profiling showed that the page fault cost is higher than 
with the original patch. And the page fault profile was interesting:

-   44.87%     0.01%  usemem [kernel.kallsyms]                   [k] do_translation_fault
    - 44.86% do_translation_fault
       - 44.83% do_page_fault
          - 44.53% handle_mm_fault
               9.04% __handle_mm_fault

Page faults consumed around 40% of CPU time in handle_mm_fault, but 
__handle_mm_fault consumed just 9%, while I expected it to be the major 
consumer.

So I annotated handle_mm_fault and found that most of the time was 
consumed by lru_gen_enter_fault() -> vma_has_recency() (my kernel has 
the multi-gen LRU enabled):

        │     if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
        │     ↓ cbz     x1, b4
   0.00 │       ldr     w0, [x1, #12]
  99.59 │       eor     x0, x0, #0x800000
   0.00 │       ubfx    w0, w0, #23, #1
        │     current->in_lru_fault = vma_has_recency(vma);
   0.00 │ b4:   ldrh    w1, [x2, #1992]
   0.01 │       bfi     w1, w0, #5, #1
   0.00 │       strh    w1, [x2, #1992]


vma_has_recency() reads vma->vm_file->f_mode if vma->vm_file is not NULL, 
but that load took a long time. So I inspected struct file and saw:

struct file {
	file_ref_t			f_ref;
	spinlock_t			f_lock;
	fmode_t				f_mode;
	const struct file_operations	*f_op;
	...
}

f_mode is in the same cache line as f_ref (my kernel does NOT have 
spinlock debugging enabled). The test case mmaps /dev/zero in a tight 
loop, so the refcount is modified (fget/fput) very frequently, and this 
results in false sharing.
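
To illustrate the effect in userspace (a generic false-sharing sketch, 
not the kernel's struct file; the field names merely mimic f_ref and 
f_mode, and the iteration counts are arbitrary):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Both fields deliberately share one cache line, like f_ref and f_mode. */
static struct {
	atomic_long ref;	/* hammered, like fget()/fput() */
	unsigned int mode;	/* read-mostly, like f_mode */
} s;

static void *refcount_thread(void *arg)
{
	long i;

	for (i = 0; i < 100000000L; i++) {
		atomic_fetch_add(&s.ref, 1);	/* "fget" */
		atomic_fetch_sub(&s.ref, 1);	/* "fput" */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	unsigned long sum = 0;
	long i;

	pthread_create(&t, NULL, refcount_thread, NULL);
	/* Each load of .mode can miss: the writer keeps stealing the
	 * cache line back even though .mode itself never changes. */
	for (i = 0; i < 100000000L; i++)
		sum += s.mode;
	pthread_join(t, NULL);
	printf("%lu\n", sum);
	return 0;
}

Padding f_mode onto a different cache line would also stop the bouncing, 
but the change below sidesteps the load entirely for anonymous VMAs.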

So I tried the below patch on top of the alternative patch:

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f9157a0c42a5..ba11dc0b1c7c 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -608,6 +608,9 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
         if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
                 return false;

+       if (vma_is_anonymous(vma))
+               return true;
+
         if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
                 return false;

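For reference, assembled from the hunk above and its visible context 
(the trailing return true is assumed from the upstream helper), the 
patched function would read roughly:

static inline bool vma_has_recency(struct vm_area_struct *vma)
{
	if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
		return false;

	/* Anonymous VMAs have no backing file state worth consulting. */
	if (vma_is_anonymous(vma))
		return true;

	if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
		return false;

	return true;
}

So the hot anonymous-fault path returns before ever touching 
vma->vm_file and the contended cache line.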
This made the page fault profile look normal:

                         - 1.90% do_translation_fault
                            - 1.87% do_page_fault
                               - 1.49% handle_mm_fault
                                  - 1.36% __handle_mm_fault

Please try this in your test.

But AFAICT I have never seen a performance issue reported due to false 
sharing between the refcount and the other fields in struct file. This 
benchmark stresses it quite badly.

>
>>
>> diff --git a/mm/vma.c b/mm/vma.c
>> index bb2119e5a0d0..1092222c40ae 100644
>> --- a/mm/vma.c
>> +++ b/mm/vma.c
>> @@ -1633,6 +1633,9 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
>>   void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
>>                                 struct vm_area_struct *vma)
>>   {
>> +       if (vma_is_anonymous(vma))
>> +               return;
>> +
>>          if (vma->vm_file == NULL)
>>                  return;
>>
>> @@ -1658,6 +1661,9 @@ void unlink_file_vma(struct vm_area_struct *vma)
>>   {
>>          struct file *file = vma->vm_file;
>>
>> +       if (vma_is_anonymous(vma))
>> +               return;
>> +
>>          if (file) {
>>                  struct address_space *mapping = file->f_mapping;
>>
>> @@ -1672,6 +1678,9 @@ void vma_link_file(struct vm_area_struct *vma)
>>          struct file *file = vma->vm_file;
>>          struct address_space *mapping;
>>
>> +       if (vma_is_anonymous(vma))
>> +               return;
>> +
>>          if (file) {
>>                  mapping = file->f_mapping;
>>                  i_mmap_lock_write(mapping);
>>
>>
>> Because /dev/zero's private mapping is an anonymous mapping with a valid
>> vm_file, we need to bail out early if the vma is anonymous even though it
>> has a vm_file. IMHO, making the /dev/zero private mapping a full anonymous
>> mapping looks cleaner.
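
For context, vma_is_anonymous() keys off vm_ops rather than vm_file 
(this is the upstream include/linux/mm.h definition), which is why these 
early bail-outs work for /dev/zero private mappings even while vm_file 
is still set:

static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
	return !vma->vm_ops;
}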
>>
> [1]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    52ec85cb99  <--- your patch
>
>
> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>   5.262e+09 ±  3%     -45.0%  2.896e+09 ±  6%  cpuidle..time
>     7924008 ±  3%     -79.3%    1643339 ± 11%  cpuidle..usage
>     1871164 ±  4%     -22.4%    1452554 ± 12%  numa-numastat.node3.local_node
>     1952164 ±  3%     -20.1%    1560294 ± 12%  numa-numastat.node3.numa_hit
>      399.52           -68.2%     126.86        uptime.boot
>       14507           -15.7%      12232        uptime.idle
>        6.99 ±  3%    +147.9%      17.34 ±  4%  vmstat.cpu.id
>        1.71          +473.6%       9.79 ±  2%  vmstat.cpu.us
>       34204 ±  5%     -72.9%       9272 ±  7%  vmstat.system.cs
>      266575           -21.2%     210191        vmstat.system.in
>        3408 ±  5%     -99.8%       8.38 ± 48%  perf-c2c.DRAM.local
>       18076 ±  3%     -99.8%      32.25 ± 27%  perf-c2c.DRAM.remote
>        8082 ±  5%     -99.8%      15.50 ± 64%  perf-c2c.HITM.local
>        6544 ±  6%     -99.8%      13.62 ± 51%  perf-c2c.HITM.remote
>       14627 ±  4%     -99.8%      29.12 ± 53%  perf-c2c.HITM.total
>        6.49 ±  3%      +8.8       15.24 ±  5%  mpstat.cpu.all.idle%
>        0.63            -0.3        0.32 ±  4%  mpstat.cpu.all.irq%
>        0.03 ±  2%      +0.2        0.26 ±  2%  mpstat.cpu.all.soft%
>       91.17           -17.0       74.15        mpstat.cpu.all.sys%
>        1.68 ±  2%      +8.3       10.03 ±  2%  mpstat.cpu.all.usr%
>      337.33           -97.4%       8.88 ± 75%  mpstat.max_utilization.seconds
>      352.76           -77.3%      79.95 ±  2%  time.elapsed_time
>      352.76           -77.3%      79.95 ±  2%  time.elapsed_time.max
>      225965 ±  7%     -16.0%     189844 ±  6%  time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09        time.minor_page_faults
>       20852            -8.8%      19012        time.percent_of_cpu_this_job_got
>       72302           -81.4%      13425 ±  3%  time.system_time
>        1260 ±  3%     +41.0%       1777        time.user_time
>     5393707 ±  5%     -98.4%      86880 ± 17%  time.voluntary_context_switches
>     1609925           -50.3%     800493        meminfo.Active
>     1609925           -50.3%     800493        meminfo.Active(anon)
>      160837 ± 33%     -63.9%      58119 ± 13%  meminfo.AnonHugePages
>     4435665           -18.5%    3614714        meminfo.Cached
>     1775547           -43.8%     998415        meminfo.Committed_AS
>      148539           -43.7%      83699 ±  4%  meminfo.Mapped
>     4245538 ±  4%     -20.9%    3356561        meminfo.PageTables
>    14166291 ±  4%      -9.6%   12806082        meminfo.SUnreclaim
>      929777           -88.2%     109274 ±  3%  meminfo.Shmem
>    14315492 ±  4%      -9.6%   12947821        meminfo.Slab
>       64129 ±  4%    +418.9%     332751 ±  3%  vm-scalability.median
>       45.40 ±  5%   +1961.8        2007 ±  8%  vm-scalability.stddev%
>    14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
>      352.76           -77.3%      79.95 ±  2%  vm-scalability.time.elapsed_time
>      352.76           -77.3%      79.95 ±  2%  vm-scalability.time.elapsed_time.max
>      225965 ±  7%     -16.0%     189844 ±  6%  vm-scalability.time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
>       20852            -8.8%      19012        vm-scalability.time.percent_of_cpu_this_job_got
>       72302           -81.4%      13425 ±  3%  vm-scalability.time.system_time
>        1260 ±  3%     +41.0%       1777        vm-scalability.time.user_time
>     5393707 ±  5%     -98.4%      86880 ± 17%  vm-scalability.time.voluntary_context_switches
>   4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
>      265763 ±  4%     -20.5%     211398 ±  4%  numa-vmstat.node0.nr_page_table_pages
>       31364 ±106%     -85.0%       4690 ±169%  numa-vmstat.node0.nr_shmem
>       12205 ± 67%     -74.1%       3161 ±199%  numa-vmstat.node1.nr_mapped
>      265546 ±  4%     -21.8%     207742 ±  4%  numa-vmstat.node1.nr_page_table_pages
>       44052 ± 71%     -86.0%       6163 ±161%  numa-vmstat.node1.nr_shmem
>      885590 ±  4%      -9.9%     797649 ±  4%  numa-vmstat.node1.nr_slab_unreclaimable
>      264589 ±  4%     -21.2%     208598 ±  4%  numa-vmstat.node2.nr_page_table_pages
>      881598 ±  4%     -10.0%     793829 ±  4%  numa-vmstat.node2.nr_slab_unreclaimable
>      192683 ± 30%     -61.0%      75078 ± 70%  numa-vmstat.node3.nr_active_anon
>      286819 ±108%     -93.0%      19993 ± 39%  numa-vmstat.node3.nr_file_pages
>       13124 ± 49%     -92.3%       1006 ± 57%  numa-vmstat.node3.nr_mapped
>      264499 ±  4%     -22.1%     206135 ±  2%  numa-vmstat.node3.nr_page_table_pages
>      139810 ± 14%     -90.5%      13229 ± 89%  numa-vmstat.node3.nr_shmem
>      880199 ±  4%     -11.8%     776210 ±  5%  numa-vmstat.node3.nr_slab_unreclaimable
>      192683 ± 30%     -61.0%      75077 ± 70%  numa-vmstat.node3.nr_zone_active_anon
>     1951359 ±  3%     -20.1%    1558936 ± 12%  numa-vmstat.node3.numa_hit
>     1870359 ±  4%     -22.4%    1451195 ± 12%  numa-vmstat.node3.numa_local
>      402515           -50.3%     200150        proc-vmstat.nr_active_anon
>      170568            +1.9%     173746        proc-vmstat.nr_anon_pages
>     4257257            +0.9%    4296664        proc-vmstat.nr_dirty_background_threshold
>     8524925            +0.9%    8603835        proc-vmstat.nr_dirty_threshold
>     1109246           -18.5%     903959        proc-vmstat.nr_file_pages
>    42815276            +0.9%   43210344        proc-vmstat.nr_free_pages
>       37525           -43.6%      21164 ±  4%  proc-vmstat.nr_mapped
>     1059932 ±  4%     -21.1%     836810        proc-vmstat.nr_page_table_pages
>      232507           -88.2%      27341 ±  3%  proc-vmstat.nr_shmem
>       37297            -5.0%      35436        proc-vmstat.nr_slab_reclaimable
>     3537843 ±  4%      -9.8%    3192506        proc-vmstat.nr_slab_unreclaimable
>      402515           -50.3%     200150        proc-vmstat.nr_zone_active_anon
>       61931 ±  8%     -83.8%      10023 ± 45%  proc-vmstat.numa_hint_faults
>       15755 ± 21%     -87.1%       2039 ± 97%  proc-vmstat.numa_hint_faults_local
>     6916516 ±  3%      -7.1%    6425430        proc-vmstat.numa_hit
>     6568542 ±  3%      -7.5%    6077764        proc-vmstat.numa_local
>      293942 ±  3%     -69.6%      89435 ± 49%  proc-vmstat.numa_pte_updates
>   9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
>       55981 ±  2%     -63.1%      20641 ±  2%  proc-vmstat.pgreuse
>     1063552 ±  4%     -20.3%     847673 ±  4%  numa-meminfo.node0.PageTables
>     3565610 ±  4%      -8.0%    3279375 ±  3%  numa-meminfo.node0.SUnreclaim
>      125455 ±106%     -85.2%      18620 ±168%  numa-meminfo.node0.Shmem
>     3592377 ±  4%      -7.1%    3336072 ±  4%  numa-meminfo.node0.Slab
>       48482 ± 67%     -74.3%      12475 ±199%  numa-meminfo.node1.Mapped
>     1062709 ±  4%     -21.7%     831966 ±  4%  numa-meminfo.node1.PageTables
>     3543793 ±  4%     -10.0%    3189589 ±  4%  numa-meminfo.node1.SUnreclaim
>      176171 ± 71%     -86.0%      24677 ±161%  numa-meminfo.node1.Shmem
>     3593431 ±  4%     -10.4%    3220352 ±  4%  numa-meminfo.node1.Slab
>     1058901 ±  4%     -21.3%     833124 ±  4%  numa-meminfo.node2.PageTables
>     3527862 ±  4%     -10.2%    3168666 ±  5%  numa-meminfo.node2.SUnreclaim
>     3565750 ±  4%     -10.3%    3200248 ±  5%  numa-meminfo.node2.Slab
>      770405 ± 30%     -61.0%     300435 ± 70%  numa-meminfo.node3.Active
>      770405 ± 30%     -61.0%     300435 ± 70%  numa-meminfo.node3.Active(anon)
>     1146977 ±108%     -93.0%      80110 ± 40%  numa-meminfo.node3.FilePages
>       52663 ± 47%     -91.6%       4397 ± 56%  numa-meminfo.node3.Mapped
>     6368902 ± 20%     -21.2%    5021246 ±  2%  numa-meminfo.node3.MemUsed
>     1058539 ±  4%     -22.2%     823061 ±  3%  numa-meminfo.node3.PageTables
>     3522496 ±  4%     -12.1%    3096728 ±  6%  numa-meminfo.node3.SUnreclaim
>      558943 ± 14%     -90.5%      53054 ± 89%  numa-meminfo.node3.Shmem
>     3557392 ±  4%     -12.3%    3119454 ±  6%  numa-meminfo.node3.Slab
>        0.82 ±  4%     -39.7%       0.50 ± 12%  perf-stat.i.MPKI
>   2.714e+10 ±  2%    +185.7%  7.755e+10 ±  6%  perf-stat.i.branch-instructions
>        0.11 ±  3%      +0.1        0.20 ±  5%  perf-stat.i.branch-miss-rate%
>    24932893          +156.6%   63980942 ±  5%  perf-stat.i.branch-misses
>       64.93           -10.1       54.87 ±  2%  perf-stat.i.cache-miss-rate%
>       34508 ±  4%     -61.4%      13315 ± 10%  perf-stat.i.context-switches
>        7.67           -63.7%       2.79 ±  6%  perf-stat.i.cpi
>      224605           +10.8%     248972 ±  4%  perf-stat.i.cpu-clock
>      696.35 ±  2%     -57.4%     296.79 ±  3%  perf-stat.i.cpu-migrations
>   1.102e+11          +128.5%  2.518e+11 ±  6%  perf-stat.i.instructions
>        0.14          +198.2%       0.42 ±  5%  perf-stat.i.ipc
>       24.25 ±  3%    +375.8%     115.36 ±  3%  perf-stat.i.metric.K/sec
>     2722043 ±  3%    +439.7%   14690226 ±  6%  perf-stat.i.minor-faults
>     2722043 ±  3%    +439.7%   14690226 ±  6%  perf-stat.i.page-faults
>      224605           +10.8%     248972 ±  4%  perf-stat.i.task-clock
>        0.81 ±  3%     -52.5%       0.39 ± 14%  perf-stat.overall.MPKI
>        0.09            -0.0        0.08 ±  2%  perf-stat.overall.branch-miss-rate%
>       64.81            -6.4       58.40        perf-stat.overall.cache-miss-rate%
>        7.24           -56.3%       3.17 ±  3%  perf-stat.overall.cpi
>        0.14          +129.0%       0.32 ±  3%  perf-stat.overall.ipc
>        9012 ±  2%     -57.5%       3827        perf-stat.overall.path-length
>   2.701e+10 ±  2%    +159.6%  7.012e+10 ±  2%  perf-stat.ps.branch-instructions
>    24708939          +119.2%   54173035        perf-stat.ps.branch-misses
>       34266 ±  5%     -73.9%       8949 ±  7%  perf-stat.ps.context-switches
>   7.941e+11            -9.1%  7.219e+11        perf-stat.ps.cpu-cycles
>      693.54 ±  2%     -68.6%     217.73 ±  5%  perf-stat.ps.cpu-migrations
>   1.097e+11          +108.1%  2.282e+11 ±  2%  perf-stat.ps.instructions
>     2710577 ±  3%    +388.7%   13246535 ±  2%  perf-stat.ps.minor-faults
>     2710577 ±  3%    +388.7%   13246536 ±  2%  perf-stat.ps.page-faults
>   3.886e+13 ±  2%     -52.4%  1.849e+13        perf-stat.total.instructions
>    64052898 ±  5%     -96.2%    2460331 ±166%  sched_debug.cfs_rq:/.avg_vruntime.avg
>    95701822 ±  7%     -85.1%   14268127 ±116%  sched_debug.cfs_rq:/.avg_vruntime.max
>    43098762 ±  6%     -96.0%    1715136 ±173%  sched_debug.cfs_rq:/.avg_vruntime.min
>     9223270 ±  9%     -84.2%    1457904 ±122%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>        0.78 ±  2%     -77.0%       0.18 ±130%  sched_debug.cfs_rq:/.h_nr_running.avg
>    43049468 ± 22%     -89.3%    4590302 ±180%  sched_debug.cfs_rq:/.left_deadline.max
>     3836405 ± 37%     -85.6%     550773 ±176%  sched_debug.cfs_rq:/.left_deadline.stddev
>    43049467 ± 22%     -89.3%    4590279 ±180%  sched_debug.cfs_rq:/.left_vruntime.max
>     3836405 ± 37%     -85.6%     550772 ±176%  sched_debug.cfs_rq:/.left_vruntime.stddev
>    64052901 ±  5%     -96.2%    2460341 ±166%  sched_debug.cfs_rq:/.min_vruntime.avg
>    95701822 ±  7%     -85.1%   14268127 ±116%  sched_debug.cfs_rq:/.min_vruntime.max
>    43098762 ±  6%     -96.0%    1715136 ±173%  sched_debug.cfs_rq:/.min_vruntime.min
>     9223270 ±  9%     -84.2%    1457902 ±122%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.77 ±  2%     -77.4%       0.17 ±128%  sched_debug.cfs_rq:/.nr_running.avg
>        1.61 ± 24%    +396.0%       7.96 ± 62%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
>       86.69          +424.4%     454.62 ± 24%  sched_debug.cfs_rq:/.removed.runnable_avg.max
>       11.14 ± 13%    +409.8%      56.79 ± 35%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>        1.61 ± 24%    +396.0%       7.96 ± 62%  sched_debug.cfs_rq:/.removed.util_avg.avg
>       86.69          +424.4%     454.62 ± 24%  sched_debug.cfs_rq:/.removed.util_avg.max
>       11.14 ± 13%    +409.8%      56.79 ± 35%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>    43049467 ± 22%     -89.3%    4590282 ±180%  sched_debug.cfs_rq:/.right_vruntime.max
>     3836405 ± 37%     -85.6%     550772 ±176%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      286633 ± 43%    +262.3%    1038592 ± 36%  sched_debug.cfs_rq:/.runnable_avg.avg
>    34728895 ± 30%    +349.2%   1.56e+08 ± 26%  sched_debug.cfs_rq:/.runnable_avg.max
>     2845573 ± 30%    +325.9%   12119045 ± 26%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      769.03           -69.9%     231.86 ± 84%  sched_debug.cfs_rq:/.util_avg.avg
>        1621 ±  5%     -31.5%       1111 ±  8%  sched_debug.cfs_rq:/.util_avg.max
>      724.17 ±  2%     -89.6%      75.66 ±147%  sched_debug.cfs_rq:/.util_est.avg
>        1360 ± 15%     -39.2%     826.88 ± 37%  sched_debug.cfs_rq:/.util_est.max
>      766944 ±  3%     +18.1%     905901        sched_debug.cpu.avg_idle.avg
>      321459 ±  2%     -35.6%     207172 ± 10%  sched_debug.cpu.avg_idle.stddev
>      195573           -72.7%      53401 ± 24%  sched_debug.cpu.clock.avg
>      195596           -72.7%      53442 ± 24%  sched_debug.cpu.clock.max
>      195548           -72.7%      53352 ± 24%  sched_debug.cpu.clock.min
>      194424           -72.6%      53229 ± 24%  sched_debug.cpu.clock_task.avg
>      194608           -72.6%      53383 ± 24%  sched_debug.cpu.clock_task.max
>      181834           -77.5%      40964 ± 31%  sched_debug.cpu.clock_task.min
>        4241 ±  2%     -80.6%     821.65 ±142%  sched_debug.cpu.curr->pid.avg
>        9799 ±  2%     -55.4%       4365 ± 17%  sched_debug.cpu.curr->pid.max
>        1365 ± 10%     -48.0%     709.44 ±  5%  sched_debug.cpu.curr->pid.stddev
>      537665 ±  4%     +31.2%     705318 ± 14%  sched_debug.cpu.max_idle_balance_cost.max
>        3119 ± 56%    +579.1%      21184 ± 39%  sched_debug.cpu.max_idle_balance_cost.stddev
>        0.78 ±  2%     -76.3%       0.18 ±135%  sched_debug.cpu.nr_running.avg
>       25773 ±  5%     -96.1%       1007 ± 41%  sched_debug.cpu.nr_switches.avg
>       48669 ± 10%     -76.5%      11448 ± 13%  sched_debug.cpu.nr_switches.max
>       19006 ±  7%     -98.6%     258.81 ± 64%  sched_debug.cpu.nr_switches.min
>        4142 ±  8%     -66.3%       1396 ± 17%  sched_debug.cpu.nr_switches.stddev
>        0.07 ± 23%     -92.9%       0.01 ± 41%  sched_debug.cpu.nr_uninterruptible.avg
>      240.19 ± 16%     -82.1%      42.94 ± 41%  sched_debug.cpu.nr_uninterruptible.max
>      -77.92           -88.1%      -9.25        sched_debug.cpu.nr_uninterruptible.min
>       37.87 ±  5%     -85.8%       5.36 ± 13%  sched_debug.cpu.nr_uninterruptible.stddev
>      195549           -72.7%      53356 ± 24%  sched_debug.cpu_clk
>      194699           -73.0%      52506 ± 25%  sched_debug.ktime
>        0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
>        0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
>        0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
>      196368           -72.4%      54191 ± 24%  sched_debug.sched_clk
>        0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        0.19 ± 34%     -51.3%       0.09 ± 37%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        0.14 ± 73%     -82.5%       0.03 ±168%  perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.00 ±223%  +51950.0%       0.26 ±212%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        0.25 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        0.12 ±145%     -99.1%       0.00 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.25 ± 41%     -81.6%       0.05 ± 69%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        0.11 ± 59%     -87.1%       0.01 ±198%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.40 ± 50%     -97.8%       0.01 ± 30%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        2.25 ±138%     -99.6%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>        0.32 ±104%     -97.3%       0.01 ± 38%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.01 ± 12%     -34.9%       0.01 ± 18%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        0.19 ±185%     -95.6%       0.01 ± 44%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.26 ± 17%     -98.8%       0.00 ± 10%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.03 ± 51%     -69.7%       0.01 ± 67%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.01 ± 55%    +721.9%       0.10 ± 29%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.01 ±128%     -83.6%       0.00 ± 20%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        0.06 ± 31%   +1921.5%       1.23 ±165%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       25.45 ± 94%     -98.6%       0.36 ± 61%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        3.55 ± 97%     -98.9%       0.04 ±189%  perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.00 ±223%  +3.2e+06%      15.79 ±259%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        3.09 ± 45%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        3.51 ± 21%     -86.1%       0.49 ± 72%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        3.59 ± 11%     -92.0%       0.29 ±165%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1.60 ± 69%     -95.7%       0.07 ±243%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.81 ± 43%     -98.5%       0.01 ± 43%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1.02 ± 88%     -98.1%       0.02 ± 47%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        9.68 ± 32%     -92.2%       0.76 ± 72%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       12.26 ±109%     -92.9%       0.87 ±101%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>       37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        4.68 ± 36%     -99.8%       0.01 ± 65%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.36 ±186%     -96.3%       0.01 ± 90%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       97903 ±  4%     -38.3%      60433 ± 29%  perf-sched.total_wait_and_delay.count.ms
>        3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>      302.41 ±  5%     -27.4%     219.54 ± 14%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.48 ±  6%     -90.9%       0.14 ± 79%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>      327.16 ±  9%     -46.6%     174.81 ± 24%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.37 ±  2%     -75.3%      91.05 ± 35%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      187.66          +120.6%     413.97 ± 14%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        6.17 ± 45%     -79.7%       1.25 ±142%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       40.50 ±  8%    +245.7%     140.00 ± 23%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>       13.17 ±  2%    +624.4%      95.38 ± 19%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       11323 ±  3%     -75.9%       2725 ± 28%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1887 ± 45%     -96.1%      73.88 ± 78%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1238           -34.5%     811.25 ± 13%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>       20.79 ± 19%     -95.9%       0.84 ± 93%  perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        1240 ± 20%     -14.4%       1062 ± 10%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.34           +31.2%     656.38 ± 39%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1237 ± 34%    +151.7%       3114 ± 25%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       49.27 ±119%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>       58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        3.78 ±  5%     -97.6%       0.09 ± 37%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        2.99 ±  4%     +15.4%       3.45 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        4.71 ±  8%     -99.5%       0.02 ±170%  perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        1.67 ± 20%     -92.7%       0.12 ± 30%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.67 ± 21%     -94.3%       0.10 ± 35%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        2.30 ± 14%     -95.5%       0.10 ± 42%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        2.00 ± 74%   +2917.4%      60.44 ± 33%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       29.19 ±  5%     -38.5%      17.96 ± 28%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        0.37 ± 30%   +5524.5%      20.95 ± 30%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      302.40 ±  5%     -27.4%     219.53 ± 14%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.40 ±  6%     -92.7%       0.10 ± 18%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      326.84 ±  9%     -46.6%     174.54 ± 24%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.18 ±  2%     -75.3%      91.04 ± 35%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      187.58          +120.6%     413.77 ± 14%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        2.36 ± 29%   +1759.6%      43.80 ± 33%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.01 ±156%     -97.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>      340.69 ±135%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>      535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       22.04 ± 32%     -98.4%       0.36 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>       13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>       13.54 ± 10%     -99.7%       0.04 ±189%  perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>       10.17 ± 19%     -95.2%       0.49 ± 56%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>       11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>       10.62 ±  9%     -96.5%       0.38 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       14.42 ± 22%     -96.6%       0.49 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        4.00 ± 74%  +19182.5%     772.23 ± 40%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       10.75 ± 98%   +6512.2%     710.88 ± 56%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       15.80 ±  8%     -95.2%       0.76 ± 72%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       11.64 ± 61%     -98.9%       0.13 ±132%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        1240 ± 20%     -14.3%       1062 ± 10%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.11           +31.2%     656.37 ± 39%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1237 ± 34%    +151.6%       3113 ± 25%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
>       94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.46           -94.0        0.41 ±138%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.14           -93.7        0.40 ±136%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       93.79           -93.5        0.31 ±134%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>       93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
>       93.44           -93.3        0.14 ±264%  perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
>       94.45           -93.0        1.42 ± 60%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       94.25           -92.9        1.33 ± 61%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.00            +1.1        1.09 ± 33%  perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
>        0.00            +1.4        1.37 ± 49%  perf-profile.calltrace.cycles-pp.setlocale
>        0.00            +1.6        1.64 ± 47%  perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
>        0.00            +1.6        1.64 ± 47%  perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
>        0.00            +1.6        1.65 ± 43%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.8        1.76 ± 44%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.9        1.93 ± 26%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>        0.00            +2.2        2.16 ± 44%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>        0.00            +2.2        2.23 ± 33%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.4        2.37 ± 36%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
>        0.00            +2.5        2.48 ± 32%  perf-profile.calltrace.cycles-pp.get_cpu_sleep_time_us.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read
>        0.00            +2.5        2.50 ± 45%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.5        2.54 ± 47%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
>        0.00            +2.5        2.54 ± 47%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.7        2.68 ± 35%  perf-profile.calltrace.cycles-pp.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read.ksys_read
>        0.00            +2.8        2.77 ± 33%  perf-profile.calltrace.cycles-pp.uptime_proc_show.seq_read_iter.vfs_read.ksys_read.do_syscall_64
>        0.00            +2.8        2.82 ± 32%  perf-profile.calltrace.cycles-pp._Fork
>        0.00            +2.8        2.84 ± 45%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.8        2.84 ± 45%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
>        0.00            +2.9        2.89 ± 39%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
>        0.00            +2.9        2.89 ± 39%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
>        0.00            +3.1        3.10 ± 64%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.1        3.10 ± 64%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
>        0.00            +3.1        3.13 ± 33%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
>        0.00            +3.2        3.18 ± 37%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.20 ± 28%  perf-profile.calltrace.cycles-pp.mutex_unlock.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
>        0.00            +3.2        3.24 ± 39%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%  perf-profile.calltrace.cycles-pp.execve
>        0.00            +4.0        4.04 ± 43%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +4.0        4.04 ± 43%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
>        0.00            +4.1        4.10 ± 30%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
>        0.00            +4.2        4.18 ± 31%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
>        0.00            +4.2        4.18 ± 31%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>        0.00            +4.2        4.20 ± 28%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
>        0.00            +4.2        4.25 ± 65%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
>        0.00            +4.3        4.27 ± 26%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +4.3        4.30 ± 22%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
>        0.00            +4.3        4.30 ± 22%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
>        0.00            +4.5        4.46 ± 59%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +4.6        4.57 ± 58%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
>        0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
>        0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
>        0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>        0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
>        0.00            +4.7        4.68 ± 55%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>        0.00            +4.9        4.90 ± 57%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>        0.00            +4.9        4.92 ± 26%  perf-profile.calltrace.cycles-pp.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release.__fput
>        0.00            +5.0        4.99 ±100%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt
>        0.00            +5.0        4.99 ±100%  perf-profile.calltrace.cycles-pp.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
>        0.00            +5.1        5.08 ±102%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>        0.00            +5.1        5.14 ± 28%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
>        0.00            +5.1        5.14 ± 28%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
>        0.00            +5.4        5.43 ± 25%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
>        0.00            +5.8        5.82 ± 94%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
>        0.00            +5.8        5.82 ± 94%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
>        0.00            +6.1        6.07 ± 90%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>        0.00            +6.6        6.62 ± 24%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +6.6        6.62 ± 24%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +6.8        6.76 ± 18%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        0.00            +7.6        7.56 ± 76%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
>        0.00            +8.0        8.03 ± 27%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.0        8.03 ± 27%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.0        8.05 ± 68%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>        0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.1        8.13 ± 28%  perf-profile.calltrace.cycles-pp.read
>        0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
>        0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.main
>        0.00            +9.1        9.05 ± 35%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
>        0.00            +9.3        9.26 ± 30%  perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
>        0.00            +9.3        9.26 ± 30%  perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
>        0.00           +10.1       10.14 ± 28%  perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
>        0.00           +10.2       10.23 ± 27%  perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
>        0.00           +11.0       10.98 ± 55%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>        0.00           +20.6       20.64 ± 30%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00           +20.6       20.64 ± 30%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        1.21 ±  3%     +36.6       37.80 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>        1.21 ±  3%     +36.6       37.80 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.8       38.00 ± 13%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>        1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>        1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>        1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.calltrace.cycles-pp.common_startup_64
>        2.19 ±  3%     +53.9       56.10 ± 19%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>       95.60           -95.2        0.41 ±138%  perf-profile.children.cycles-pp.__mmap
>       94.14           -93.7        0.49 ±130%  perf-profile.children.cycles-pp.__mmap_new_vma
>       93.79           -93.5        0.31 ±134%  perf-profile.children.cycles-pp.vma_link_file
>       93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>       93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>       94.55           -93.1        1.42 ± 60%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>       93.44           -92.7        0.75 ±109%  perf-profile.children.cycles-pp.down_write
>       94.46           -92.6        1.84 ± 34%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       94.45           -92.6        1.84 ± 34%  perf-profile.children.cycles-pp.do_mmap
>       94.25           -92.6        1.66 ± 37%  perf-profile.children.cycles-pp.__mmap_region
>       95.58           -44.8       50.78 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       95.58           -44.8       50.78 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
>        0.00            +1.1        1.09 ± 33%  perf-profile.children.cycles-pp.dup_mmap
>        0.00            +1.4        1.37 ± 49%  perf-profile.children.cycles-pp.setlocale
>        0.00            +1.9        1.93 ± 26%  perf-profile.children.cycles-pp.dup_mm
>        0.03 ± 70%      +2.0        1.99 ± 36%  perf-profile.children.cycles-pp.handle_softirqs
>        0.00            +2.0        1.99 ± 36%  perf-profile.children.cycles-pp.__irq_exit_rcu
>        0.00            +2.0        2.02 ± 38%  perf-profile.children.cycles-pp.folios_put_refs
>        0.00            +2.1        2.06 ± 52%  perf-profile.children.cycles-pp._raw_spin_lock
>        0.00            +2.2        2.16 ± 44%  perf-profile.children.cycles-pp.do_pte_missing
>        0.00            +2.2        2.21 ± 68%  perf-profile.children.cycles-pp.link_path_walk
>        0.00            +2.2        2.23 ± 33%  perf-profile.children.cycles-pp.copy_process
>        0.00            +2.3        2.30 ± 40%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
>        0.00            +2.3        2.30 ± 40%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
>        0.00            +2.3        2.34 ± 46%  perf-profile.children.cycles-pp.walk_component
>        0.00            +2.4        2.37 ± 36%  perf-profile.children.cycles-pp.zap_present_ptes
>        0.00            +2.5        2.48 ± 32%  perf-profile.children.cycles-pp.get_cpu_sleep_time_us
>        0.00            +2.6        2.62 ± 35%  perf-profile.children.cycles-pp.__do_sys_clone
>        0.00            +2.6        2.62 ± 35%  perf-profile.children.cycles-pp.kernel_clone
>        0.00            +2.7        2.68 ± 35%  perf-profile.children.cycles-pp.get_idle_time
>        0.00            +2.8        2.77 ± 33%  perf-profile.children.cycles-pp.uptime_proc_show
>        0.00            +2.9        2.91 ± 32%  perf-profile.children.cycles-pp._Fork
>        0.00            +3.1        3.10 ± 64%  perf-profile.children.cycles-pp.proc_reg_read_iter
>        0.00            +3.2        3.24 ± 39%  perf-profile.children.cycles-pp.bprm_execve
>        0.00            +3.2        3.24 ± 36%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>        0.00            +3.2        3.24 ± 36%  perf-profile.children.cycles-pp.x64_sys_call
>        0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.__x64_sys_execve
>        0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.do_execveat_common
>        0.00            +3.8        3.85 ± 39%  perf-profile.children.cycles-pp.execve
>        0.00            +4.0        3.99 ± 38%  perf-profile.children.cycles-pp.mutex_unlock
>        0.00            +4.2        4.19 ± 31%  perf-profile.children.cycles-pp.zap_pte_range
>        0.00            +4.2        4.25 ± 65%  perf-profile.children.cycles-pp.generic_perform_write
>        0.00            +4.3        4.29 ± 29%  perf-profile.children.cycles-pp.unmap_page_range
>        0.00            +4.3        4.29 ± 29%  perf-profile.children.cycles-pp.zap_pmd_range
>        0.00            +4.3        4.31 ± 51%  perf-profile.children.cycles-pp.do_filp_open
>        0.00            +4.3        4.31 ± 51%  perf-profile.children.cycles-pp.path_openat
>        0.19 ± 23%      +4.4        4.60 ± 26%  perf-profile.children.cycles-pp.__handle_mm_fault
>        0.00            +4.5        4.46 ± 59%  perf-profile.children.cycles-pp.shmem_file_write_iter
>        0.00            +4.5        4.55 ± 24%  perf-profile.children.cycles-pp.event_function_call
>        0.00            +4.5        4.55 ± 24%  perf-profile.children.cycles-pp.smp_call_function_single
>        0.00            +4.6        4.58 ± 30%  perf-profile.children.cycles-pp.unmap_vmas
>        0.51 ±  6%      +4.6        5.14 ± 24%  perf-profile.children.cycles-pp.handle_mm_fault
>        0.00            +4.7        4.68 ± 55%  perf-profile.children.cycles-pp.record__pushfn
>        0.00            +4.7        4.68 ± 55%  perf-profile.children.cycles-pp.writen
>        0.00            +4.8        4.80 ± 48%  perf-profile.children.cycles-pp.do_sys_openat2
>        0.77 ±  3%      +4.8        5.59 ± 21%  perf-profile.children.cycles-pp.exc_page_fault
>        0.76 ±  3%      +4.8        5.59 ± 21%  perf-profile.children.cycles-pp.do_user_addr_fault
>        0.00            +4.9        4.90 ± 57%  perf-profile.children.cycles-pp.ksys_write
>        0.00            +4.9        4.90 ± 57%  perf-profile.children.cycles-pp.vfs_write
>        0.00            +4.9        4.90 ± 48%  perf-profile.children.cycles-pp.__x64_sys_openat
>        0.00            +4.9        4.92 ± 26%  perf-profile.children.cycles-pp.sw_perf_event_destroy
>        0.00            +5.0        4.99 ±100%  perf-profile.children.cycles-pp.perf_rotate_context
>        0.00            +5.0        5.01 ± 54%  perf-profile.children.cycles-pp.write
>        0.00            +5.1        5.09 ±102%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>        0.00            +5.4        5.43 ± 25%  perf-profile.children.cycles-pp._free_event
>        1.18            +5.6        6.78 ± 20%  perf-profile.children.cycles-pp.asm_exc_page_fault
>        0.46            +5.6        6.07 ± 90%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>        0.00            +5.7        5.75 ± 39%  perf-profile.children.cycles-pp.perf_mmap__push
>        0.00            +5.7        5.75 ± 39%  perf-profile.children.cycles-pp.record__mmap_read_evlist
>        0.53            +5.8        6.28 ± 89%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>        0.53            +5.8        6.28 ± 89%  perf-profile.children.cycles-pp.hrtimer_interrupt
>        0.00            +6.6        6.65 ± 77%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
>        0.00            +6.8        6.85 ± 20%  perf-profile.children.cycles-pp.exit_mm
>        0.58 ±  2%      +7.6        8.14 ± 75%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>        0.00            +7.7        7.67 ± 23%  perf-profile.children.cycles-pp.exit_mmap
>        0.00            +7.7        7.67 ± 30%  perf-profile.children.cycles-pp.seq_read_iter
>        0.00            +7.7        7.72 ± 80%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>        0.00            +7.8        7.75 ± 23%  perf-profile.children.cycles-pp.__mmput
>        0.00            +8.0        8.03 ± 27%  perf-profile.children.cycles-pp.ksys_read
>        0.00            +8.0        8.03 ± 27%  perf-profile.children.cycles-pp.vfs_read
>        0.00            +8.1        8.13 ± 28%  perf-profile.children.cycles-pp.read
>        0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.__cmd_record
>        0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.cmd_record
>        0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.handle_internal_command
>        0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.main
>        0.02 ±141%      +9.0        9.05 ± 35%  perf-profile.children.cycles-pp.run_builtin
>        0.00            +9.3        9.26 ± 30%  perf-profile.children.cycles-pp.perf_event_release_kernel
>        0.00            +9.3        9.26 ± 30%  perf-profile.children.cycles-pp.perf_release
>        1.02 ±  4%      +9.3       10.33 ± 27%  perf-profile.children.cycles-pp.task_work_run
>        0.00           +11.0       11.05 ± 28%  perf-profile.children.cycles-pp.__fput
>        0.00           +15.8       15.85 ± 25%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
>        0.00           +15.8       15.85 ± 25%  perf-profile.children.cycles-pp.get_signal
>        0.00           +19.1       19.09 ± 19%  perf-profile.children.cycles-pp.do_exit
>        0.00           +19.1       19.09 ± 19%  perf-profile.children.cycles-pp.do_group_exit
>        1.70 ±  2%     +30.7       32.41 ± 21%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>        1.22 ±  3%     +36.9       38.10 ± 13%  perf-profile.children.cycles-pp.start_secondary
>        1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_idle_do_entry
>        1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_idle_enter
>        1.21 ±  3%     +37.2       38.43 ± 11%  perf-profile.children.cycles-pp.acpi_safe_halt
>        1.22 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_idle_call
>        1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_enter
>        1.21 ±  3%     +37.3       38.54 ± 12%  perf-profile.children.cycles-pp.cpuidle_enter_state
>        1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.common_startup_64
>        1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.cpu_startup_entry
>        1.22 ±  3%     +37.6       38.84 ± 12%  perf-profile.children.cycles-pp.do_idle
>       92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
>        0.00            +2.1        2.06 ± 52%  perf-profile.self.cycles-pp._raw_spin_lock
>        0.00            +2.6        2.61 ± 36%  perf-profile.self.cycles-pp.smp_call_function_single
>        0.00            +3.7        3.68 ± 37%  perf-profile.self.cycles-pp.mutex_unlock
>        0.00            +6.6        6.65 ± 77%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
>        1.19 ±  3%     +29.2       30.38 ± 15%  perf-profile.self.cycles-pp.acpi_safe_halt
>
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-02-07 18:10       ` Yang Shi
@ 2025-02-13  2:04         ` Oliver Sang
  2025-02-14 22:53           ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Sang @ 2025-02-13  2:04 UTC (permalink / raw)
  To: Yang Shi
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm, oliver.sang

hi, Yang Shi,

On Fri, Feb 07, 2025 at 10:10:37AM -0800, Yang Shi wrote:
> 
> On 2/6/25 12:02 AM, Oliver Sang wrote:

[...]

> 
> > since we applied your "/dev/zero: make private mapping full anonymous mapping"
> > patch on top of a68d3cbfad like below:
> > 
> > * 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
> > * a68d3cbfade64 memstick: core: fix kernel-doc notation
> > 
> > so I applied the below patch on top of a68d3cbfad as well.
> > 
> > we saw a big improvement, but not that big.
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
> >    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
> > 
> > commit:
> >    a68d3cbfad ("memstick: core: fix kernel-doc notation")
> >    52ec85cb99  <--- your patch
> > 
> > 
> > a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >    14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
> > 
> > full comparison as below [1] just FYI.
> 
> Thanks for the update. I stared at the profiling report for a whole day, but
> I couldn't figure out where that 400% was lost. I just saw that the number of
> page faults was lower, and the reduction in page faults seemed to match the
> 400% loss. So I did more tracing and profiling.
> 
> The test case does the following in a tight loop (see the sketch below):
>   mmap 40K of memory from /dev/zero (read-only)
>   read the area
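>
> Roughly, in userspace terms the loop looks like the below sketch (just an
> illustration, not the actual vm-scalability source; the page size and the
> touch-every-page read are assumptions from the description above):
>
> #include <fcntl.h>
> #include <stddef.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	const size_t len = 40 * 1024;	/* 40K per iteration */
> 	int fd = open("/dev/zero", O_RDONLY);
>
> 	if (fd < 0)
> 		return 1;
> 	for (;;) {
> 		/* private read-only mapping of /dev/zero */
> 		volatile char *p = mmap(NULL, len, PROT_READ,
> 					MAP_PRIVATE, fd, 0);
> 		if (p == MAP_FAILED)
> 			break;
> 		/* "read the area": one minor fault per page */
> 		for (size_t off = 0; off < len; off += 4096)
> 			(void)p[off];
> 		munmap((void *)p, len);
> 	}
> 	close(fd);
> 	return 0;
> }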
> 
> So there are two major factors in the performance: mmap and page faults.
> The alternative patch did reduce the mmap overhead to the same level as
> the original patch.
> 
> Further perf profiling showed the page fault cost is higher than with the
> original patch. But the page fault profile was interesting:
> 
> -   44.87%     0.01%  usemem [kernel.kallsyms]                   [k] do_translation_fault
>    - 44.86% do_translation_fault
>       - 44.83% do_page_fault
>          - 44.53% handle_mm_fault
>               9.04% __handle_mm_fault
> 
> Page faults consumed about 40% of CPU time in handle_mm_fault, but
> __handle_mm_fault consumed just 9%; I expected __handle_mm_fault to be
> the major consumer.
> 
> So I annotated handle_mm_fault and found that most of the time was
> consumed by lru_gen_enter_fault() -> vma_has_recency() (my kernel has
> multi-gen LRU enabled):
> 
>       │     if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>        │     ↓ cbz     x1, b4
>   0.00 │       ldr     w0, [x1, #12]
>  99.59 │       eor     x0, x0, #0x800000
>   0.00 │       ubfx    w0, w0, #23, #1
>        │     current->in_lru_fault = vma_has_recency(vma);
>   0.00 │ b4:   ldrh    w1, [x2, #1992]
>   0.01 │       bfi     w1, w0, #5, #1
>   0.00 │       strh    w1, [x2, #1992]
> 
> 
> vma_has_recency() reads vma->vm_file->f_mode if vma->vm_file is not NULL,
> but that load took a long time (the cycles attributed to the eor above are
> really the stall on the preceding ldr of f_mode). So I inspected struct
> file and saw:
> 
> struct file {
>     file_ref_t            f_ref;
>     spinlock_t            f_lock;
>     fmode_t                f_mode;
>     const struct file_operations    *f_op;
>     ...
> }
> 
> f_mode is in the same cache line as f_ref (my kernel does NOT have
> spinlock debugging enabled). The test case mmaps /dev/zero in a tight
> loop, so the refcount is modified (fget/fput) very frequently; this
> results in a form of false sharing.
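>
> Just to illustrate the layout issue (a sketch, not a proposed fix; it
> assumes 64-byte cache lines): moving the read-mostly f_mode onto its own
> cache line would keep the constant fget()/fput() writes to f_ref from
> invalidating the line that vma_has_recency() reads:
>
> struct file {
> 	file_ref_t	f_ref;	/* hot: written on every fget()/fput() */
> 	spinlock_t	f_lock;
> 	/* read-mostly fields, kept off the refcount's cache line */
> 	fmode_t		f_mode ____cacheline_aligned_in_smp;
> 	const struct file_operations	*f_op;
> 	...
> }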
> 
> So I tried the below patch on top of the alternative patch:
> 
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index f9157a0c42a5..ba11dc0b1c7c 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -608,6 +608,9 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
>         if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
>                 return false;
> 
> +       if (vma_is_anonymous(vma))
> +               return true;
> +
>         if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>                 return false;
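>
> With this applied, the whole helper would read as below (reconstructed
> from the hunk above; the trailing context lines are assumed from
> mm_inline.h):
>
> static inline bool vma_has_recency(struct vm_area_struct *vma)
> {
> 	if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
> 		return false;
>
> 	/*
> 	 * With the alternative patch a private /dev/zero VMA is anonymous
> 	 * but still carries vm_file, so bail out here before the f_mode
> 	 * load touches the refcount's cache line.
> 	 */
> 	if (vma_is_anonymous(vma))
> 		return true;
>
> 	if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
> 		return false;
>
> 	return true;
> }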
> 
> This made the page fault profile look normal:
> 
>                         - 1.90% do_translation_fault
>                            - 1.87% do_page_fault
>                               - 1.49% handle_mm_fault
>                                  - 1.36% __handle_mm_fault
> 
> Please try this in your test.
> 
> But AFAICT I have never seen a performance issue reported due to false
> sharing between the refcount and other fields in struct file. This
> benchmark stresses it quite badly.

I applied your patch above on top of the alternative patch from last time,
and saw more improvement (+445.2% vs a68d3cbfad), but still not as big as in
our original report.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit:
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  52ec85cb99  <--- a68d3cbfad + alternative
  d4a204fefe  <--- a68d3cbfad + alternative + new patch in vma_has_recency()

a68d3cbfade64392 52ec85cb99e9b31dc304eae965a d4a204fefec91546a317e52ae19
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
  14364828 ±  4%    +410.6%   73349239 ±  3%    +445.2%   78318730 ±  4%  vm-scalability.throughput


full comparison is as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit:
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  52ec85cb99  <--- a68d3cbfad + alternative
  d4a204fefe  <--- a68d3cbfad + alternative + new patch in vma_has_recency()

a68d3cbfade64392 52ec85cb99e9b31dc304eae965a d4a204fefec91546a317e52ae19
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 5.262e+09 ±  3%     -45.0%  2.896e+09 ±  6%     +10.0%  5.791e+09 ±126%  cpuidle..time
   7924008 ±  3%     -79.3%    1643339 ± 11%     -77.4%    1791703 ± 12%  cpuidle..usage
   1871164 ±  4%     -22.4%    1452554 ± 12%     -20.9%    1479724 ± 13%  numa-numastat.node3.local_node
   1952164 ±  3%     -20.1%    1560294 ± 12%     -19.1%    1580192 ± 12%  numa-numastat.node3.numa_hit
    399.52           -68.2%     126.86           -65.9%     136.26 ± 23%  uptime.boot
     14507           -15.7%      12232            +5.2%      15256 ± 48%  uptime.idle
      6.99 ±  3%    +147.9%      17.34 ±  4%    +249.9%      24.47 ± 62%  vmstat.cpu.id
      1.71          +473.6%       9.79 ±  2%    +437.6%       9.18 ± 19%  vmstat.cpu.us
     34204 ±  5%     -72.9%       9272 ±  7%     -73.5%       9074 ± 16%  vmstat.system.cs
    266575           -21.2%     210191           -26.9%     194776 ± 20%  vmstat.system.in
      3408 ±  5%     -99.8%       8.38 ± 48%     -99.6%      13.38 ± 68%  perf-c2c.DRAM.local
     18076 ±  3%     -99.8%      32.25 ± 27%     -99.7%      54.12 ± 35%  perf-c2c.DRAM.remote
      8082 ±  5%     -99.8%      15.50 ± 64%     -99.7%      26.38 ± 52%  perf-c2c.HITM.local
      6544 ±  6%     -99.8%      13.62 ± 51%     -99.7%      19.25 ± 43%  perf-c2c.HITM.remote
     14627 ±  4%     -99.8%      29.12 ± 53%     -99.7%      45.62 ± 43%  perf-c2c.HITM.total
      6.49 ±  3%      +8.8       15.24 ±  5%     +15.9       22.44 ± 71%  mpstat.cpu.all.idle%
      0.63            -0.3        0.32 ±  4%      -0.3        0.31 ± 22%  mpstat.cpu.all.irq%
      0.03 ±  2%      +0.2        0.26 ±  2%      +0.2        0.25 ± 20%  mpstat.cpu.all.soft%
     91.17           -17.0       74.15           -23.6       67.58 ± 20%  mpstat.cpu.all.sys%
      1.68 ±  2%      +8.3       10.03 ±  2%      +7.7        9.42 ± 19%  mpstat.cpu.all.usr%
    337.33           -97.4%       8.88 ± 75%     -98.2%       6.00 ± 88%  mpstat.max_utilization.seconds
    352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  time.elapsed_time
    352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  time.elapsed_time.max
    225965 ±  7%     -16.0%     189844 ±  6%     -20.6%     179334 ±  3%  time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09           +11.9%  1.074e+09        time.minor_page_faults
     20852            -8.8%      19012            -9.8%      18815        time.percent_of_cpu_this_job_got
     72302           -81.4%      13425 ±  3%     -82.6%      12566 ±  4%  time.system_time
      1260 ±  3%     +41.0%       1777           +36.2%       1716        time.user_time
   5393707 ±  5%     -98.4%      86880 ± 17%     -98.2%      96659 ± 22%  time.voluntary_context_switches
   1609925           -50.3%     800493           -51.0%     788816 ±  2%  meminfo.Active
   1609925           -50.3%     800493           -51.0%     788816 ±  2%  meminfo.Active(anon)
    160837 ± 33%     -63.9%      58119 ± 13%     -65.9%      54899 ± 31%  meminfo.AnonHugePages
   4435665           -18.5%    3614714           -18.7%    3604829        meminfo.Cached
   1775547           -43.8%     998415           -44.8%     980447 ±  3%  meminfo.Committed_AS
    148539           -43.7%      83699 ±  4%     -46.1%      80050 ±  2%  meminfo.Mapped
   4245538 ±  4%     -20.9%    3356561           -28.0%    3056817 ± 20%  meminfo.PageTables
  14166291 ±  4%      -9.6%   12806082           -15.9%   11919101 ± 19%  meminfo.SUnreclaim
    929777           -88.2%     109274 ±  3%     -89.4%      98935 ± 15%  meminfo.Shmem
  14315492 ±  4%      -9.6%   12947821           -15.7%   12061412 ± 19%  meminfo.Slab
  25676018 ±  3%     +10.9%   28487403           +16.3%   29863951 ±  8%  meminfo.max_used_kB
     64129 ±  4%    +418.9%     332751 ±  3%    +453.6%     355040 ±  4%  vm-scalability.median
     45.40 ±  5%   +1961.8        2007 ±  8%   +2094.7        2140 ± 11%  vm-scalability.stddev%
  14364828 ±  4%    +410.6%   73349239 ±  3%    +445.2%   78318730 ±  4%  vm-scalability.throughput
    352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  vm-scalability.time.elapsed_time
    352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  vm-scalability.time.elapsed_time.max
    225965 ±  7%     -16.0%     189844 ±  6%     -20.6%     179334 ±  3%  vm-scalability.time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09           +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
     20852            -8.8%      19012            -9.8%      18815        vm-scalability.time.percent_of_cpu_this_job_got
     72302           -81.4%      13425 ±  3%     -82.6%      12566 ±  4%  vm-scalability.time.system_time
      1260 ±  3%     +41.0%       1777           +36.2%       1716        vm-scalability.time.user_time
   5393707 ±  5%     -98.4%      86880 ± 17%     -98.2%      96659 ± 22%  vm-scalability.time.voluntary_context_switches
 4.316e+09 ±  4%     +11.9%  4.832e+09           +11.9%  4.832e+09        vm-scalability.workload
    265763 ±  4%     -20.5%     211398 ±  4%     -28.7%     189557 ± 22%  numa-vmstat.node0.nr_page_table_pages
     31364 ±106%     -85.0%       4690 ±169%     -66.5%      10503 ±106%  numa-vmstat.node0.nr_shmem
    891094 ±  4%      -8.0%     819697 ±  3%     -17.0%     739565 ± 21%  numa-vmstat.node0.nr_slab_unreclaimable
     12205 ± 67%     -74.1%       3161 ±199%     -30.0%       8543 ± 98%  numa-vmstat.node1.nr_mapped
    265546 ±  4%     -21.8%     207742 ±  4%     -27.1%     193704 ± 22%  numa-vmstat.node1.nr_page_table_pages
     44052 ± 71%     -86.0%       6163 ±161%     -92.9%       3126 ±239%  numa-vmstat.node1.nr_shmem
    885590 ±  4%      -9.9%     797649 ±  4%     -15.0%     752585 ± 21%  numa-vmstat.node1.nr_slab_unreclaimable
    264589 ±  4%     -21.2%     208598 ±  4%     -28.0%     190497 ± 20%  numa-vmstat.node2.nr_page_table_pages
    881598 ±  4%     -10.0%     793829 ±  4%     -15.3%     747142 ± 19%  numa-vmstat.node2.nr_slab_unreclaimable
    192683 ± 30%     -61.0%      75078 ± 70%     -90.4%      18510 ±122%  numa-vmstat.node3.nr_active_anon
    286819 ±108%     -93.0%      19993 ± 39%     -88.8%      32096 ± 44%  numa-vmstat.node3.nr_file_pages
     13124 ± 49%     -92.3%       1006 ± 57%     -96.1%     510.58 ± 55%  numa-vmstat.node3.nr_mapped
    264499 ±  4%     -22.1%     206135 ±  2%     -30.9%     182777 ± 21%  numa-vmstat.node3.nr_page_table_pages
    139810 ± 14%     -90.5%      13229 ± 89%     -99.4%     844.61 ± 73%  numa-vmstat.node3.nr_shmem
    880199 ±  4%     -11.8%     776210 ±  5%     -18.3%     718982 ± 21%  numa-vmstat.node3.nr_slab_unreclaimable
    192683 ± 30%     -61.0%      75077 ± 70%     -90.4%      18510 ±122%  numa-vmstat.node3.nr_zone_active_anon
   1951359 ±  3%     -20.1%    1558936 ± 12%     -19.1%    1578968 ± 12%  numa-vmstat.node3.numa_hit
   1870359 ±  4%     -22.4%    1451195 ± 12%     -21.0%    1478500 ± 13%  numa-vmstat.node3.numa_local
    402515           -50.3%     200150           -51.0%     197173 ±  2%  proc-vmstat.nr_active_anon
    170568            +1.9%     173746            +1.7%     173416        proc-vmstat.nr_anon_pages
   4257257            +0.9%    4296664            +1.7%    4330365        proc-vmstat.nr_dirty_background_threshold
   8524925            +0.9%    8603835            +1.7%    8671318        proc-vmstat.nr_dirty_threshold
   1109246           -18.5%     903959           -18.7%     901412        proc-vmstat.nr_file_pages
  42815276            +0.9%   43210344            +1.7%   43547728        proc-vmstat.nr_free_pages
     37525           -43.6%      21164 ±  4%     -46.1%      20229 ±  2%  proc-vmstat.nr_mapped
   1059932 ±  4%     -21.1%     836810           -28.3%     760302 ± 20%  proc-vmstat.nr_page_table_pages
    232507           -88.2%      27341 ±  3%     -89.4%      24701 ± 15%  proc-vmstat.nr_shmem
     37297            -5.0%      35436            -4.6%      35576        proc-vmstat.nr_slab_reclaimable
   3537843 ±  4%      -9.8%    3192506           -16.1%    2966663 ± 20%  proc-vmstat.nr_slab_unreclaimable
    402515           -50.3%     200150           -51.0%     197173 ±  2%  proc-vmstat.nr_zone_active_anon
     61931 ±  8%     -83.8%      10023 ± 45%     -76.8%      14345 ± 33%  proc-vmstat.numa_hint_faults
     15755 ± 21%     -87.1%       2039 ± 97%     -79.9%       3159 ± 84%  proc-vmstat.numa_hint_faults_local
   6916516 ±  3%      -7.1%    6425430            -7.0%    6429349        proc-vmstat.numa_hit
   6568542 ±  3%      -7.5%    6077764            -7.4%    6081764        proc-vmstat.numa_local
    293942 ±  3%     -69.6%      89435 ± 49%     -68.7%      92135 ± 33%  proc-vmstat.numa_pte_updates
 9.608e+08 ±  4%     +11.8%  1.074e+09           +11.8%  1.074e+09        proc-vmstat.pgfault
     55981 ±  2%     -63.1%      20641 ±  2%     -61.6%      21497 ± 15%  proc-vmstat.pgreuse
   1063552 ±  4%     -20.3%     847673 ±  4%     -28.4%     761616 ± 21%  numa-meminfo.node0.PageTables
   3565610 ±  4%      -8.0%    3279375 ±  3%     -16.8%    2967130 ± 20%  numa-meminfo.node0.SUnreclaim
    125455 ±106%     -85.2%      18620 ±168%     -66.2%      42381 ±106%  numa-meminfo.node0.Shmem
   3592377 ±  4%      -7.1%    3336072 ±  4%     -16.2%    3011209 ± 20%  numa-meminfo.node0.Slab
     48482 ± 67%     -74.3%      12475 ±199%     -30.6%      33629 ± 99%  numa-meminfo.node1.Mapped
   1062709 ±  4%     -21.7%     831966 ±  4%     -26.7%     778849 ± 22%  numa-meminfo.node1.PageTables
   3543793 ±  4%     -10.0%    3189589 ±  4%     -14.8%    3018852 ± 21%  numa-meminfo.node1.SUnreclaim
    176171 ± 71%     -86.0%      24677 ±161%     -92.9%      12510 ±239%  numa-meminfo.node1.Shmem
   3593431 ±  4%     -10.4%    3220352 ±  4%     -14.6%    3069779 ± 21%  numa-meminfo.node1.Slab
   1058901 ±  4%     -21.3%     833124 ±  4%     -27.7%     766065 ± 19%  numa-meminfo.node2.PageTables
   3527862 ±  4%     -10.2%    3168666 ±  5%     -15.0%    2999540 ± 19%  numa-meminfo.node2.SUnreclaim
   3565750 ±  4%     -10.3%    3200248 ±  5%     -15.2%    3022861 ± 19%  numa-meminfo.node2.Slab
    770405 ± 30%     -61.0%     300435 ± 70%     -90.4%      74044 ±122%  numa-meminfo.node3.Active
    770405 ± 30%     -61.0%     300435 ± 70%     -90.4%      74044 ±122%  numa-meminfo.node3.Active(anon)
    380096 ± 50%     -32.8%     255397 ± 73%     -78.2%      82996 ±115%  numa-meminfo.node3.AnonPages.max
   1146977 ±108%     -93.0%      80110 ± 40%     -88.8%     128436 ± 44%  numa-meminfo.node3.FilePages
     52663 ± 47%     -91.6%       4397 ± 56%     -96.0%       2104 ± 52%  numa-meminfo.node3.Mapped
   6368902 ± 20%     -21.2%    5021246 ±  2%     -27.8%    4597733 ± 18%  numa-meminfo.node3.MemUsed
   1058539 ±  4%     -22.2%     823061 ±  3%     -30.6%     734757 ± 20%  numa-meminfo.node3.PageTables
   3522496 ±  4%     -12.1%    3096728 ±  6%     -18.1%    2885117 ± 21%  numa-meminfo.node3.SUnreclaim
    558943 ± 14%     -90.5%      53054 ± 89%     -99.4%       3423 ± 71%  numa-meminfo.node3.Shmem
   3557392 ±  4%     -12.3%    3119454 ±  6%     -18.2%    2909118 ± 20%  numa-meminfo.node3.Slab
      0.82 ±  4%     -39.7%       0.50 ± 12%     -28.2%       0.59 ± 34%  perf-stat.i.MPKI
 2.714e+10 ±  2%    +185.7%  7.755e+10 ±  6%    +174.8%  7.457e+10 ± 27%  perf-stat.i.branch-instructions
      0.11 ±  3%      +0.1        0.20 ±  5%      +0.3        0.40 ±121%  perf-stat.i.branch-miss-rate%
  24932893          +156.6%   63980942 ±  5%    +150.2%   62383567 ± 25%  perf-stat.i.branch-misses
     64.93           -10.1       54.87 ±  2%     -13.6       51.34 ± 20%  perf-stat.i.cache-miss-rate%
     34508 ±  4%     -61.4%      13315 ± 10%     -64.1%      12391 ± 25%  perf-stat.i.context-switches
      7.67           -63.7%       2.79 ±  6%     -67.4%       2.50 ± 14%  perf-stat.i.cpi
    224605           +10.8%     248972 ±  4%     +11.8%     251127 ±  4%  perf-stat.i.cpu-clock
    696.35 ±  2%     -57.4%     296.79 ±  3%     -59.8%     279.73 ±  5%  perf-stat.i.cpu-migrations
     10834 ±  4%     -12.5%       9483 ± 20%     -20.2%       8648 ± 28%  perf-stat.i.cycles-between-cache-misses
 1.102e+11          +128.5%  2.518e+11 ±  6%    +119.9%  2.423e+11 ± 27%  perf-stat.i.instructions
      0.14          +198.2%       0.42 ±  5%    +239.7%       0.48 ± 21%  perf-stat.i.ipc
     24.25 ±  3%    +375.8%     115.36 ±  3%    +353.8%     110.03 ± 26%  perf-stat.i.metric.K/sec
   2722043 ±  3%    +439.7%   14690226 ±  6%    +418.1%   14103930 ± 27%  perf-stat.i.minor-faults
   2722043 ±  3%    +439.7%   14690226 ±  6%    +418.1%   14103929 ± 27%  perf-stat.i.page-faults
    224605           +10.8%     248972 ±  4%     +11.8%     251127 ±  4%  perf-stat.i.task-clock
      0.81 ±  3%     -52.5%       0.39 ± 14%     -59.6%       0.33 ± 38%  perf-stat.overall.MPKI
      0.09            -0.0        0.08 ±  2%      -0.0        0.07 ± 37%  perf-stat.overall.branch-miss-rate%
     64.81            -6.4       58.40           -13.3       51.49 ± 37%  perf-stat.overall.cache-miss-rate%
      7.24           -56.3%       3.17 ±  3%     -63.8%       2.62 ± 38%  perf-stat.overall.cpi
      8933 ±  4%      -6.0%       8401 ± 16%     -21.3%       7029 ± 38%  perf-stat.overall.cycles-between-cache-misses
      0.14          +129.0%       0.32 ±  3%    +112.0%       0.29 ± 38%  perf-stat.overall.ipc
      9012 ±  2%     -57.5%       3827           -62.8%       3349 ± 37%  perf-stat.overall.path-length
 2.701e+10 ±  2%    +159.6%  7.012e+10 ±  2%    +117.1%  5.863e+10 ± 43%  perf-stat.ps.branch-instructions
  24708939          +119.2%   54173035           +81.0%   44726149 ± 43%  perf-stat.ps.branch-misses
     34266 ±  5%     -73.9%       8949 ±  7%     -77.8%       7599 ± 41%  perf-stat.ps.context-switches
 7.941e+11            -9.1%  7.219e+11           -27.9%  5.729e+11 ± 44%  perf-stat.ps.cpu-cycles
    693.54 ±  2%     -68.6%     217.73 ±  5%     -74.1%     179.66 ± 38%  perf-stat.ps.cpu-migrations
 1.097e+11          +108.1%  2.282e+11 ±  2%     +73.9%  1.907e+11 ± 43%  perf-stat.ps.instructions
   2710577 ±  3%    +388.7%   13246535 ±  2%    +308.6%   11076222 ± 44%  perf-stat.ps.minor-faults
   2710577 ±  3%    +388.7%   13246536 ±  2%    +308.6%   11076222 ± 44%  perf-stat.ps.page-faults
 3.886e+13 ±  2%     -52.4%  1.849e+13           -58.3%  1.619e+13 ± 37%  perf-stat.total.instructions
  64052898 ±  5%     -96.2%    2460331 ±166%     -93.1%    4432025 ±129%  sched_debug.cfs_rq:/.avg_vruntime.avg
  95701822 ±  7%     -85.1%   14268127 ±116%     -60.2%   38124846 ±118%  sched_debug.cfs_rq:/.avg_vruntime.max
  43098762 ±  6%     -96.0%    1715136 ±173%     -93.3%    2867368 ±131%  sched_debug.cfs_rq:/.avg_vruntime.min
   9223270 ±  9%     -84.2%    1457904 ±122%     -61.0%    3595639 ±113%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.00 ± 22%     -80.1%       0.00 ±185%     -86.8%       0.00 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.avg
      0.69 ±  8%     -73.0%       0.19 ±185%     -82.0%       0.12 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.max
      0.05 ± 12%     -76.3%       0.01 ±185%     -84.2%       0.01 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.stddev
      0.78 ±  2%     -77.0%       0.18 ±130%     -71.9%       0.22 ±107%  sched_debug.cfs_rq:/.h_nr_running.avg
  43049468 ± 22%     -89.3%    4590302 ±180%     -89.0%    4726833 ±129%  sched_debug.cfs_rq:/.left_deadline.max
   3836405 ± 37%     -85.6%     550773 ±176%     -77.5%     864733 ±132%  sched_debug.cfs_rq:/.left_deadline.stddev
  43049467 ± 22%     -89.3%    4590279 ±180%     -89.0%    4726820 ±129%  sched_debug.cfs_rq:/.left_vruntime.max
   3836405 ± 37%     -85.6%     550772 ±176%     -77.5%     862614 ±132%  sched_debug.cfs_rq:/.left_vruntime.stddev
  64052901 ±  5%     -96.2%    2460341 ±166%     -93.1%    4432036 ±129%  sched_debug.cfs_rq:/.min_vruntime.avg
  95701822 ±  7%     -85.1%   14268127 ±116%     -60.2%   38124846 ±118%  sched_debug.cfs_rq:/.min_vruntime.max
  43098762 ±  6%     -96.0%    1715136 ±173%     -93.3%    2867368 ±131%  sched_debug.cfs_rq:/.min_vruntime.min
   9223270 ±  9%     -84.2%    1457902 ±122%     -61.0%    3595638 ±113%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ±  2%     -77.4%       0.17 ±128%     -72.3%       0.21 ±107%  sched_debug.cfs_rq:/.nr_running.avg
      1.61 ± 24%    +396.0%       7.96 ± 62%    +355.1%       7.31 ± 52%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     86.69          +424.4%     454.62 ± 24%    +400.6%     433.98 ± 26%  sched_debug.cfs_rq:/.removed.runnable_avg.max
     11.14 ± 13%    +409.8%      56.79 ± 35%    +373.6%      52.77 ± 34%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      1.61 ± 24%    +396.0%       7.96 ± 62%    +355.1%       7.31 ± 52%  sched_debug.cfs_rq:/.removed.util_avg.avg
     86.69          +424.4%     454.62 ± 24%    +400.6%     433.98 ± 26%  sched_debug.cfs_rq:/.removed.util_avg.max
     11.14 ± 13%    +409.8%      56.79 ± 35%    +373.6%      52.77 ± 34%  sched_debug.cfs_rq:/.removed.util_avg.stddev
  43049467 ± 22%     -89.3%    4590282 ±180%     -89.0%    4726821 ±129%  sched_debug.cfs_rq:/.right_vruntime.max
   3836405 ± 37%     -85.6%     550772 ±176%     -77.5%     862614 ±132%  sched_debug.cfs_rq:/.right_vruntime.stddev
    286633 ± 43%    +262.3%    1038592 ± 36%    +188.3%     826260 ± 58%  sched_debug.cfs_rq:/.runnable_avg.avg
  34728895 ± 30%    +349.2%   1.56e+08 ± 26%    +293.3%  1.366e+08 ± 60%  sched_debug.cfs_rq:/.runnable_avg.max
   2845573 ± 30%    +325.9%   12119045 ± 26%    +251.3%    9995202 ± 55%  sched_debug.cfs_rq:/.runnable_avg.stddev
    769.03           -69.9%     231.86 ± 84%     -66.3%     259.37 ± 72%  sched_debug.cfs_rq:/.util_avg.avg
      1621 ±  5%     -31.5%       1111 ±  8%     -35.4%       1048 ±  8%  sched_debug.cfs_rq:/.util_avg.max
    159.12 ±  8%     +22.3%     194.66 ± 12%     +35.0%     214.82 ± 14%  sched_debug.cfs_rq:/.util_avg.stddev
    724.17 ±  2%     -89.6%      75.66 ±147%     -88.3%      84.74 ±123%  sched_debug.cfs_rq:/.util_est.avg
      1360 ± 15%     -39.2%     826.88 ± 37%     -29.0%     965.90 ± 48%  sched_debug.cfs_rq:/.util_est.max
    766944 ±  3%     +18.1%     905901           +21.7%     933047 ±  2%  sched_debug.cpu.avg_idle.avg
   1067639 ±  5%     +30.0%    1387534 ± 16%     +38.2%    1475131 ± 15%  sched_debug.cpu.avg_idle.max
    321459 ±  2%     -35.6%     207172 ± 10%     -33.5%     213764 ± 15%  sched_debug.cpu.avg_idle.stddev
    195573           -72.7%      53401 ± 24%     -68.5%      61507 ± 35%  sched_debug.cpu.clock.avg
    195596           -72.7%      53442 ± 24%     -68.5%      61565 ± 35%  sched_debug.cpu.clock.max
    195548           -72.7%      53352 ± 24%     -68.6%      61431 ± 35%  sched_debug.cpu.clock.min
    194424           -72.6%      53229 ± 24%     -68.5%      61304 ± 35%  sched_debug.cpu.clock_task.avg
    194608           -72.6%      53383 ± 24%     -68.4%      61478 ± 34%  sched_debug.cpu.clock_task.max
    181834           -77.5%      40964 ± 31%     -73.0%      49012 ± 43%  sched_debug.cpu.clock_task.min
      4241 ±  2%     -80.6%     821.65 ±142%     -77.1%     971.85 ±116%  sched_debug.cpu.curr->pid.avg
      9799 ±  2%     -55.4%       4365 ± 17%     -51.6%       4747 ± 22%  sched_debug.cpu.curr->pid.max
      1365 ± 10%     -48.0%     709.44 ±  5%     -39.9%     820.19 ± 24%  sched_debug.cpu.curr->pid.stddev
    537665 ±  4%     +31.2%     705318 ± 14%     +44.0%     774261 ± 15%  sched_debug.cpu.max_idle_balance_cost.max
      3119 ± 56%    +579.1%      21184 ± 39%   +1048.3%      35821 ± 65%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.78 ±  2%     -76.3%       0.18 ±135%     -72.0%       0.22 ±114%  sched_debug.cpu.nr_running.avg
     25773 ±  5%     -96.1%       1007 ± 41%     -95.2%       1246 ± 53%  sched_debug.cpu.nr_switches.avg
     48669 ± 10%     -76.5%      11448 ± 13%     -66.5%      16288 ± 70%  sched_debug.cpu.nr_switches.max
     19006 ±  7%     -98.6%     258.81 ± 64%     -98.4%     311.75 ± 58%  sched_debug.cpu.nr_switches.min
      4142 ±  8%     -66.3%       1396 ± 17%     -58.3%       1726 ± 51%  sched_debug.cpu.nr_switches.stddev
      0.07 ± 23%     -92.9%       0.01 ± 41%     -94.3%       0.00 ± 46%  sched_debug.cpu.nr_uninterruptible.avg
    240.19 ± 16%     -82.1%      42.94 ± 41%     -84.0%      38.50 ± 19%  sched_debug.cpu.nr_uninterruptible.max
    -77.92           -88.1%      -9.25           -84.9%     -11.77        sched_debug.cpu.nr_uninterruptible.min
     37.87 ±  5%     -85.8%       5.36 ± 13%     -85.3%       5.57 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
    195549           -72.7%      53356 ± 24%     -68.6%      61438 ± 35%  sched_debug.cpu_clk
    194699           -73.0%      52506 ± 25%     -68.9%      60588 ± 35%  sched_debug.ktime
      0.00          -100.0%       0.00           -62.5%       0.00 ±264%  sched_debug.rt_rq:.rt_nr_running.avg
      0.17          -100.0%       0.00           -62.5%       0.06 ±264%  sched_debug.rt_rq:.rt_nr_running.max
      0.01          -100.0%       0.00           -62.5%       0.00 ±264%  sched_debug.rt_rq:.rt_nr_running.stddev
    196368           -72.4%      54191 ± 24%     -68.3%      62327 ± 34%  sched_debug.sched_clk
      0.17 ±142%    -100.0%       0.00           -97.8%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      0.19 ± 34%     -51.3%       0.09 ± 37%     -76.7%       0.04 ±110%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      0.14 ± 55%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.14 ± 73%     -82.5%       0.03 ±168%     -64.1%       0.05 ±177%  perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      0.11 ± 59%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.04 ±132%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.02 ± 31%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.00 ±223%  +51950.0%       0.26 ±212%   +6325.0%       0.03 ±124%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      0.25 ± 59%    -100.0%       0.00           -64.9%       0.09 ±253%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.12 ±145%     -99.1%       0.00 ±141%     -99.5%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.04 ± 55%     +99.5%       0.08 ±254%     -92.0%       0.00 ±103%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.25 ± 41%     -81.6%       0.05 ± 69%     -94.4%       0.01 ± 69%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.11 ± 59%     -87.1%       0.01 ±198%     -96.2%       0.00 ±128%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.40 ± 50%     -97.8%       0.01 ± 30%     -97.2%       0.01 ± 45%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.25 ±138%     -99.6%       0.01 ±  7%     -63.9%       0.81 ±261%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.32 ±104%     -97.3%       0.01 ± 38%     -97.7%       0.01 ± 61%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.12 ± 21%     -61.6%       0.04 ±233%     -85.7%       0.02 ±190%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ± 12%     -34.9%       0.01 ± 18%    +722.2%       0.07 ±251%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 42%     -41.4%       0.00 ± 72%     -76.6%       0.00 ± 77%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.01 ± 20%    -100.0%       0.00           -96.4%       0.00 ±264%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.19 ±185%     -95.6%       0.01 ± 44%    +266.3%       0.70 ±261%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.07 ± 20%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      0.26 ± 17%     -98.8%       0.00 ± 10%     -98.9%       0.00 ± 39%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.03 ± 51%     -69.7%       0.01 ± 67%     -83.7%       0.01 ± 15%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 55%    +721.9%       0.10 ± 29%   +1608.3%       0.20 ±227%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±128%     -83.6%       0.00 ± 20%     -86.2%       0.00 ± 43%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      0.06 ± 31%   +1921.5%       1.23 ±165%  +13539.3%       8.30 ±201%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.00 ±151%    -100.0%       0.00           -99.6%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     25.45 ± 94%     -98.6%       0.36 ± 61%     -99.4%       0.15 ±143%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      4.56 ± 67%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      3.55 ± 97%     -98.9%       0.04 ±189%     -98.5%       0.05 ±177%  perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      2.13 ± 67%     -77.2%       0.49 ± 56%     -88.8%       0.24 ±147%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      3.16 ± 78%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.30 ±159%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.61 ±100%     -76.7%       0.38 ± 72%     -91.7%       0.13 ±145%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.03 ± 86%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.00 ±223%  +3.2e+06%      15.79 ±259%  +44450.0%       0.22 ±132%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      3.09 ± 45%    -100.0%       0.00           -94.6%       0.17 ±259%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      3.51 ± 21%     -86.1%       0.49 ± 72%     -90.7%       0.33 ±127%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.83 ±160%     -99.7%       0.00 ±141%     -99.9%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.09 ± 31%    +179.7%       0.25 ±258%     -91.5%       0.01 ±132%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.59 ± 11%     -92.0%       0.29 ±165%     -99.2%       0.03 ±118%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1.60 ± 69%     -95.7%       0.07 ±243%     -99.0%       0.02 ±210%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.81 ± 43%     -98.5%       0.01 ± 43%     -98.3%       0.01 ± 41%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.02 ± 88%     -98.1%       0.02 ± 47%     -98.7%       0.01 ± 71%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      9.68 ± 32%     -92.2%       0.76 ± 72%     -78.1%       2.12 ±187%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.01 ± 49%     -51.9%       0.00 ± 72%     -80.8%       0.00 ± 77%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     12.26 ±109%     -92.9%       0.87 ±101%     -86.9%       1.61 ±225%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      5.60 ±139%     -97.6%       0.13 ±132%     -99.3%       0.04 ±255%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.03 ±106%    -100.0%       0.00           -99.1%       0.00 ±264%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      2.11 ± 61%     -85.5%       0.31 ± 85%     -96.0%       0.08 ±124%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     37.84 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      4.68 ± 36%     -99.8%       0.01 ± 65%     -99.8%       0.01 ± 77%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      7.56 ± 74%     -51.5%       3.67 ±147%     -99.8%       0.02 ± 54%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.36 ±186%     -96.3%       0.01 ± 90%     -97.9%       0.01 ± 59%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     97903 ±  4%     -38.3%      60433 ± 29%     -71.4%      27976 ±109%  perf-sched.total_wait_and_delay.count.ms
      3.97 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
    302.41 ±  5%     -27.4%     219.54 ± 14%     -10.8%     269.81 ± 60%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.48 ±  6%     -90.9%       0.14 ± 79%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
    327.16 ±  9%     -46.6%     174.81 ± 24%     -38.4%     201.64 ± 71%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.37 ±  2%     -75.3%      91.05 ± 35%     -77.7%      82.29 ±119%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.96 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    187.66          +120.6%     413.97 ± 14%    +116.9%     407.06 ± 43%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1831 ±  9%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      6.17 ± 45%     -79.7%       1.25 ±142%     -91.9%       0.50 ±264%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     14.33 ±  5%     +13.4%      16.25 ± 23%     -58.1%       6.00 ± 66%  perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    810.00 ± 10%     -38.0%     502.25 ± 92%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     40.50 ±  8%    +245.7%     140.00 ± 23%     +72.5%      69.88 ± 91%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     13.17 ±  2%    +624.4%      95.38 ± 19%    +347.2%      58.88 ± 78%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     73021 ±  3%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     11323 ±  3%     -75.9%       2725 ± 28%     -86.4%       1536 ± 34%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1887 ± 45%     -96.1%      73.88 ± 78%     -98.5%      28.75 ±120%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1238           -34.5%     811.25 ± 13%     -58.6%     512.62 ± 49%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     35.19 ± 57%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     20.79 ± 19%     -95.9%       0.84 ± 93%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1240 ± 20%     -14.4%       1062 ± 10%     -25.2%     928.21 ± 40%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.34           +31.2%     656.38 ± 39%     -15.0%     425.46 ± 61%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     58.83 ± 39%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1237 ± 34%    +151.7%       3114 ± 25%     +51.6%       1876 ± 64%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     49.27 ±119%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
     58.17 ±187%    -100.0%       0.00          -100.0%       0.00 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      3.78 ±  5%     -97.6%       0.09 ± 37%     -98.8%       0.04 ±111%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      2.99 ±  4%     +15.4%       3.45 ± 10%     +28.8%       3.85 ± 54%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      3.92 ±  5%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      4.71 ±  8%     -99.5%       0.02 ±170%     -98.9%       0.05 ±177%  perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      1.67 ± 20%     -92.7%       0.12 ± 30%     -96.8%       0.05 ±130%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      2.10 ± 27%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 44%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.67 ± 21%     -94.3%       0.10 ± 35%     -97.0%       0.05 ±137%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.04 ±133%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     67.14 ± 73%     +75.6%     117.89 ±108%     -92.8%       4.82 ±259%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      1.65 ± 67%     -95.8%       0.07 ±128%     -99.2%       0.01 ±175%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      2.30 ± 14%     -95.5%       0.10 ± 42%     -96.4%       0.08 ±108%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      2.00 ± 74%   +2917.4%      60.44 ± 33%   +1369.3%      29.43 ± 74%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     29.19 ±  5%     -38.5%      17.96 ± 28%     -49.0%      14.89 ± 54%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.37 ± 30%   +5524.5%      20.95 ± 30%   +2028.0%       7.93 ±117%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    302.40 ±  5%     -27.4%     219.53 ± 14%     -10.8%     269.75 ± 60%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.40 ±  6%     -92.7%       0.10 ± 18%     -95.4%       0.06 ±109%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.72 ±220%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    326.84 ±  9%     -46.6%     174.54 ± 24%     -38.6%     200.64 ± 72%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.18 ±  2%     -75.3%      91.04 ± 35%     -74.2%      95.16 ± 98%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.89 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    187.58          +120.6%     413.77 ± 14%    +116.9%     406.79 ± 43%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.36 ± 29%   +1759.6%      43.80 ± 33%   +3763.5%      90.99 ±115%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±156%     -97.9%       0.00 ±264%     -98.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    750.01           -14.5%     641.50 ± 14%     -41.1%     442.13 ± 58%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    340.69 ±135%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    535.09 ±128%    -100.0%       0.00          -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     22.04 ± 32%     -98.4%       0.36 ± 61%     -99.3%       0.15 ±143%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     13.57 ± 17%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
     13.54 ± 10%     -99.7%       0.04 ±189%     -99.6%       0.05 ±177%  perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
     10.17 ± 19%     -95.2%       0.49 ± 56%     -97.7%       0.24 ±147%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
     11.35 ± 25%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 32%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
     10.62 ±  9%     -96.5%       0.38 ± 72%     -98.7%       0.13 ±145%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.20 ±199%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      1559 ± 64%     -92.3%     120.30 ±109%     -99.4%       9.63 ±259%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.93 ± 53%     -98.1%       0.13 ± 99%     -99.8%       0.01 ±175%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
     14.42 ± 22%     -96.6%       0.49 ± 72%     -97.7%       0.33 ±127%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      4.00 ± 74%  +19182.5%     772.23 ± 40%   +7266.0%     295.00 ± 92%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     10.75 ± 98%   +6512.2%     710.88 ± 56%   +2526.4%     282.37 ±130%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.80 ±  8%     -95.2%       0.76 ± 72%     -86.6%       2.12 ±187%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     11.64 ± 61%     -98.9%       0.13 ±132%     -99.7%       0.04 ±255%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.94 ±213%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    175.70 ±210%     -64.6%      62.26 ±263%     -99.8%       0.31 ±116%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      1240 ± 20%     -14.3%       1062 ± 10%     -25.2%     928.20 ± 40%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.11           +31.2%     656.37 ± 39%      -2.4%     487.96 ± 41%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     32.65 ± 33%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1237 ± 34%    +151.6%       3113 ± 25%     +49.0%       1844 ± 63%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.36 ±190%     -97.2%       0.01 ±127%     -98.5%       0.01 ± 88%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     95.59           -95.6        0.00           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
     95.54           -95.5        0.00           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     95.54           -95.5        0.00           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     94.54           -94.5        0.00           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.46           -94.0        0.41 ±138%     -93.9        0.57 ±103%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.14           -93.7        0.40 ±136%     -93.6        0.50 ± 79%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     93.79           -93.5        0.31 ±134%     -93.2        0.58 ±111%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
     93.40           -93.4        0.00           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     93.33           -93.3        0.00           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
     93.44           -93.3        0.14 ±264%     -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
     94.45           -93.0        1.42 ± 60%     -92.9        1.51 ± 51%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     94.25           -92.9        1.33 ± 61%     -92.8        1.43 ± 57%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     92.89           -92.9        0.00           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
      0.00            +0.3        0.29 ±129%      +1.1        1.10 ± 27%  perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
      0.00            +0.3        0.32 ±129%      +1.7        1.70 ± 39%  perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      0.00            +0.3        0.32 ±129%      +1.7        1.74 ± 40%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
      0.00            +0.5        0.49 ± 78%      +1.7        1.74 ± 40%  perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.00            +1.1        1.09 ± 33%      +0.4        0.44 ±177%  perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
      0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.calltrace.cycles-pp.filp_close.put_files_struct.do_exit.do_group_exit.get_signal
      0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.calltrace.cycles-pp.put_files_struct.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.00            +1.4        1.37 ± 49%      +1.8        1.77 ± 50%  perf-profile.calltrace.cycles-pp.setlocale
      0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
      0.00            +1.5        1.55 ± 63%      +1.6        1.62 ± 37%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00            +1.6        1.60 ± 57%      +1.6        1.63 ± 87%  perf-profile.calltrace.cycles-pp.swevent_hlist_put_cpu.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
      0.00            +1.6        1.64 ± 47%      +0.9        0.90 ±101%  perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +1.6        1.64 ± 47%      +1.0        1.02 ± 83%  perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
      0.00            +1.6        1.65 ± 43%      +1.1        1.15 ± 76%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.8        1.76 ± 44%      +1.1        1.15 ± 76%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.9        1.93 ± 26%      +1.1        1.11 ±127%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      0.00            +2.0        2.04 ± 66%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.00            +2.1        2.12 ± 58%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.00            +2.1        2.12 ± 58%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
      0.00            +2.1        2.12 ± 58%      +3.7        3.71 ± 40%  perf-profile.calltrace.cycles-pp.open64
      0.00            +2.2        2.16 ± 44%      +1.6        1.62 ± 37%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00            +2.2        2.20 ± 74%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.00            +2.2        2.23 ± 33%      +1.4        1.40 ± 99%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
      0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
      0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
      0.00            +2.4        2.37 ± 36%      +1.9        1.93 ± 35%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +2.5        2.48 ± 32%      +2.4        2.45 ± 60%  perf-profile.calltrace.cycles-pp.get_cpu_sleep_time_us.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read
      0.00            +2.5        2.50 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.5        2.54 ± 47%      +1.3        1.28 ± 61%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      0.00            +2.5        2.54 ± 47%      +1.3        1.28 ± 61%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.7        2.67 ± 54%      +2.6        2.59 ± 40%  perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
      0.00            +2.7        2.68 ± 35%      +3.0        3.02 ± 45%  perf-profile.calltrace.cycles-pp.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read.ksys_read
      0.00            +2.8        2.77 ± 33%      +4.2        4.17 ± 35%  perf-profile.calltrace.cycles-pp.uptime_proc_show.seq_read_iter.vfs_read.ksys_read.do_syscall_64
      0.00            +2.8        2.82 ± 32%      +1.8        1.83 ± 85%  perf-profile.calltrace.cycles-pp._Fork
      0.00            +2.8        2.83 ± 48%      +2.6        2.59 ± 40%  perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
      0.00            +2.8        2.83 ± 48%      +2.7        2.68 ± 42%  perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
      0.00            +2.8        2.84 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +2.8        2.84 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      0.00            +2.9        2.89 ± 39%      +3.1        3.14 ± 39%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
      0.00            +2.9        2.89 ± 39%      +3.1        3.14 ± 39%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
      0.00            +3.1        3.10 ± 64%      +0.9        0.91 ±264%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.1        3.10 ± 64%      +0.9        0.91 ±264%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
      0.00            +3.1        3.13 ± 33%      +1.7        1.68 ± 77%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      0.00            +3.2        3.18 ± 37%      +4.3        4.31 ± 34%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.20 ± 28%      +3.0        3.02 ± 73%  perf-profile.calltrace.cycles-pp.mutex_unlock.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
      0.00            +3.2        3.24 ± 39%      +2.8        2.85 ± 49%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.29 ± 47%  perf-profile.calltrace.cycles-pp.execve
      0.00            +4.0        4.04 ± 43%      +5.2        5.21 ± 49%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +4.0        4.04 ± 43%      +5.2        5.21 ± 49%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.00            +4.1        4.10 ± 30%      +2.6        2.56 ± 28%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
      0.00            +4.2        4.18 ± 31%      +2.8        2.82 ± 21%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
      0.00            +4.2        4.18 ± 31%      +2.8        2.82 ± 21%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
      0.00            +4.2        4.20 ± 28%      +2.7        2.68 ± 34%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      0.00            +4.2        4.25 ± 65%      +8.0        7.98 ± 43%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.00            +4.3        4.27 ± 26%      +3.2        3.23 ± 34%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +4.3        4.30 ± 22%      +3.9        3.95 ± 32%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
      0.00            +4.3        4.30 ± 22%      +3.9        3.95 ± 32%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.00            +4.5        4.46 ± 59%      +8.1        8.07 ± 42%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +4.6        4.57 ± 58%      +8.1        8.07 ± 42%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.00            +4.7        4.68 ± 55%      +8.1        8.12 ± 43%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.00            +4.7        4.68 ± 55%      +8.1        8.12 ± 43%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.00            +4.7        4.68 ± 55%      +8.2        8.16 ± 44%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
      0.00            +4.7        4.68 ± 55%      +8.4        8.39 ± 39%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
      0.00            +4.7        4.68 ± 55%      +8.6        8.61 ± 38%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
      0.00            +4.9        4.90 ± 57%     +10.3       10.28 ± 65%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.00            +4.9        4.92 ± 26%      +4.6        4.56 ± 47%  perf-profile.calltrace.cycles-pp.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release.__fput
      0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt
      0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.00            +5.1        5.08 ±102%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.00            +5.1        5.14 ± 28%      +6.0        6.01 ± 41%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
      0.00            +5.1        5.14 ± 28%      +6.2        6.16 ± 39%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
      0.00            +5.4        5.43 ± 25%      +5.0        4.97 ± 45%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
      0.00            +5.8        5.82 ± 94%      +4.2        4.21 ± 49%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +5.8        5.82 ± 94%      +4.3        4.35 ± 53%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
      0.00            +6.1        6.07 ± 90%      +4.3        4.32 ± 58%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.00            +6.6        6.62 ± 24%      +7.0        6.99 ± 41%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      0.00            +6.6        6.62 ± 24%      +7.0        6.99 ± 41%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.00            +6.8        6.76 ± 18%      +5.2        5.23 ± 25%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.00            +7.6        7.56 ± 76%      +6.0        5.99 ± 38%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
      0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.0        8.05 ± 68%      +6.3        6.27 ± 37%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.read
      0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
      0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.main
      0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
      0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
      0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
      0.00           +10.1       10.14 ± 28%     +10.0       10.04 ± 34%  perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
      0.00           +10.2       10.23 ± 27%     +10.7       10.65 ± 35%  perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.00           +11.0       10.98 ± 55%     +13.0       13.00 ± 27%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.00           +20.6       20.64 ± 30%     +19.5       19.49 ± 43%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +20.6       20.64 ± 30%     +19.5       19.49 ± 43%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.21 ±  3%     +36.6       37.80 ± 12%     +34.1       35.32 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.21 ±  3%     +36.6       37.80 ± 12%     +34.4       35.62 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.8       38.00 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
      2.19 ±  3%     +53.9       56.10 ± 19%     +48.4       50.63 ± 13%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
     95.60           -95.2        0.41 ±138%     -94.9        0.72 ± 95%  perf-profile.children.cycles-pp.__mmap
     94.14           -93.7        0.49 ±130%     -92.9        1.21 ± 33%  perf-profile.children.cycles-pp.__mmap_new_vma
     93.79           -93.5        0.31 ±134%     -93.1        0.71 ± 78%  perf-profile.children.cycles-pp.vma_link_file
     93.40           -93.4        0.00           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     93.33           -93.3        0.00           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     94.55           -93.1        1.42 ± 60%     -93.0        1.55 ± 50%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
     92.91           -92.9        0.00           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
     93.44           -92.7        0.75 ±109%     -93.4        0.06 ±264%  perf-profile.children.cycles-pp.down_write
     94.46           -92.6        1.84 ± 34%     -92.0        2.48 ± 28%  perf-profile.children.cycles-pp.vm_mmap_pgoff
     94.45           -92.6        1.84 ± 34%     -92.0        2.48 ± 28%  perf-profile.children.cycles-pp.do_mmap
     94.25           -92.6        1.66 ± 37%     -91.9        2.40 ± 30%  perf-profile.children.cycles-pp.__mmap_region
     95.58           -44.8       50.78 ± 11%     -42.8       52.76 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     95.58           -44.8       50.78 ± 11%     -42.8       52.76 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
      0.00            +0.1        0.09 ±264%      +1.0        0.96 ± 46%  perf-profile.children.cycles-pp.kcpustat_cpu_fetch
      0.25 ±  3%      +0.2        0.45 ±133%      +0.7        0.92 ± 41%  perf-profile.children.cycles-pp.vma_interval_tree_insert
      0.00            +0.3        0.29 ±129%      +1.2        1.16 ± 26%  perf-profile.children.cycles-pp.do_open
      0.00            +0.3        0.32 ±129%      +1.8        1.79 ± 43%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.00            +0.3        0.32 ±129%      +1.8        1.83 ± 44%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.00            +0.5        0.49 ± 78%      +1.8        1.83 ± 44%  perf-profile.children.cycles-pp.shmem_write_begin
      0.00            +1.1        1.09 ± 33%      +0.5        0.48 ±160%  perf-profile.children.cycles-pp.dup_mmap
      0.00            +1.1        1.11 ±106%      +1.6        1.60 ± 54%  perf-profile.children.cycles-pp.__open64_nocancel
      0.00            +1.1        1.15 ±102%      +1.2        1.16 ± 86%  perf-profile.children.cycles-pp.evlist_cpu_iterator__next
      0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.children.cycles-pp.filp_close
      0.00            +1.3        1.32 ± 54%      +1.5        1.47 ± 29%  perf-profile.children.cycles-pp.put_files_struct
      0.00            +1.4        1.37 ± 49%      +1.8        1.77 ± 50%  perf-profile.children.cycles-pp.setlocale
      0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.children.cycles-pp.seq_read
      0.00            +1.5        1.55 ± 63%      +1.7        1.75 ± 30%  perf-profile.children.cycles-pp.do_read_fault
      0.00            +1.7        1.66 ± 76%      +0.9        0.91 ± 44%  perf-profile.children.cycles-pp.event_function
      0.00            +1.7        1.66 ± 76%      +0.9        0.91 ± 44%  perf-profile.children.cycles-pp.remote_function
      0.00            +1.7        1.70 ± 71%      +1.5        1.53 ± 73%  perf-profile.children.cycles-pp.lookup_fast
      0.00            +1.7        1.73 ± 53%      +1.4        1.40 ± 77%  perf-profile.children.cycles-pp.swevent_hlist_put_cpu
      0.04 ± 44%      +1.8        1.83 ± 96%      +2.4        2.47 ± 44%  perf-profile.children.cycles-pp.__schedule
      0.00            +1.9        1.93 ± 26%      +1.1        1.15 ±120%  perf-profile.children.cycles-pp.dup_mm
      0.03 ± 70%      +2.0        1.99 ± 36%      +1.2        1.23 ± 81%  perf-profile.children.cycles-pp.handle_softirqs
      0.00            +2.0        1.99 ± 36%      +1.1        1.13 ± 67%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +2.0        2.02 ± 38%      +1.3        1.33 ± 57%  perf-profile.children.cycles-pp.folios_put_refs
      0.00            +2.1        2.06 ± 52%      +1.4        1.38 ± 77%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +2.1        2.12 ± 58%      +3.7        3.71 ± 40%  perf-profile.children.cycles-pp.open64
      0.00            +2.2        2.16 ± 44%      +1.7        1.75 ± 30%  perf-profile.children.cycles-pp.do_pte_missing
      0.00            +2.2        2.21 ± 68%      +2.2        2.18 ± 58%  perf-profile.children.cycles-pp.link_path_walk
      0.00            +2.2        2.23 ± 33%      +1.4        1.40 ± 99%  perf-profile.children.cycles-pp.copy_process
      0.00            +2.3        2.30 ± 40%      +1.8        1.78 ± 48%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
      0.00            +2.3        2.30 ± 40%      +1.8        1.78 ± 48%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.children.cycles-pp.perf_c2c__record
      0.00            +2.3        2.34 ± 46%      +1.5        1.52 ± 99%  perf-profile.children.cycles-pp.walk_component
      0.00            +2.4        2.37 ± 36%      +2.0        2.04 ± 32%  perf-profile.children.cycles-pp.zap_present_ptes
      0.00            +2.5        2.48 ± 32%      +2.5        2.51 ± 55%  perf-profile.children.cycles-pp.get_cpu_sleep_time_us
      0.00            +2.5        2.50 ± 73%      +1.6        1.56 ± 76%  perf-profile.children.cycles-pp.__evlist__enable
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.children.cycles-pp.__do_sys_clone
      0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.children.cycles-pp.kernel_clone
      0.00            +2.7        2.67 ± 54%      +2.6        2.59 ± 40%  perf-profile.children.cycles-pp.load_elf_binary
      0.00            +2.7        2.68 ± 35%      +3.0        3.02 ± 45%  perf-profile.children.cycles-pp.get_idle_time
      0.00            +2.8        2.77 ± 33%      +4.2        4.17 ± 35%  perf-profile.children.cycles-pp.uptime_proc_show
      0.00            +2.8        2.83 ± 48%      +2.6        2.59 ± 40%  perf-profile.children.cycles-pp.search_binary_handler
      0.00            +2.8        2.83 ± 48%      +2.7        2.68 ± 42%  perf-profile.children.cycles-pp.exec_binprm
      0.00            +2.9        2.91 ± 32%      +1.8        1.83 ± 85%  perf-profile.children.cycles-pp._Fork
      0.00            +3.1        3.10 ± 64%      +0.9        0.95 ±252%  perf-profile.children.cycles-pp.proc_reg_read_iter
      0.00            +3.2        3.24 ± 39%      +2.8        2.85 ± 49%  perf-profile.children.cycles-pp.bprm_execve
      0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.children.cycles-pp.__x64_sys_exit_group
      0.00            +3.2        3.24 ± 36%      +2.1        2.09 ± 53%  perf-profile.children.cycles-pp.x64_sys_call
      0.00            +3.8        3.85 ± 39%      +3.3        3.29 ± 47%  perf-profile.children.cycles-pp.execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.34 ± 49%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.00            +3.8        3.85 ± 39%      +3.3        3.34 ± 49%  perf-profile.children.cycles-pp.do_execveat_common
      0.00            +4.0        3.99 ± 38%      +4.1        4.06 ± 54%  perf-profile.children.cycles-pp.mutex_unlock
      0.00            +4.2        4.19 ± 31%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.zap_pte_range
      0.00            +4.2        4.25 ± 65%      +8.0        7.98 ± 43%  perf-profile.children.cycles-pp.generic_perform_write
      0.00            +4.3        4.29 ± 29%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.unmap_page_range
      0.00            +4.3        4.29 ± 29%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.zap_pmd_range
      0.00            +4.3        4.31 ± 51%      +5.3        5.31 ± 46%  perf-profile.children.cycles-pp.do_filp_open
      0.00            +4.3        4.31 ± 51%      +5.3        5.31 ± 46%  perf-profile.children.cycles-pp.path_openat
      0.19 ± 23%      +4.4        4.60 ± 26%      +3.4        3.54 ± 27%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.00            +4.5        4.46 ± 59%      +8.1        8.07 ± 42%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.00            +4.5        4.55 ± 24%      +4.0        3.97 ± 39%  perf-profile.children.cycles-pp.smp_call_function_single
      0.00            +4.5        4.55 ± 24%      +4.1        4.06 ± 38%  perf-profile.children.cycles-pp.event_function_call
      0.00            +4.6        4.58 ± 30%      +3.2        3.19 ± 24%  perf-profile.children.cycles-pp.unmap_vmas
      0.51 ±  6%      +4.6        5.14 ± 24%      +3.6        4.06 ± 30%  perf-profile.children.cycles-pp.handle_mm_fault
      0.00            +4.7        4.68 ± 55%      +8.4        8.41 ± 39%  perf-profile.children.cycles-pp.writen
      0.00            +4.7        4.68 ± 55%      +8.5        8.49 ± 39%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +4.8        4.80 ± 48%      +6.1        6.15 ± 34%  perf-profile.children.cycles-pp.do_sys_openat2
      0.77 ±  3%      +4.8        5.59 ± 21%      +4.3        5.07 ± 29%  perf-profile.children.cycles-pp.exc_page_fault
      0.76 ±  3%      +4.8        5.59 ± 21%      +4.3        5.07 ± 29%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.00            +4.9        4.90 ± 57%     +10.3       10.28 ± 65%  perf-profile.children.cycles-pp.vfs_write
      0.00            +4.9        4.90 ± 57%     +10.4       10.41 ± 63%  perf-profile.children.cycles-pp.ksys_write
      0.00            +4.9        4.90 ± 48%      +6.1        6.15 ± 34%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.00            +4.9        4.92 ± 26%      +4.7        4.66 ± 47%  perf-profile.children.cycles-pp.sw_perf_event_destroy
      0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.children.cycles-pp.perf_rotate_context
      0.00            +5.0        5.01 ± 54%     +10.9       10.87 ± 59%  perf-profile.children.cycles-pp.write
      0.00            +5.1        5.09 ±102%      +2.7        2.74 ± 94%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.00            +5.4        5.43 ± 25%      +5.0        4.97 ± 45%  perf-profile.children.cycles-pp._free_event
      1.18            +5.6        6.78 ± 20%      +5.5        6.71 ± 24%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.46            +5.6        6.07 ± 90%      +4.1        4.54 ± 53%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.00            +5.7        5.75 ± 39%     +10.2       10.22 ± 24%  perf-profile.children.cycles-pp.perf_mmap__push
      0.00            +5.7        5.75 ± 39%     +10.4       10.38 ± 23%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.53            +5.8        6.28 ± 89%      +4.4        4.91 ± 50%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.53            +5.8        6.28 ± 89%      +4.4        4.91 ± 50%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.00            +6.6        6.65 ± 77%      +3.3        3.32 ± 91%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      0.00            +6.8        6.85 ± 20%      +5.2        5.23 ± 25%  perf-profile.children.cycles-pp.exit_mm
      0.58 ±  2%      +7.6        8.14 ± 75%      +6.0        6.55 ± 38%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +7.7        7.67 ± 23%      +6.1        6.14 ± 15%  perf-profile.children.cycles-pp.exit_mmap
      0.00            +7.7        7.67 ± 30%      +7.0        7.05 ± 50%  perf-profile.children.cycles-pp.seq_read_iter
      0.00            +7.7        7.72 ± 80%      +8.2        8.15 ± 51%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.00            +7.8        7.75 ± 23%      +6.1        6.14 ± 15%  perf-profile.children.cycles-pp.__mmput
      0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.ksys_read
      0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.vfs_read
      0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.read
      0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.__cmd_record
      0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.cmd_record
      0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.handle_internal_command
      0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.main
      0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.run_builtin
      0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.children.cycles-pp.perf_event_release_kernel
      0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.children.cycles-pp.perf_release
      1.02 ±  4%      +9.3       10.33 ± 27%      +9.8       10.80 ± 35%  perf-profile.children.cycles-pp.task_work_run
      0.00           +11.0       11.05 ± 28%     +10.4       10.37 ± 32%  perf-profile.children.cycles-pp.__fput
      0.00           +15.8       15.85 ± 25%     +16.1       16.11 ± 29%  perf-profile.children.cycles-pp.get_signal
      0.00           +15.8       15.85 ± 25%     +16.2       16.17 ± 29%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
      0.00           +19.1       19.09 ± 19%     +18.1       18.06 ± 29%  perf-profile.children.cycles-pp.do_exit
      0.00           +19.1       19.09 ± 19%     +18.1       18.06 ± 29%  perf-profile.children.cycles-pp.do_group_exit
      1.70 ±  2%     +30.7       32.41 ± 21%     +27.2       28.87 ± 12%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.children.cycles-pp.start_secondary
      1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_idle_enter
      1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_safe_halt
      1.22 ±  3%     +37.3       38.54 ± 12%     +35.0       36.18 ± 10%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.children.cycles-pp.cpuidle_enter
      1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.common_startup_64
      1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.do_idle
     92.37           -92.4        0.00           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
      0.00            +0.1        0.09 ±264%      +0.8        0.84 ± 51%  perf-profile.self.cycles-pp.kcpustat_cpu_fetch
      0.00            +2.1        2.06 ± 52%      +1.4        1.38 ± 77%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +2.6        2.61 ± 36%      +2.8        2.75 ± 48%  perf-profile.self.cycles-pp.smp_call_function_single
      0.00            +3.7        3.68 ± 37%      +3.7        3.70 ± 64%  perf-profile.self.cycles-pp.mutex_unlock
      0.00            +6.6        6.65 ± 77%      +3.3        3.32 ± 91%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
      1.19 ±  3%     +29.2       30.38 ± 15%     +27.9       29.13 ± 13%  perf-profile.self.cycles-pp.acpi_safe_halt




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-02-13  2:04         ` Oliver Sang
@ 2025-02-14 22:53           ` Yang Shi
  2025-02-18  6:30             ` Oliver Sang
  0 siblings, 1 reply; 35+ messages in thread
From: Yang Shi @ 2025-02-14 22:53 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm




On 2/12/25 6:04 PM, Oliver Sang wrote:
> hi, Yang Shi,
>
> On Fri, Feb 07, 2025 at 10:10:37AM -0800, Yang Shi wrote:
>> On 2/6/25 12:02 AM, Oliver Sang wrote:
> [...]
>
>>> since we applied your "/dev/zero: make private mapping full anonymous mapping"
>>> patch on top of a68d3cbfad like below:
>>>
>>> * 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
>>> * a68d3cbfade64 memstick: core: fix kernel-doc notation
>>>
>>> so I applied the below patch on top of a68d3cbfad as well.
>>>
>>> we saw a big improvement, but not that big.
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>>>     gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>>>
>>> commit:
>>>     a68d3cbfad ("memstick: core: fix kernel-doc notation")
>>>     52ec85cb99  <--- your patch
>>>
>>>
>>> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
>>> ---------------- ---------------------------
>>>            %stddev     %change         %stddev
>>>                \          |                \
>>>     14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
>>>
>>> full comparison as below [1] just FYI.
>> Thanks for the update. I stared at the profiling report for a whole day, but
>> I couldn't figure out where that 400% was lost. I just saw that the number
>> of page faults was lower, and the reduction in page faults seems to match
>> the 400% loss. So I did more tracing and profiling.
>>
>> The test case does the below in a tight loop:
>>    mmap 40K of memory from /dev/zero (read only)
>>    read the area
>>
>> So there are two major factors in the performance: mmap and page faults.
>> The alternative patch did reduce the mmap overhead to the same level as
>> the original patch.
>>
>> Further perf profiling showed the page fault cost is higher than with the
>> original patch. But the page fault profile was interesting:
>>
>> -   44.87%     0.01%  usemem [kernel.kallsyms]                   [k]
>> do_translation_fault
>>     - 44.86% do_translation_fault
>>        - 44.83% do_page_fault
>>           - 44.53% handle_mm_fault
>>                9.04% __handle_mm_fault
>>
>> Page faults consumed ~40% of CPU time in handle_mm_fault, but
>> __handle_mm_fault consumed just 9%, even though I expected it to be the
>> major consumer.
>>
>> So I annotated handle_mm_fault and found that most of the time was consumed
>> by lru_gen_enter_fault() -> vma_has_recency() (my kernel has multi-gen LRU
>> enabled):
>>
>>        │     if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>>         │     ↓ cbz     x1, b4
>>    0.00 │       ldr     w0, [x1, #12]
>>   99.59 │       eor     x0, x0, #0x800000
>>    0.00 │       ubfx    w0, w0, #23, #1
>>         │     current->in_lru_fault = vma_has_recency(vma);
>>    0.00 │ b4:   ldrh    w1, [x2, #1992]
>>    0.01 │       bfi     w1, w0, #5, #1
>>    0.00 │       strh    w1, [x2, #1992]
>>
>>
>> vma_has_recency() reads vma->vm_file->f_mode if vma->vm_file is not NULL,
>> but that load took a long time. So I inspected struct file and saw:
>>
>> struct file {
>>      file_ref_t            f_ref;
>>      spinlock_t            f_lock;
>>      fmode_t                f_mode;
>>      const struct file_operations    *f_op;
>>      ...
>> }
>>
>> f_mode is in the same cache line as f_ref (my kernel does NOT have
>> spinlock debugging enabled). The test case mmaps /dev/zero in a tight
>> loop, so the refcount is modified (fget/fput) very frequently, and this
>> resulted in false sharing.
>>
>> So I tried the below patch on top of the alternative patch:
>>
>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>> index f9157a0c42a5..ba11dc0b1c7c 100644
>> --- a/include/linux/mm_inline.h
>> +++ b/include/linux/mm_inline.h
>> @@ -608,6 +608,9 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
>>          if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
>>                  return false;
>>
>> +       if (vma_is_anonymous(vma))
>> +               return true;
>> +
>>          if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>>                  return false;
>>
>> This made the page fault profile look normal:
>>
>>                          - 1.90% do_translation_fault
>>                             - 1.87% do_page_fault
>>                                - 1.49% handle_mm_fault
>>                                   - 1.36% __handle_mm_fault
>>
>> Please try this in your test.
>>
>> But AFAICT I have never seen a performance issue reported due to false
>> sharing between the refcount and other fields in struct file. This
>> benchmark stresses it quite badly.
> I applied your above patch on top of the alternative patch last time, then saw
> more improvement (+445.2% vs a68d3cbfad), but it is still not as big as in our
> original report.

Thanks for the update. It looks like the problem is still in the page
fault path. I did my test on an arm64 machine. I also noticed struct
file has "__randomize_layout", so the layout may differ between x86 and
arm64.

The page fault handler may also access other fields of struct file that
can cause false sharing, for example reading the gfp flags through
f_mapping. This may not be a problem on my machine, but it may be more
costly on yours, depending on the actual layout of struct file on each
machine.

Can you please try the below patch on top of the current patches? Thank 
you so much for your patience.

diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..1fa9dbce0f66 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3214,6 +3214,9 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
 {
        struct file *vm_file = vma->vm_file;

+       if (vma_is_anonymous(vma))
+               return GFP_KERNEL;
+
        if (vm_file)
                return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;

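To see how much a hot read next to a hammered refcount can cost, here
is a standalone userspace sketch of the pattern (a sketch only: the
struct, the padding, and the loop count are made up for illustration
and are not the kernel's layout). One thread updates a "refcount" the
way a tight fget()/fput() loop would, while the main thread keeps
loading a neighboring read-mostly field the way vma_has_recency() loads
f_mode; uncommenting the padding moves the two fields onto separate
cache lines:

/* build: cc -O2 -pthread false_sharing.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct shared {
	_Atomic long ref;	/* stand-in for f_ref: written constantly */
	/* char pad[64]; */	/* uncomment to split the cache lines */
	unsigned int mode;	/* stand-in for f_mode: only ever read */
};

static struct shared s;
static _Atomic int stop;
static volatile unsigned int sink;

/* Writer thread: hammers the refcount like a tight fget()/fput() loop. */
static void *writer(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop))
		atomic_fetch_add(&s.ref, 1);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, writer, NULL);
	/* Reader: loads the neighboring field, like the fault path does. */
	for (long i = 0; i < 100000000L; i++)
		sink = s.mode;
	atomic_store(&stop, 1);
	pthread_join(t, NULL);
	printf("ref ended at %ld\n", atomic_load(&s.ref));
	return 0;
}

Timing the reader loop with and without the padding should show the
same kind of gap the fault path sees when it has to touch vm_file.
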
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    52ec85cb99  <--- a68d3cbfad + alternative
>    d4a204fefe  <--- a68d3cbfad + alternative + new patch in vma_has_recency()
>
> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a d4a204fefec91546a317e52ae19
> ---------------- --------------------------- ---------------------------
>           %stddev     %change         %stddev     %change         %stddev
>               \          |                \          |                \
>    14364828 ±  4%    +410.6%   73349239 ±  3%    +445.2%   78318730 ±  4%  vm-scalability.throughput
>
>
> full comparison is as below:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    52ec85cb99  <--- a68d3cbfad + alternative
>    d4a204fefe  <--- a68d3cbfad + alternative + new patch in vma_has_recency()
>
> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a d4a204fefec91546a317e52ae19
> ---------------- --------------------------- ---------------------------
>           %stddev     %change         %stddev     %change         %stddev
>               \          |                \          |                \
>   5.262e+09 ±  3%     -45.0%  2.896e+09 ±  6%     +10.0%  5.791e+09 ±126%  cpuidle..time
>     7924008 ±  3%     -79.3%    1643339 ± 11%     -77.4%    1791703 ± 12%  cpuidle..usage
>     1871164 ±  4%     -22.4%    1452554 ± 12%     -20.9%    1479724 ± 13%  numa-numastat.node3.local_node
>     1952164 ±  3%     -20.1%    1560294 ± 12%     -19.1%    1580192 ± 12%  numa-numastat.node3.numa_hit
>      399.52           -68.2%     126.86           -65.9%     136.26 ± 23%  uptime.boot
>       14507           -15.7%      12232            +5.2%      15256 ± 48%  uptime.idle
>        6.99 ±  3%    +147.9%      17.34 ±  4%    +249.9%      24.47 ± 62%  vmstat.cpu.id
>        1.71          +473.6%       9.79 ±  2%    +437.6%       9.18 ± 19%  vmstat.cpu.us
>       34204 ±  5%     -72.9%       9272 ±  7%     -73.5%       9074 ± 16%  vmstat.system.cs
>      266575           -21.2%     210191           -26.9%     194776 ± 20%  vmstat.system.in
>        3408 ±  5%     -99.8%       8.38 ± 48%     -99.6%      13.38 ± 68%  perf-c2c.DRAM.local
>       18076 ±  3%     -99.8%      32.25 ± 27%     -99.7%      54.12 ± 35%  perf-c2c.DRAM.remote
>        8082 ±  5%     -99.8%      15.50 ± 64%     -99.7%      26.38 ± 52%  perf-c2c.HITM.local
>        6544 ±  6%     -99.8%      13.62 ± 51%     -99.7%      19.25 ± 43%  perf-c2c.HITM.remote
>       14627 ±  4%     -99.8%      29.12 ± 53%     -99.7%      45.62 ± 43%  perf-c2c.HITM.total
>        6.49 ±  3%      +8.8       15.24 ±  5%     +15.9       22.44 ± 71%  mpstat.cpu.all.idle%
>        0.63            -0.3        0.32 ±  4%      -0.3        0.31 ± 22%  mpstat.cpu.all.irq%
>        0.03 ±  2%      +0.2        0.26 ±  2%      +0.2        0.25 ± 20%  mpstat.cpu.all.soft%
>       91.17           -17.0       74.15           -23.6       67.58 ± 20%  mpstat.cpu.all.sys%
>        1.68 ±  2%      +8.3       10.03 ±  2%      +7.7        9.42 ± 19%  mpstat.cpu.all.usr%
>      337.33           -97.4%       8.88 ± 75%     -98.2%       6.00 ± 88%  mpstat.max_utilization.seconds
>      352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  time.elapsed_time
>      352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  time.elapsed_time.max
>      225965 ±  7%     -16.0%     189844 ±  6%     -20.6%     179334 ±  3%  time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09           +11.9%  1.074e+09        time.minor_page_faults
>       20852            -8.8%      19012            -9.8%      18815        time.percent_of_cpu_this_job_got
>       72302           -81.4%      13425 ±  3%     -82.6%      12566 ±  4%  time.system_time
>        1260 ±  3%     +41.0%       1777           +36.2%       1716        time.user_time
>     5393707 ±  5%     -98.4%      86880 ± 17%     -98.2%      96659 ± 22%  time.voluntary_context_switches
>     1609925           -50.3%     800493           -51.0%     788816 ±  2%  meminfo.Active
>     1609925           -50.3%     800493           -51.0%     788816 ±  2%  meminfo.Active(anon)
>      160837 ± 33%     -63.9%      58119 ± 13%     -65.9%      54899 ± 31%  meminfo.AnonHugePages
>     4435665           -18.5%    3614714           -18.7%    3604829        meminfo.Cached
>     1775547           -43.8%     998415           -44.8%     980447 ±  3%  meminfo.Committed_AS
>      148539           -43.7%      83699 ±  4%     -46.1%      80050 ±  2%  meminfo.Mapped
>     4245538 ±  4%     -20.9%    3356561           -28.0%    3056817 ± 20%  meminfo.PageTables
>    14166291 ±  4%      -9.6%   12806082           -15.9%   11919101 ± 19%  meminfo.SUnreclaim
>      929777           -88.2%     109274 ±  3%     -89.4%      98935 ± 15%  meminfo.Shmem
>    14315492 ±  4%      -9.6%   12947821           -15.7%   12061412 ± 19%  meminfo.Slab
>    25676018 ±  3%     +10.9%   28487403           +16.3%   29863951 ±  8%  meminfo.max_used_kB
>       64129 ±  4%    +418.9%     332751 ±  3%    +453.6%     355040 ±  4%  vm-scalability.median
>       45.40 ±  5%   +1961.8        2007 ±  8%   +2094.7        2140 ± 11%  vm-scalability.stddev%
>    14364828 ±  4%    +410.6%   73349239 ±  3%    +445.2%   78318730 ±  4%  vm-scalability.throughput
>      352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  vm-scalability.time.elapsed_time
>      352.76           -77.3%      79.95 ±  2%     -78.5%      75.89 ±  3%  vm-scalability.time.elapsed_time.max
>      225965 ±  7%     -16.0%     189844 ±  6%     -20.6%     179334 ±  3%  vm-scalability.time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09           +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
>       20852            -8.8%      19012            -9.8%      18815        vm-scalability.time.percent_of_cpu_this_job_got
>       72302           -81.4%      13425 ±  3%     -82.6%      12566 ±  4%  vm-scalability.time.system_time
>        1260 ±  3%     +41.0%       1777           +36.2%       1716        vm-scalability.time.user_time
>     5393707 ±  5%     -98.4%      86880 ± 17%     -98.2%      96659 ± 22%  vm-scalability.time.voluntary_context_switches
>   4.316e+09 ±  4%     +11.9%  4.832e+09           +11.9%  4.832e+09        vm-scalability.workload
>      265763 ±  4%     -20.5%     211398 ±  4%     -28.7%     189557 ± 22%  numa-vmstat.node0.nr_page_table_pages
>       31364 ±106%     -85.0%       4690 ±169%     -66.5%      10503 ±106%  numa-vmstat.node0.nr_shmem
>      891094 ±  4%      -8.0%     819697 ±  3%     -17.0%     739565 ± 21%  numa-vmstat.node0.nr_slab_unreclaimable
>       12205 ± 67%     -74.1%       3161 ±199%     -30.0%       8543 ± 98%  numa-vmstat.node1.nr_mapped
>      265546 ±  4%     -21.8%     207742 ±  4%     -27.1%     193704 ± 22%  numa-vmstat.node1.nr_page_table_pages
>       44052 ± 71%     -86.0%       6163 ±161%     -92.9%       3126 ±239%  numa-vmstat.node1.nr_shmem
>      885590 ±  4%      -9.9%     797649 ±  4%     -15.0%     752585 ± 21%  numa-vmstat.node1.nr_slab_unreclaimable
>      264589 ±  4%     -21.2%     208598 ±  4%     -28.0%     190497 ± 20%  numa-vmstat.node2.nr_page_table_pages
>      881598 ±  4%     -10.0%     793829 ±  4%     -15.3%     747142 ± 19%  numa-vmstat.node2.nr_slab_unreclaimable
>      192683 ± 30%     -61.0%      75078 ± 70%     -90.4%      18510 ±122%  numa-vmstat.node3.nr_active_anon
>      286819 ±108%     -93.0%      19993 ± 39%     -88.8%      32096 ± 44%  numa-vmstat.node3.nr_file_pages
>       13124 ± 49%     -92.3%       1006 ± 57%     -96.1%     510.58 ± 55%  numa-vmstat.node3.nr_mapped
>      264499 ±  4%     -22.1%     206135 ±  2%     -30.9%     182777 ± 21%  numa-vmstat.node3.nr_page_table_pages
>      139810 ± 14%     -90.5%      13229 ± 89%     -99.4%     844.61 ± 73%  numa-vmstat.node3.nr_shmem
>      880199 ±  4%     -11.8%     776210 ±  5%     -18.3%     718982 ± 21%  numa-vmstat.node3.nr_slab_unreclaimable
>      192683 ± 30%     -61.0%      75077 ± 70%     -90.4%      18510 ±122%  numa-vmstat.node3.nr_zone_active_anon
>     1951359 ±  3%     -20.1%    1558936 ± 12%     -19.1%    1578968 ± 12%  numa-vmstat.node3.numa_hit
>     1870359 ±  4%     -22.4%    1451195 ± 12%     -21.0%    1478500 ± 13%  numa-vmstat.node3.numa_local
>      402515           -50.3%     200150           -51.0%     197173 ±  2%  proc-vmstat.nr_active_anon
>      170568            +1.9%     173746            +1.7%     173416        proc-vmstat.nr_anon_pages
>     4257257            +0.9%    4296664            +1.7%    4330365        proc-vmstat.nr_dirty_background_threshold
>     8524925            +0.9%    8603835            +1.7%    8671318        proc-vmstat.nr_dirty_threshold
>     1109246           -18.5%     903959           -18.7%     901412        proc-vmstat.nr_file_pages
>    42815276            +0.9%   43210344            +1.7%   43547728        proc-vmstat.nr_free_pages
>       37525           -43.6%      21164 ±  4%     -46.1%      20229 ±  2%  proc-vmstat.nr_mapped
>     1059932 ±  4%     -21.1%     836810           -28.3%     760302 ± 20%  proc-vmstat.nr_page_table_pages
>      232507           -88.2%      27341 ±  3%     -89.4%      24701 ± 15%  proc-vmstat.nr_shmem
>       37297            -5.0%      35436            -4.6%      35576        proc-vmstat.nr_slab_reclaimable
>     3537843 ±  4%      -9.8%    3192506           -16.1%    2966663 ± 20%  proc-vmstat.nr_slab_unreclaimable
>      402515           -50.3%     200150           -51.0%     197173 ±  2%  proc-vmstat.nr_zone_active_anon
>       61931 ±  8%     -83.8%      10023 ± 45%     -76.8%      14345 ± 33%  proc-vmstat.numa_hint_faults
>       15755 ± 21%     -87.1%       2039 ± 97%     -79.9%       3159 ± 84%  proc-vmstat.numa_hint_faults_local
>     6916516 ±  3%      -7.1%    6425430            -7.0%    6429349        proc-vmstat.numa_hit
>     6568542 ±  3%      -7.5%    6077764            -7.4%    6081764        proc-vmstat.numa_local
>      293942 ±  3%     -69.6%      89435 ± 49%     -68.7%      92135 ± 33%  proc-vmstat.numa_pte_updates
>   9.608e+08 ±  4%     +11.8%  1.074e+09           +11.8%  1.074e+09        proc-vmstat.pgfault
>       55981 ±  2%     -63.1%      20641 ±  2%     -61.6%      21497 ± 15%  proc-vmstat.pgreuse
>     1063552 ±  4%     -20.3%     847673 ±  4%     -28.4%     761616 ± 21%  numa-meminfo.node0.PageTables
>     3565610 ±  4%      -8.0%    3279375 ±  3%     -16.8%    2967130 ± 20%  numa-meminfo.node0.SUnreclaim
>      125455 ±106%     -85.2%      18620 ±168%     -66.2%      42381 ±106%  numa-meminfo.node0.Shmem
>     3592377 ±  4%      -7.1%    3336072 ±  4%     -16.2%    3011209 ± 20%  numa-meminfo.node0.Slab
>       48482 ± 67%     -74.3%      12475 ±199%     -30.6%      33629 ± 99%  numa-meminfo.node1.Mapped
>     1062709 ±  4%     -21.7%     831966 ±  4%     -26.7%     778849 ± 22%  numa-meminfo.node1.PageTables
>     3543793 ±  4%     -10.0%    3189589 ±  4%     -14.8%    3018852 ± 21%  numa-meminfo.node1.SUnreclaim
>      176171 ± 71%     -86.0%      24677 ±161%     -92.9%      12510 ±239%  numa-meminfo.node1.Shmem
>     3593431 ±  4%     -10.4%    3220352 ±  4%     -14.6%    3069779 ± 21%  numa-meminfo.node1.Slab
>     1058901 ±  4%     -21.3%     833124 ±  4%     -27.7%     766065 ± 19%  numa-meminfo.node2.PageTables
>     3527862 ±  4%     -10.2%    3168666 ±  5%     -15.0%    2999540 ± 19%  numa-meminfo.node2.SUnreclaim
>     3565750 ±  4%     -10.3%    3200248 ±  5%     -15.2%    3022861 ± 19%  numa-meminfo.node2.Slab
>      770405 ± 30%     -61.0%     300435 ± 70%     -90.4%      74044 ±122%  numa-meminfo.node3.Active
>      770405 ± 30%     -61.0%     300435 ± 70%     -90.4%      74044 ±122%  numa-meminfo.node3.Active(anon)
>      380096 ± 50%     -32.8%     255397 ± 73%     -78.2%      82996 ±115%  numa-meminfo.node3.AnonPages.max
>     1146977 ±108%     -93.0%      80110 ± 40%     -88.8%     128436 ± 44%  numa-meminfo.node3.FilePages
>       52663 ± 47%     -91.6%       4397 ± 56%     -96.0%       2104 ± 52%  numa-meminfo.node3.Mapped
>     6368902 ± 20%     -21.2%    5021246 ±  2%     -27.8%    4597733 ± 18%  numa-meminfo.node3.MemUsed
>     1058539 ±  4%     -22.2%     823061 ±  3%     -30.6%     734757 ± 20%  numa-meminfo.node3.PageTables
>     3522496 ±  4%     -12.1%    3096728 ±  6%     -18.1%    2885117 ± 21%  numa-meminfo.node3.SUnreclaim
>      558943 ± 14%     -90.5%      53054 ± 89%     -99.4%       3423 ± 71%  numa-meminfo.node3.Shmem
>     3557392 ±  4%     -12.3%    3119454 ±  6%     -18.2%    2909118 ± 20%  numa-meminfo.node3.Slab
>        0.82 ±  4%     -39.7%       0.50 ± 12%     -28.2%       0.59 ± 34%  perf-stat.i.MPKI
>   2.714e+10 ±  2%    +185.7%  7.755e+10 ±  6%    +174.8%  7.457e+10 ± 27%  perf-stat.i.branch-instructions
>        0.11 ±  3%      +0.1        0.20 ±  5%      +0.3        0.40 ±121%  perf-stat.i.branch-miss-rate%
>    24932893          +156.6%   63980942 ±  5%    +150.2%   62383567 ± 25%  perf-stat.i.branch-misses
>       64.93           -10.1       54.87 ±  2%     -13.6       51.34 ± 20%  perf-stat.i.cache-miss-rate%
>       34508 ±  4%     -61.4%      13315 ± 10%     -64.1%      12391 ± 25%  perf-stat.i.context-switches
>        7.67           -63.7%       2.79 ±  6%     -67.4%       2.50 ± 14%  perf-stat.i.cpi
>      224605           +10.8%     248972 ±  4%     +11.8%     251127 ±  4%  perf-stat.i.cpu-clock
>      696.35 ±  2%     -57.4%     296.79 ±  3%     -59.8%     279.73 ±  5%  perf-stat.i.cpu-migrations
>       10834 ±  4%     -12.5%       9483 ± 20%     -20.2%       8648 ± 28%  perf-stat.i.cycles-between-cache-misses
>   1.102e+11          +128.5%  2.518e+11 ±  6%    +119.9%  2.423e+11 ± 27%  perf-stat.i.instructions
>        0.14          +198.2%       0.42 ±  5%    +239.7%       0.48 ± 21%  perf-stat.i.ipc
>       24.25 ±  3%    +375.8%     115.36 ±  3%    +353.8%     110.03 ± 26%  perf-stat.i.metric.K/sec
>     2722043 ±  3%    +439.7%   14690226 ±  6%    +418.1%   14103930 ± 27%  perf-stat.i.minor-faults
>     2722043 ±  3%    +439.7%   14690226 ±  6%    +418.1%   14103929 ± 27%  perf-stat.i.page-faults
>      224605           +10.8%     248972 ±  4%     +11.8%     251127 ±  4%  perf-stat.i.task-clock
>        0.81 ±  3%     -52.5%       0.39 ± 14%     -59.6%       0.33 ± 38%  perf-stat.overall.MPKI
>        0.09            -0.0        0.08 ±  2%      -0.0        0.07 ± 37%  perf-stat.overall.branch-miss-rate%
>       64.81            -6.4       58.40           -13.3       51.49 ± 37%  perf-stat.overall.cache-miss-rate%
>        7.24           -56.3%       3.17 ±  3%     -63.8%       2.62 ± 38%  perf-stat.overall.cpi
>        8933 ±  4%      -6.0%       8401 ± 16%     -21.3%       7029 ± 38%  perf-stat.overall.cycles-between-cache-misses
>        0.14          +129.0%       0.32 ±  3%    +112.0%       0.29 ± 38%  perf-stat.overall.ipc
>        9012 ±  2%     -57.5%       3827           -62.8%       3349 ± 37%  perf-stat.overall.path-length
>   2.701e+10 ±  2%    +159.6%  7.012e+10 ±  2%    +117.1%  5.863e+10 ± 43%  perf-stat.ps.branch-instructions
>    24708939          +119.2%   54173035           +81.0%   44726149 ± 43%  perf-stat.ps.branch-misses
>       34266 ±  5%     -73.9%       8949 ±  7%     -77.8%       7599 ± 41%  perf-stat.ps.context-switches
>   7.941e+11            -9.1%  7.219e+11           -27.9%  5.729e+11 ± 44%  perf-stat.ps.cpu-cycles
>      693.54 ±  2%     -68.6%     217.73 ±  5%     -74.1%     179.66 ± 38%  perf-stat.ps.cpu-migrations
>   1.097e+11          +108.1%  2.282e+11 ±  2%     +73.9%  1.907e+11 ± 43%  perf-stat.ps.instructions
>     2710577 ±  3%    +388.7%   13246535 ±  2%    +308.6%   11076222 ± 44%  perf-stat.ps.minor-faults
>     2710577 ±  3%    +388.7%   13246536 ±  2%    +308.6%   11076222 ± 44%  perf-stat.ps.page-faults
>   3.886e+13 ±  2%     -52.4%  1.849e+13           -58.3%  1.619e+13 ± 37%  perf-stat.total.instructions
>    64052898 ±  5%     -96.2%    2460331 ±166%     -93.1%    4432025 ±129%  sched_debug.cfs_rq:/.avg_vruntime.avg
>    95701822 ±  7%     -85.1%   14268127 ±116%     -60.2%   38124846 ±118%  sched_debug.cfs_rq:/.avg_vruntime.max
>    43098762 ±  6%     -96.0%    1715136 ±173%     -93.3%    2867368 ±131%  sched_debug.cfs_rq:/.avg_vruntime.min
>     9223270 ±  9%     -84.2%    1457904 ±122%     -61.0%    3595639 ±113%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>        0.00 ± 22%     -80.1%       0.00 ±185%     -86.8%       0.00 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.avg
>        0.69 ±  8%     -73.0%       0.19 ±185%     -82.0%       0.12 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.max
>        0.05 ± 12%     -76.3%       0.01 ±185%     -84.2%       0.01 ±173%  sched_debug.cfs_rq:/.h_nr_delayed.stddev
>        0.78 ±  2%     -77.0%       0.18 ±130%     -71.9%       0.22 ±107%  sched_debug.cfs_rq:/.h_nr_running.avg
>    43049468 ± 22%     -89.3%    4590302 ±180%     -89.0%    4726833 ±129%  sched_debug.cfs_rq:/.left_deadline.max
>     3836405 ± 37%     -85.6%     550773 ±176%     -77.5%     864733 ±132%  sched_debug.cfs_rq:/.left_deadline.stddev
>    43049467 ± 22%     -89.3%    4590279 ±180%     -89.0%    4726820 ±129%  sched_debug.cfs_rq:/.left_vruntime.max
>     3836405 ± 37%     -85.6%     550772 ±176%     -77.5%     862614 ±132%  sched_debug.cfs_rq:/.left_vruntime.stddev
>    64052901 ±  5%     -96.2%    2460341 ±166%     -93.1%    4432036 ±129%  sched_debug.cfs_rq:/.min_vruntime.avg
>    95701822 ±  7%     -85.1%   14268127 ±116%     -60.2%   38124846 ±118%  sched_debug.cfs_rq:/.min_vruntime.max
>    43098762 ±  6%     -96.0%    1715136 ±173%     -93.3%    2867368 ±131%  sched_debug.cfs_rq:/.min_vruntime.min
>     9223270 ±  9%     -84.2%    1457902 ±122%     -61.0%    3595638 ±113%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.77 ±  2%     -77.4%       0.17 ±128%     -72.3%       0.21 ±107%  sched_debug.cfs_rq:/.nr_running.avg
>        1.61 ± 24%    +396.0%       7.96 ± 62%    +355.1%       7.31 ± 52%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
>       86.69          +424.4%     454.62 ± 24%    +400.6%     433.98 ± 26%  sched_debug.cfs_rq:/.removed.runnable_avg.max
>       11.14 ± 13%    +409.8%      56.79 ± 35%    +373.6%      52.77 ± 34%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>        1.61 ± 24%    +396.0%       7.96 ± 62%    +355.1%       7.31 ± 52%  sched_debug.cfs_rq:/.removed.util_avg.avg
>       86.69          +424.4%     454.62 ± 24%    +400.6%     433.98 ± 26%  sched_debug.cfs_rq:/.removed.util_avg.max
>       11.14 ± 13%    +409.8%      56.79 ± 35%    +373.6%      52.77 ± 34%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>    43049467 ± 22%     -89.3%    4590282 ±180%     -89.0%    4726821 ±129%  sched_debug.cfs_rq:/.right_vruntime.max
>     3836405 ± 37%     -85.6%     550772 ±176%     -77.5%     862614 ±132%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      286633 ± 43%    +262.3%    1038592 ± 36%    +188.3%     826260 ± 58%  sched_debug.cfs_rq:/.runnable_avg.avg
>    34728895 ± 30%    +349.2%   1.56e+08 ± 26%    +293.3%  1.366e+08 ± 60%  sched_debug.cfs_rq:/.runnable_avg.max
>     2845573 ± 30%    +325.9%   12119045 ± 26%    +251.3%    9995202 ± 55%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      769.03           -69.9%     231.86 ± 84%     -66.3%     259.37 ± 72%  sched_debug.cfs_rq:/.util_avg.avg
>        1621 ±  5%     -31.5%       1111 ±  8%     -35.4%       1048 ±  8%  sched_debug.cfs_rq:/.util_avg.max
>      159.12 ±  8%     +22.3%     194.66 ± 12%     +35.0%     214.82 ± 14%  sched_debug.cfs_rq:/.util_avg.stddev
>      724.17 ±  2%     -89.6%      75.66 ±147%     -88.3%      84.74 ±123%  sched_debug.cfs_rq:/.util_est.avg
>        1360 ± 15%     -39.2%     826.88 ± 37%     -29.0%     965.90 ± 48%  sched_debug.cfs_rq:/.util_est.max
>      766944 ±  3%     +18.1%     905901           +21.7%     933047 ±  2%  sched_debug.cpu.avg_idle.avg
>     1067639 ±  5%     +30.0%    1387534 ± 16%     +38.2%    1475131 ± 15%  sched_debug.cpu.avg_idle.max
>      321459 ±  2%     -35.6%     207172 ± 10%     -33.5%     213764 ± 15%  sched_debug.cpu.avg_idle.stddev
>      195573           -72.7%      53401 ± 24%     -68.5%      61507 ± 35%  sched_debug.cpu.clock.avg
>      195596           -72.7%      53442 ± 24%     -68.5%      61565 ± 35%  sched_debug.cpu.clock.max
>      195548           -72.7%      53352 ± 24%     -68.6%      61431 ± 35%  sched_debug.cpu.clock.min
>      194424           -72.6%      53229 ± 24%     -68.5%      61304 ± 35%  sched_debug.cpu.clock_task.avg
>      194608           -72.6%      53383 ± 24%     -68.4%      61478 ± 34%  sched_debug.cpu.clock_task.max
>      181834           -77.5%      40964 ± 31%     -73.0%      49012 ± 43%  sched_debug.cpu.clock_task.min
>        4241 ±  2%     -80.6%     821.65 ±142%     -77.1%     971.85 ±116%  sched_debug.cpu.curr->pid.avg
>        9799 ±  2%     -55.4%       4365 ± 17%     -51.6%       4747 ± 22%  sched_debug.cpu.curr->pid.max
>        1365 ± 10%     -48.0%     709.44 ±  5%     -39.9%     820.19 ± 24%  sched_debug.cpu.curr->pid.stddev
>      537665 ±  4%     +31.2%     705318 ± 14%     +44.0%     774261 ± 15%  sched_debug.cpu.max_idle_balance_cost.max
>        3119 ± 56%    +579.1%      21184 ± 39%   +1048.3%      35821 ± 65%  sched_debug.cpu.max_idle_balance_cost.stddev
>        0.78 ±  2%     -76.3%       0.18 ±135%     -72.0%       0.22 ±114%  sched_debug.cpu.nr_running.avg
>       25773 ±  5%     -96.1%       1007 ± 41%     -95.2%       1246 ± 53%  sched_debug.cpu.nr_switches.avg
>       48669 ± 10%     -76.5%      11448 ± 13%     -66.5%      16288 ± 70%  sched_debug.cpu.nr_switches.max
>       19006 ±  7%     -98.6%     258.81 ± 64%     -98.4%     311.75 ± 58%  sched_debug.cpu.nr_switches.min
>        4142 ±  8%     -66.3%       1396 ± 17%     -58.3%       1726 ± 51%  sched_debug.cpu.nr_switches.stddev
>        0.07 ± 23%     -92.9%       0.01 ± 41%     -94.3%       0.00 ± 46%  sched_debug.cpu.nr_uninterruptible.avg
>      240.19 ± 16%     -82.1%      42.94 ± 41%     -84.0%      38.50 ± 19%  sched_debug.cpu.nr_uninterruptible.max
>      -77.92           -88.1%      -9.25           -84.9%     -11.77        sched_debug.cpu.nr_uninterruptible.min
>       37.87 ±  5%     -85.8%       5.36 ± 13%     -85.3%       5.57 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
>      195549           -72.7%      53356 ± 24%     -68.6%      61438 ± 35%  sched_debug.cpu_clk
>      194699           -73.0%      52506 ± 25%     -68.9%      60588 ± 35%  sched_debug.ktime
>        0.00          -100.0%       0.00           -62.5%       0.00 ±264%  sched_debug.rt_rq:.rt_nr_running.avg
>        0.17          -100.0%       0.00           -62.5%       0.06 ±264%  sched_debug.rt_rq:.rt_nr_running.max
>        0.01          -100.0%       0.00           -62.5%       0.00 ±264%  sched_debug.rt_rq:.rt_nr_running.stddev
>      196368           -72.4%      54191 ± 24%     -68.3%      62327 ± 34%  sched_debug.sched_clk
>        0.17 ±142%    -100.0%       0.00           -97.8%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        0.19 ± 34%     -51.3%       0.09 ± 37%     -76.7%       0.04 ±110%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        0.14 ± 55%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        0.14 ± 73%     -82.5%       0.03 ±168%     -64.1%       0.05 ±177%  perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        0.11 ± 59%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.04 ±132%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        0.02 ± 31%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.00 ±223%  +51950.0%       0.26 ±212%   +6325.0%       0.03 ±124%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        0.25 ± 59%    -100.0%       0.00           -64.9%       0.09 ±253%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        0.12 ±145%     -99.1%       0.00 ±141%     -99.5%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.04 ± 55%     +99.5%       0.08 ±254%     -92.0%       0.00 ±103%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        0.25 ± 41%     -81.6%       0.05 ± 69%     -94.4%       0.01 ± 69%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        0.11 ± 59%     -87.1%       0.01 ±198%     -96.2%       0.00 ±128%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.40 ± 50%     -97.8%       0.01 ± 30%     -97.2%       0.01 ± 45%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        2.25 ±138%     -99.6%       0.01 ±  7%     -63.9%       0.81 ±261%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>        0.32 ±104%     -97.3%       0.01 ± 38%     -97.7%       0.01 ± 61%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.12 ± 21%     -61.6%       0.04 ±233%     -85.7%       0.02 ±190%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.01 ± 12%     -34.9%       0.01 ± 18%    +722.2%       0.07 ±251%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.01 ± 42%     -41.4%       0.00 ± 72%     -76.6%       0.00 ± 77%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>        0.01 ± 20%    -100.0%       0.00           -96.4%       0.00 ±264%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        0.19 ±185%     -95.6%       0.01 ± 44%    +266.3%       0.70 ±261%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.07 ± 20%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.26 ± 17%     -98.8%       0.00 ± 10%     -98.9%       0.00 ± 39%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.03 ± 51%     -69.7%       0.01 ± 67%     -83.7%       0.01 ± 15%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.01 ± 55%    +721.9%       0.10 ± 29%   +1608.3%       0.20 ±227%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.01 ±128%     -83.6%       0.00 ± 20%     -86.2%       0.00 ± 43%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        0.06 ± 31%   +1921.5%       1.23 ±165%  +13539.3%       8.30 ±201%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.00 ±151%    -100.0%       0.00           -99.6%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       25.45 ± 94%     -98.6%       0.36 ± 61%     -99.4%       0.15 ±143%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        4.56 ± 67%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        3.55 ± 97%     -98.9%       0.04 ±189%     -98.5%       0.05 ±177%  perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        2.13 ± 67%     -77.2%       0.49 ± 56%     -88.8%       0.24 ±147%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        3.16 ± 78%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.30 ±159%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.61 ±100%     -76.7%       0.38 ± 72%     -91.7%       0.13 ±145%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.03 ± 86%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.00 ±223%  +3.2e+06%      15.79 ±259%  +44450.0%       0.22 ±132%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        3.09 ± 45%    -100.0%       0.00           -94.6%       0.17 ±259%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        3.51 ± 21%     -86.1%       0.49 ± 72%     -90.7%       0.33 ±127%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        0.83 ±160%     -99.7%       0.00 ±141%     -99.9%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.09 ± 31%    +179.7%       0.25 ±258%     -91.5%       0.01 ±132%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        3.59 ± 11%     -92.0%       0.29 ±165%     -99.2%       0.03 ±118%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1.60 ± 69%     -95.7%       0.07 ±243%     -99.0%       0.02 ±210%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.81 ± 43%     -98.5%       0.01 ± 43%     -98.3%       0.01 ± 41%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1.02 ± 88%     -98.1%       0.02 ± 47%     -98.7%       0.01 ± 71%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        9.68 ± 32%     -92.2%       0.76 ± 72%     -78.1%       2.12 ±187%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.01 ± 49%     -51.9%       0.00 ± 72%     -80.8%       0.00 ± 77%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>       12.26 ±109%     -92.9%       0.87 ±101%     -86.9%       1.61 ±225%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        5.60 ±139%     -97.6%       0.13 ±132%     -99.3%       0.04 ±255%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.03 ±106%    -100.0%       0.00           -99.1%       0.00 ±264%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        2.11 ± 61%     -85.5%       0.31 ± 85%     -96.0%       0.08 ±124%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>       37.84 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        4.68 ± 36%     -99.8%       0.01 ± 65%     -99.8%       0.01 ± 77%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        7.56 ± 74%     -51.5%       3.67 ±147%     -99.8%       0.02 ± 54%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.36 ±186%     -96.3%       0.01 ± 90%     -97.9%       0.01 ± 59%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       97903 ±  4%     -38.3%      60433 ± 29%     -71.4%      27976 ±109%  perf-sched.total_wait_and_delay.count.ms
>        3.97 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>      302.41 ±  5%     -27.4%     219.54 ± 14%     -10.8%     269.81 ± 60%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.48 ±  6%     -90.9%       0.14 ± 79%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>      327.16 ±  9%     -46.6%     174.81 ± 24%     -38.4%     201.64 ± 71%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.37 ±  2%     -75.3%      91.05 ± 35%     -77.7%      82.29 ±119%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.96 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      187.66          +120.6%     413.97 ± 14%    +116.9%     407.06 ± 43%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1831 ±  9%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        6.17 ± 45%     -79.7%       1.25 ±142%     -91.9%       0.50 ±264%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       14.33 ±  5%     +13.4%      16.25 ± 23%     -58.1%       6.00 ± 66%  perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>      810.00 ± 10%     -38.0%     502.25 ± 92%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       40.50 ±  8%    +245.7%     140.00 ± 23%     +72.5%      69.88 ± 91%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>       13.17 ±  2%    +624.4%      95.38 ± 19%    +347.2%      58.88 ± 78%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       73021 ±  3%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       11323 ±  3%     -75.9%       2725 ± 28%     -86.4%       1536 ± 34%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1887 ± 45%     -96.1%      73.88 ± 78%     -98.5%      28.75 ±120%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1238           -34.5%     811.25 ± 13%     -58.6%     512.62 ± 49%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       35.19 ± 57%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>       20.79 ± 19%     -95.9%       0.84 ± 93%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        1240 ± 20%     -14.4%       1062 ± 10%     -25.2%     928.21 ± 40%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.34           +31.2%     656.38 ± 39%     -15.0%     425.46 ± 61%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       58.83 ± 39%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1237 ± 34%    +151.7%       3114 ± 25%     +51.6%       1876 ± 64%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       49.27 ±119%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>       58.17 ±187%    -100.0%       0.00          -100.0%       0.00 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        3.78 ±  5%     -97.6%       0.09 ± 37%     -98.8%       0.04 ±111%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        2.99 ±  4%     +15.4%       3.45 ± 10%     +28.8%       3.85 ± 54%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        3.92 ±  5%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        4.71 ±  8%     -99.5%       0.02 ±170%     -98.9%       0.05 ±177%  perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        1.67 ± 20%     -92.7%       0.12 ± 30%     -96.8%       0.05 ±130%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        2.10 ± 27%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 44%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.67 ± 21%     -94.3%       0.10 ± 35%     -97.0%       0.05 ±137%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.04 ±133%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       67.14 ± 73%     +75.6%     117.89 ±108%     -92.8%       4.82 ±259%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        1.65 ± 67%     -95.8%       0.07 ±128%     -99.2%       0.01 ±175%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        2.30 ± 14%     -95.5%       0.10 ± 42%     -96.4%       0.08 ±108%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        2.00 ± 74%   +2917.4%      60.44 ± 33%   +1369.3%      29.43 ± 74%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       29.19 ±  5%     -38.5%      17.96 ± 28%     -49.0%      14.89 ± 54%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        0.37 ± 30%   +5524.5%      20.95 ± 30%   +2028.0%       7.93 ±117%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      302.40 ±  5%     -27.4%     219.53 ± 14%     -10.8%     269.75 ± 60%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.40 ±  6%     -92.7%       0.10 ± 18%     -95.4%       0.06 ±109%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.72 ±220%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      326.84 ±  9%     -46.6%     174.54 ± 24%     -38.6%     200.64 ± 72%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.18 ±  2%     -75.3%      91.04 ± 35%     -74.2%      95.16 ± 98%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.89 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      187.58          +120.6%     413.77 ± 14%    +116.9%     406.79 ± 43%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        2.36 ± 29%   +1759.6%      43.80 ± 33%   +3763.5%      90.99 ±115%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.01 ±156%     -97.9%       0.00 ±264%     -98.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>      750.01           -14.5%     641.50 ± 14%     -41.1%     442.13 ± 58%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      340.69 ±135%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>      535.09 ±128%    -100.0%       0.00          -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       22.04 ± 32%     -98.4%       0.36 ± 61%     -99.3%       0.15 ±143%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>       13.57 ± 17%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>       13.54 ± 10%     -99.7%       0.04 ±189%     -99.6%       0.05 ±177%  perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>       10.17 ± 19%     -95.2%       0.49 ± 56%     -97.7%       0.24 ±147%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>       11.35 ± 25%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 32%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>       10.62 ±  9%     -96.5%       0.38 ± 72%     -98.7%       0.13 ±145%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.20 ±199%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        1559 ± 64%     -92.3%     120.30 ±109%     -99.4%       9.63 ±259%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        6.93 ± 53%     -98.1%       0.13 ± 99%     -99.8%       0.01 ±175%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>       14.42 ± 22%     -96.6%       0.49 ± 72%     -97.7%       0.33 ±127%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        4.00 ± 74%  +19182.5%     772.23 ± 40%   +7266.0%     295.00 ± 92%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       10.75 ± 98%   +6512.2%     710.88 ± 56%   +2526.4%     282.37 ±130%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       15.80 ±  8%     -95.2%       0.76 ± 72%     -86.6%       2.12 ±187%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       11.64 ± 61%     -98.9%       0.13 ±132%     -99.7%       0.04 ±255%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        2.94 ±213%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      175.70 ±210%     -64.6%      62.26 ±263%     -99.8%       0.31 ±116%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        1240 ± 20%     -14.3%       1062 ± 10%     -25.2%     928.20 ± 40%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.11           +31.2%     656.37 ± 39%      -2.4%     487.96 ± 41%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       32.65 ± 33%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1237 ± 34%    +151.6%       3113 ± 25%     +49.0%       1844 ± 63%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        0.36 ±190%     -97.2%       0.01 ±127%     -98.5%       0.01 ± 88%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       95.59           -95.6        0.00           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
>       95.54           -95.5        0.00           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       95.54           -95.5        0.00           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
>       94.54           -94.5        0.00           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.46           -94.0        0.41 ±138%     -93.9        0.57 ±103%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.14           -93.7        0.40 ±136%     -93.6        0.50 ± 79%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       93.79           -93.5        0.31 ±134%     -93.2        0.58 ±111%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       93.40           -93.4        0.00           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>       93.33           -93.3        0.00           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
>       93.44           -93.3        0.14 ±264%     -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
>       94.45           -93.0        1.42 ± 60%     -92.9        1.51 ± 51%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       94.25           -92.9        1.33 ± 61%     -92.8        1.43 ± 57%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       92.89           -92.9        0.00           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.00            +0.3        0.29 ±129%      +1.1        1.10 ± 27%  perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
>        0.00            +0.3        0.32 ±129%      +1.7        1.70 ± 39%  perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>        0.00            +0.3        0.32 ±129%      +1.7        1.74 ± 40%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
>        0.00            +0.5        0.49 ± 78%      +1.7        1.74 ± 40%  perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        0.00            +1.1        1.09 ± 33%      +0.4        0.44 ±177%  perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
>        0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.calltrace.cycles-pp.filp_close.put_files_struct.do_exit.do_group_exit.get_signal
>        0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.calltrace.cycles-pp.put_files_struct.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
>        0.00            +1.4        1.37 ± 49%      +1.8        1.77 ± 50%  perf-profile.calltrace.cycles-pp.setlocale
>        0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
>        0.00            +1.5        1.55 ± 63%      +1.6        1.62 ± 37%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>        0.00            +1.6        1.60 ± 57%      +1.6        1.63 ± 87%  perf-profile.calltrace.cycles-pp.swevent_hlist_put_cpu.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
>        0.00            +1.6        1.64 ± 47%      +0.9        0.90 ±101%  perf-profile.calltrace.cycles-pp.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
>        0.00            +1.6        1.64 ± 47%      +1.0        1.02 ± 83%  perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
>        0.00            +1.6        1.65 ± 43%      +1.1        1.15 ± 76%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.8        1.76 ± 44%      +1.1        1.15 ± 76%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.9        1.93 ± 26%      +1.1        1.11 ±127%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>        0.00            +2.0        2.04 ± 66%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>        0.00            +2.1        2.12 ± 58%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>        0.00            +2.1        2.12 ± 58%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>        0.00            +2.1        2.12 ± 58%      +3.7        3.71 ± 40%  perf-profile.calltrace.cycles-pp.open64
>        0.00            +2.2        2.16 ± 44%      +1.6        1.62 ± 37%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>        0.00            +2.2        2.20 ± 74%      +3.6        3.65 ± 42%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>        0.00            +2.2        2.23 ± 33%      +1.4        1.40 ± 99%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
>        0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
>        0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
>        0.00            +2.4        2.37 ± 36%      +1.9        1.93 ± 35%  perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
>        0.00            +2.5        2.48 ± 32%      +2.4        2.45 ± 60%  perf-profile.calltrace.cycles-pp.get_cpu_sleep_time_us.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read
>        0.00            +2.5        2.50 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.5        2.54 ± 47%      +1.3        1.28 ± 61%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
>        0.00            +2.5        2.54 ± 47%      +1.3        1.28 ± 61%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.7        2.67 ± 54%      +2.6        2.59 ± 40%  perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
>        0.00            +2.7        2.68 ± 35%      +3.0        3.02 ± 45%  perf-profile.calltrace.cycles-pp.get_idle_time.uptime_proc_show.seq_read_iter.vfs_read.ksys_read
>        0.00            +2.8        2.77 ± 33%      +4.2        4.17 ± 35%  perf-profile.calltrace.cycles-pp.uptime_proc_show.seq_read_iter.vfs_read.ksys_read.do_syscall_64
>        0.00            +2.8        2.82 ± 32%      +1.8        1.83 ± 85%  perf-profile.calltrace.cycles-pp._Fork
>        0.00            +2.8        2.83 ± 48%      +2.6        2.59 ± 40%  perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
>        0.00            +2.8        2.83 ± 48%      +2.7        2.68 ± 42%  perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
>        0.00            +2.8        2.84 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +2.8        2.84 ± 45%      +1.2        1.21 ± 73%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
>        0.00            +2.9        2.89 ± 39%      +3.1        3.14 ± 39%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
>        0.00            +2.9        2.89 ± 39%      +3.1        3.14 ± 39%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
>        0.00            +3.1        3.10 ± 64%      +0.9        0.91 ±264%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.1        3.10 ± 64%      +0.9        0.91 ±264%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
>        0.00            +3.1        3.13 ± 33%      +1.7        1.68 ± 77%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
>        0.00            +3.2        3.18 ± 37%      +4.3        4.31 ± 34%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.20 ± 28%      +3.0        3.02 ± 73%  perf-profile.calltrace.cycles-pp.mutex_unlock.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release
>        0.00            +3.2        3.24 ± 39%      +2.8        2.85 ± 49%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.25 ± 47%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.29 ± 47%  perf-profile.calltrace.cycles-pp.execve
>        0.00            +4.0        4.04 ± 43%      +5.2        5.21 ± 49%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +4.0        4.04 ± 43%      +5.2        5.21 ± 49%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
>        0.00            +4.1        4.10 ± 30%      +2.6        2.56 ± 28%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
>        0.00            +4.2        4.18 ± 31%      +2.8        2.82 ± 21%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
>        0.00            +4.2        4.18 ± 31%      +2.8        2.82 ± 21%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>        0.00            +4.2        4.20 ± 28%      +2.7        2.68 ± 34%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
>        0.00            +4.2        4.25 ± 65%      +8.0        7.98 ± 43%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
>        0.00            +4.3        4.27 ± 26%      +3.2        3.23 ± 34%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +4.3        4.30 ± 22%      +3.9        3.95 ± 32%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
>        0.00            +4.3        4.30 ± 22%      +3.9        3.95 ± 32%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
>        0.00            +4.5        4.46 ± 59%      +8.1        8.07 ± 42%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +4.6        4.57 ± 58%      +8.1        8.07 ± 42%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
>        0.00            +4.7        4.68 ± 55%      +8.1        8.12 ± 43%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
>        0.00            +4.7        4.68 ± 55%      +8.1        8.12 ± 43%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
>        0.00            +4.7        4.68 ± 55%      +8.2        8.16 ± 44%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
>        0.00            +4.7        4.68 ± 55%      +8.4        8.39 ± 39%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>        0.00            +4.7        4.68 ± 55%      +8.6        8.61 ± 38%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>        0.00            +4.9        4.90 ± 57%     +10.3       10.28 ± 65%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>        0.00            +4.9        4.92 ± 26%      +4.6        4.56 ± 47%  perf-profile.calltrace.cycles-pp.sw_perf_event_destroy._free_event.perf_event_release_kernel.perf_release.__fput
>        0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt
>        0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.perf_rotate_context.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
>        0.00            +5.1        5.08 ±102%      +2.6        2.64 ±101%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>        0.00            +5.1        5.14 ± 28%      +6.0        6.01 ± 41%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
>        0.00            +5.1        5.14 ± 28%      +6.2        6.16 ± 39%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
>        0.00            +5.4        5.43 ± 25%      +5.0        4.97 ± 45%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
>        0.00            +5.8        5.82 ± 94%      +4.2        4.21 ± 49%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
>        0.00            +5.8        5.82 ± 94%      +4.3        4.35 ± 53%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry
>        0.00            +6.1        6.07 ± 90%      +4.3        4.32 ± 58%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>        0.00            +6.6        6.62 ± 24%      +7.0        6.99 ± 41%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +6.6        6.62 ± 24%      +7.0        6.99 ± 41%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +6.8        6.76 ± 18%      +5.2        5.23 ± 25%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        0.00            +7.6        7.56 ± 76%      +6.0        5.99 ± 38%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
>        0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.0        8.05 ± 68%      +6.3        6.27 ± 37%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>        0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
>        0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.calltrace.cycles-pp.read
>        0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
>        0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.main
>        0.00            +9.1        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
>        0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
>        0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
>        0.00           +10.1       10.14 ± 28%     +10.0       10.04 ± 34%  perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
>        0.00           +10.2       10.23 ± 27%     +10.7       10.65 ± 35%  perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
>        0.00           +11.0       10.98 ± 55%     +13.0       13.00 ± 27%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>        0.00           +20.6       20.64 ± 30%     +19.5       19.49 ± 43%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00           +20.6       20.64 ± 30%     +19.5       19.49 ± 43%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        1.21 ±  3%     +36.6       37.80 ± 12%     +34.1       35.32 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>        1.21 ±  3%     +36.6       37.80 ± 12%     +34.4       35.62 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.8       38.00 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>        1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>        1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>        1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
>        2.19 ±  3%     +53.9       56.10 ± 19%     +48.4       50.63 ± 13%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>       95.60           -95.2        0.41 ±138%     -94.9        0.72 ± 95%  perf-profile.children.cycles-pp.__mmap
>       94.14           -93.7        0.49 ±130%     -92.9        1.21 ± 33%  perf-profile.children.cycles-pp.__mmap_new_vma
>       93.79           -93.5        0.31 ±134%     -93.1        0.71 ± 78%  perf-profile.children.cycles-pp.vma_link_file
>       93.40           -93.4        0.00           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>       93.33           -93.3        0.00           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>       94.55           -93.1        1.42 ± 60%     -93.0        1.55 ± 50%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       92.91           -92.9        0.00           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>       93.44           -92.7        0.75 ±109%     -93.4        0.06 ±264%  perf-profile.children.cycles-pp.down_write
>       94.46           -92.6        1.84 ± 34%     -92.0        2.48 ± 28%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       94.45           -92.6        1.84 ± 34%     -92.0        2.48 ± 28%  perf-profile.children.cycles-pp.do_mmap
>       94.25           -92.6        1.66 ± 37%     -91.9        2.40 ± 30%  perf-profile.children.cycles-pp.__mmap_region
>       95.58           -44.8       50.78 ± 11%     -42.8       52.76 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       95.58           -44.8       50.78 ± 11%     -42.8       52.76 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
>        0.00            +0.1        0.09 ±264%      +1.0        0.96 ± 46%  perf-profile.children.cycles-pp.kcpustat_cpu_fetch
>        0.25 ±  3%      +0.2        0.45 ±133%      +0.7        0.92 ± 41%  perf-profile.children.cycles-pp.vma_interval_tree_insert
>        0.00            +0.3        0.29 ±129%      +1.2        1.16 ± 26%  perf-profile.children.cycles-pp.do_open
>        0.00            +0.3        0.32 ±129%      +1.8        1.79 ± 43%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
>        0.00            +0.3        0.32 ±129%      +1.8        1.83 ± 44%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
>        0.00            +0.5        0.49 ± 78%      +1.8        1.83 ± 44%  perf-profile.children.cycles-pp.shmem_write_begin
>        0.00            +1.1        1.09 ± 33%      +0.5        0.48 ±160%  perf-profile.children.cycles-pp.dup_mmap
>        0.00            +1.1        1.11 ±106%      +1.6        1.60 ± 54%  perf-profile.children.cycles-pp.__open64_nocancel
>        0.00            +1.1        1.15 ±102%      +1.2        1.16 ± 86%  perf-profile.children.cycles-pp.evlist_cpu_iterator__next
>        0.00            +1.3        1.32 ± 54%      +1.4        1.36 ± 33%  perf-profile.children.cycles-pp.filp_close
>        0.00            +1.3        1.32 ± 54%      +1.5        1.47 ± 29%  perf-profile.children.cycles-pp.put_files_struct
>        0.00            +1.4        1.37 ± 49%      +1.8        1.77 ± 50%  perf-profile.children.cycles-pp.setlocale
>        0.00            +1.4        1.39 ± 70%      +1.8        1.80 ± 48%  perf-profile.children.cycles-pp.seq_read
>        0.00            +1.5        1.55 ± 63%      +1.7        1.75 ± 30%  perf-profile.children.cycles-pp.do_read_fault
>        0.00            +1.7        1.66 ± 76%      +0.9        0.91 ± 44%  perf-profile.children.cycles-pp.event_function
>        0.00            +1.7        1.66 ± 76%      +0.9        0.91 ± 44%  perf-profile.children.cycles-pp.remote_function
>        0.00            +1.7        1.70 ± 71%      +1.5        1.53 ± 73%  perf-profile.children.cycles-pp.lookup_fast
>        0.00            +1.7        1.73 ± 53%      +1.4        1.40 ± 77%  perf-profile.children.cycles-pp.swevent_hlist_put_cpu
>        0.04 ± 44%      +1.8        1.83 ± 96%      +2.4        2.47 ± 44%  perf-profile.children.cycles-pp.__schedule
>        0.00            +1.9        1.93 ± 26%      +1.1        1.15 ±120%  perf-profile.children.cycles-pp.dup_mm
>        0.03 ± 70%      +2.0        1.99 ± 36%      +1.2        1.23 ± 81%  perf-profile.children.cycles-pp.handle_softirqs
>        0.00            +2.0        1.99 ± 36%      +1.1        1.13 ± 67%  perf-profile.children.cycles-pp.__irq_exit_rcu
>        0.00            +2.0        2.02 ± 38%      +1.3        1.33 ± 57%  perf-profile.children.cycles-pp.folios_put_refs
>        0.00            +2.1        2.06 ± 52%      +1.4        1.38 ± 77%  perf-profile.children.cycles-pp._raw_spin_lock
>        0.00            +2.1        2.12 ± 58%      +3.7        3.71 ± 40%  perf-profile.children.cycles-pp.open64
>        0.00            +2.2        2.16 ± 44%      +1.7        1.75 ± 30%  perf-profile.children.cycles-pp.do_pte_missing
>        0.00            +2.2        2.21 ± 68%      +2.2        2.18 ± 58%  perf-profile.children.cycles-pp.link_path_walk
>        0.00            +2.2        2.23 ± 33%      +1.4        1.40 ± 99%  perf-profile.children.cycles-pp.copy_process
>        0.00            +2.3        2.30 ± 40%      +1.8        1.78 ± 48%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
>        0.00            +2.3        2.30 ± 40%      +1.8        1.78 ± 48%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
>        0.00            +2.3        2.34 ±103%      +5.1        5.09 ± 64%  perf-profile.children.cycles-pp.perf_c2c__record
>        0.00            +2.3        2.34 ± 46%      +1.5        1.52 ± 99%  perf-profile.children.cycles-pp.walk_component
>        0.00            +2.4        2.37 ± 36%      +2.0        2.04 ± 32%  perf-profile.children.cycles-pp.zap_present_ptes
>        0.00            +2.5        2.48 ± 32%      +2.5        2.51 ± 55%  perf-profile.children.cycles-pp.get_cpu_sleep_time_us
>        0.00            +2.5        2.50 ± 73%      +1.6        1.56 ± 76%  perf-profile.children.cycles-pp.__evlist__enable
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.children.cycles-pp.__do_sys_clone
>        0.00            +2.6        2.62 ± 35%      +1.6        1.57 ± 91%  perf-profile.children.cycles-pp.kernel_clone
>        0.00            +2.7        2.67 ± 54%      +2.6        2.59 ± 40%  perf-profile.children.cycles-pp.load_elf_binary
>        0.00            +2.7        2.68 ± 35%      +3.0        3.02 ± 45%  perf-profile.children.cycles-pp.get_idle_time
>        0.00            +2.8        2.77 ± 33%      +4.2        4.17 ± 35%  perf-profile.children.cycles-pp.uptime_proc_show
>        0.00            +2.8        2.83 ± 48%      +2.6        2.59 ± 40%  perf-profile.children.cycles-pp.search_binary_handler
>        0.00            +2.8        2.83 ± 48%      +2.7        2.68 ± 42%  perf-profile.children.cycles-pp.exec_binprm
>        0.00            +2.9        2.91 ± 32%      +1.8        1.83 ± 85%  perf-profile.children.cycles-pp._Fork
>        0.00            +3.1        3.10 ± 64%      +0.9        0.95 ±252%  perf-profile.children.cycles-pp.proc_reg_read_iter
>        0.00            +3.2        3.24 ± 39%      +2.8        2.85 ± 49%  perf-profile.children.cycles-pp.bprm_execve
>        0.00            +3.2        3.24 ± 36%      +2.0        2.00 ± 56%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>        0.00            +3.2        3.24 ± 36%      +2.1        2.09 ± 53%  perf-profile.children.cycles-pp.x64_sys_call
>        0.00            +3.8        3.85 ± 39%      +3.3        3.29 ± 47%  perf-profile.children.cycles-pp.execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.34 ± 49%  perf-profile.children.cycles-pp.__x64_sys_execve
>        0.00            +3.8        3.85 ± 39%      +3.3        3.34 ± 49%  perf-profile.children.cycles-pp.do_execveat_common
>        0.00            +4.0        3.99 ± 38%      +4.1        4.06 ± 54%  perf-profile.children.cycles-pp.mutex_unlock
>        0.00            +4.2        4.19 ± 31%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.zap_pte_range
>        0.00            +4.2        4.25 ± 65%      +8.0        7.98 ± 43%  perf-profile.children.cycles-pp.generic_perform_write
>        0.00            +4.3        4.29 ± 29%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.unmap_page_range
>        0.00            +4.3        4.29 ± 29%      +3.0        3.02 ± 20%  perf-profile.children.cycles-pp.zap_pmd_range
>        0.00            +4.3        4.31 ± 51%      +5.3        5.31 ± 46%  perf-profile.children.cycles-pp.do_filp_open
>        0.00            +4.3        4.31 ± 51%      +5.3        5.31 ± 46%  perf-profile.children.cycles-pp.path_openat
>        0.19 ± 23%      +4.4        4.60 ± 26%      +3.4        3.54 ± 27%  perf-profile.children.cycles-pp.__handle_mm_fault
>        0.00            +4.5        4.46 ± 59%      +8.1        8.07 ± 42%  perf-profile.children.cycles-pp.shmem_file_write_iter
>        0.00            +4.5        4.55 ± 24%      +4.0        3.97 ± 39%  perf-profile.children.cycles-pp.smp_call_function_single
>        0.00            +4.5        4.55 ± 24%      +4.1        4.06 ± 38%  perf-profile.children.cycles-pp.event_function_call
>        0.00            +4.6        4.58 ± 30%      +3.2        3.19 ± 24%  perf-profile.children.cycles-pp.unmap_vmas
>        0.51 ±  6%      +4.6        5.14 ± 24%      +3.6        4.06 ± 30%  perf-profile.children.cycles-pp.handle_mm_fault
>        0.00            +4.7        4.68 ± 55%      +8.4        8.41 ± 39%  perf-profile.children.cycles-pp.writen
>        0.00            +4.7        4.68 ± 55%      +8.5        8.49 ± 39%  perf-profile.children.cycles-pp.record__pushfn
>        0.00            +4.8        4.80 ± 48%      +6.1        6.15 ± 34%  perf-profile.children.cycles-pp.do_sys_openat2
>        0.77 ±  3%      +4.8        5.59 ± 21%      +4.3        5.07 ± 29%  perf-profile.children.cycles-pp.exc_page_fault
>        0.76 ±  3%      +4.8        5.59 ± 21%      +4.3        5.07 ± 29%  perf-profile.children.cycles-pp.do_user_addr_fault
>        0.00            +4.9        4.90 ± 57%     +10.3       10.28 ± 65%  perf-profile.children.cycles-pp.vfs_write
>        0.00            +4.9        4.90 ± 57%     +10.4       10.41 ± 63%  perf-profile.children.cycles-pp.ksys_write
>        0.00            +4.9        4.90 ± 48%      +6.1        6.15 ± 34%  perf-profile.children.cycles-pp.__x64_sys_openat
>        0.00            +4.9        4.92 ± 26%      +4.7        4.66 ± 47%  perf-profile.children.cycles-pp.sw_perf_event_destroy
>        0.00            +5.0        4.99 ±100%      +2.6        2.64 ±101%  perf-profile.children.cycles-pp.perf_rotate_context
>        0.00            +5.0        5.01 ± 54%     +10.9       10.87 ± 59%  perf-profile.children.cycles-pp.write
>        0.00            +5.1        5.09 ±102%      +2.7        2.74 ± 94%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>        0.00            +5.4        5.43 ± 25%      +5.0        4.97 ± 45%  perf-profile.children.cycles-pp._free_event
>        1.18            +5.6        6.78 ± 20%      +5.5        6.71 ± 24%  perf-profile.children.cycles-pp.asm_exc_page_fault
>        0.46            +5.6        6.07 ± 90%      +4.1        4.54 ± 53%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>        0.00            +5.7        5.75 ± 39%     +10.2       10.22 ± 24%  perf-profile.children.cycles-pp.perf_mmap__push
>        0.00            +5.7        5.75 ± 39%     +10.4       10.38 ± 23%  perf-profile.children.cycles-pp.record__mmap_read_evlist
>        0.53            +5.8        6.28 ± 89%      +4.4        4.91 ± 50%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>        0.53            +5.8        6.28 ± 89%      +4.4        4.91 ± 50%  perf-profile.children.cycles-pp.hrtimer_interrupt
>        0.00            +6.6        6.65 ± 77%      +3.3        3.32 ± 91%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
>        0.00            +6.8        6.85 ± 20%      +5.2        5.23 ± 25%  perf-profile.children.cycles-pp.exit_mm
>        0.58 ±  2%      +7.6        8.14 ± 75%      +6.0        6.55 ± 38%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>        0.00            +7.7        7.67 ± 23%      +6.1        6.14 ± 15%  perf-profile.children.cycles-pp.exit_mmap
>        0.00            +7.7        7.67 ± 30%      +7.0        7.05 ± 50%  perf-profile.children.cycles-pp.seq_read_iter
>        0.00            +7.7        7.72 ± 80%      +8.2        8.15 ± 51%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>        0.00            +7.8        7.75 ± 23%      +6.1        6.14 ± 15%  perf-profile.children.cycles-pp.__mmput
>        0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.ksys_read
>        0.00            +8.0        8.03 ± 27%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.vfs_read
>        0.00            +8.1        8.13 ± 28%      +7.4        7.37 ± 52%  perf-profile.children.cycles-pp.read
>        0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.__cmd_record
>        0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.cmd_record
>        0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.handle_internal_command
>        0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.main
>        0.02 ±141%      +9.0        9.05 ± 35%     +13.9       13.88 ± 19%  perf-profile.children.cycles-pp.run_builtin
>        0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.children.cycles-pp.perf_event_release_kernel
>        0.00            +9.3        9.26 ± 30%      +9.0        8.96 ± 31%  perf-profile.children.cycles-pp.perf_release
>        1.02 ±  4%      +9.3       10.33 ± 27%      +9.8       10.80 ± 35%  perf-profile.children.cycles-pp.task_work_run
>        0.00           +11.0       11.05 ± 28%     +10.4       10.37 ± 32%  perf-profile.children.cycles-pp.__fput
>        0.00           +15.8       15.85 ± 25%     +16.1       16.11 ± 29%  perf-profile.children.cycles-pp.get_signal
>        0.00           +15.8       15.85 ± 25%     +16.2       16.17 ± 29%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
>        0.00           +19.1       19.09 ± 19%     +18.1       18.06 ± 29%  perf-profile.children.cycles-pp.do_exit
>        0.00           +19.1       19.09 ± 19%     +18.1       18.06 ± 29%  perf-profile.children.cycles-pp.do_group_exit
>        1.70 ±  2%     +30.7       32.41 ± 21%     +27.2       28.87 ± 12%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>        1.22 ±  3%     +36.9       38.10 ± 13%     +34.8       36.05 ± 11%  perf-profile.children.cycles-pp.start_secondary
>        1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_idle_do_entry
>        1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_idle_enter
>        1.21 ±  3%     +37.2       38.43 ± 11%     +34.2       35.40 ±  8%  perf-profile.children.cycles-pp.acpi_safe_halt
>        1.22 ±  3%     +37.3       38.54 ± 12%     +35.0       36.18 ± 10%  perf-profile.children.cycles-pp.cpuidle_idle_call
>        1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.children.cycles-pp.cpuidle_enter
>        1.21 ±  3%     +37.3       38.54 ± 12%     +34.7       35.87 ± 10%  perf-profile.children.cycles-pp.cpuidle_enter_state
>        1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.common_startup_64
>        1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
>        1.22 ±  3%     +37.6       38.84 ± 12%     +35.4       36.60 ± 11%  perf-profile.children.cycles-pp.do_idle
>       92.37           -92.4        0.00           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
>        0.00            +0.1        0.09 ±264%      +0.8        0.84 ± 51%  perf-profile.self.cycles-pp.kcpustat_cpu_fetch
>        0.00            +2.1        2.06 ± 52%      +1.4        1.38 ± 77%  perf-profile.self.cycles-pp._raw_spin_lock
>        0.00            +2.6        2.61 ± 36%      +2.8        2.75 ± 48%  perf-profile.self.cycles-pp.smp_call_function_single
>        0.00            +3.7        3.68 ± 37%      +3.7        3.70 ± 64%  perf-profile.self.cycles-pp.mutex_unlock
>        0.00            +6.6        6.65 ± 77%      +3.3        3.32 ± 91%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
>        1.19 ±  3%     +29.2       30.38 ± 15%     +27.9       29.13 ± 13%  perf-profile.self.cycles-pp.acpi_safe_halt
>
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-02-14 22:53           ` Yang Shi
@ 2025-02-18  6:30             ` Oliver Sang
  2025-02-19  1:12               ` Yang Shi
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Sang @ 2025-02-18  6:30 UTC (permalink / raw)
  To: Yang Shi
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm, oliver.sang

hi, Yang Shi,

On Fri, Feb 14, 2025 at 02:53:37PM -0800, Yang Shi wrote:
> 
> On 2/12/25 6:04 PM, Oliver Sang wrote:
> > hi, Yang Shi,
> > 
> > On Fri, Feb 07, 2025 at 10:10:37AM -0800, Yang Shi wrote:
> > > On 2/6/25 12:02 AM, Oliver Sang wrote:
> > [...]
> > 
> > > > since we applied your "/dev/zero: make private mapping full anonymous mapping"
> > > > patch on top of a68d3cbfad, like below:
> > > > 
> > > > * 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
> > > > * a68d3cbfade64 memstick: core: fix kernel-doc notation
> > > > 
> > > > so I applied the below patch on top of a68d3cbfad as well.
> > > > 
> > > > we saw a big improvement, but not as big as in the original report.
> > > > 
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
> > > >     gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
> > > > 
> > > > commit:
> > > >     a68d3cbfad ("memstick: core: fix kernel-doc notation")
> > > >     52ec85cb99  <--- your patch
> > > > 
> > > > 
> > > > a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
> > > > ---------------- ---------------------------
> > > >            %stddev     %change         %stddev
> > > >                \          |                \
> > > >     14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
> > > > 
> > > > full comparison as below [1] just FYI.
> > > Thanks for the update. I stared at the profiling report for a whole day, but
> > > I couldn't figure out where that 400% was lost. I just saw that the number of
> > > page faults was lower, and the reduction in page faults seems to match the
> > > 400% loss. So I did more tracing and profiling.
> > > 
> > > The test case does the below in a tight loop:
> > >    mmap 40K of memory from /dev/zero (read-only)
> > >    read the area
> > > 
> > > So two major factors affect performance: mmap and page faults. The
> > > alternative patch did reduce the mmap overhead to the same level as the
> > > original patch.
> > > 
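> > > Roughly, a minimal userspace sketch of that loop (illustrative only; the
> > > actual vm-scalability usemem code differs, and a 4096-byte page size is
> > > assumed):
> > > 
> > > #include <fcntl.h>
> > > #include <stdio.h>
> > > #include <sys/mman.h>
> > > #include <unistd.h>
> > > 
> > > int main(void)
> > > {
> > > 	size_t len = 40 * 1024;
> > > 	int fd = open("/dev/zero", O_RDONLY);
> > > 
> > > 	if (fd < 0) {
> > > 		perror("open");
> > > 		return 1;
> > > 	}
> > > 	for (int i = 0; i < 1000000; i++) {
> > > 		/* Private read-only mapping, torn down every iteration. */
> > > 		volatile char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> > > 
> > > 		if (p == MAP_FAILED) {
> > > 			perror("mmap");
> > > 			return 1;
> > > 		}
> > > 		/* Touch each page: every access takes a minor fault. */
> > > 		for (size_t off = 0; off < len; off += 4096)
> > > 			(void)p[off];
> > > 		munmap((void *)p, len);
> > > 	}
> > > 	close(fd);
> > > 	return 0;
> > > }
> > > 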
> > > Further perf profiling showed that the page fault cost is higher than with
> > > the original patch. But the page fault profile was interesting:
> > > 
> > > -   44.87%     0.01%  usemem [kernel.kallsyms]                   [k] do_translation_fault
> > >     - 44.86% do_translation_fault
> > >        - 44.83% do_page_fault
> > >           - 44.53% handle_mm_fault
> > >                9.04% __handle_mm_fault
> > > 
> > > Page faults consumed about 44% of CPU time in handle_mm_fault, but
> > > __handle_mm_fault consumed just 9%; I expected it to be the major
> > > consumer.
> > > 
> > > So I annotated handle_mm_fault and found that most of the time was consumed
> > > by lru_gen_enter_fault() -> vma_has_recency() (my kernel has multi-gen LRU
> > > enabled):
> > > 
> > >        │     if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
> > >         │     ↓ cbz     x1, b4
> > >    0.00 │       ldr     w0, [x1, #12]
> > >   99.59 │       eor     x0, x0, #0x800000
> > >    0.00 │       ubfx    w0, w0, #23, #1
> > >         │     current->in_lru_fault = vma_has_recency(vma);
> > >    0.00 │ b4:   ldrh    w1, [x2, #1992]
> > >    0.01 │       bfi     w1, w0, #5, #1
> > >    0.00 │       strh    w1, [x2, #1992]
> > > 
> > > 
> > > vma_has_recency() reads vma->vm_file->f_mode if vma->vm_file is not NULL,
> > > and that load took a long time. So I inspected struct file and saw:
> > > 
> > > struct file {
> > >      file_ref_t            f_ref;
> > >      spinlock_t            f_lock;
> > >      fmode_t                f_mode;
> > >      const struct file_operations    *f_op;
> > >      ...
> > > }
> > > 
> > > f_mode is in the same cache line as f_ref (my kernel does NOT have spinlock
> > > debugging enabled). The test case mmaps /dev/zero in a tight loop, so the
> > > refcount is modified (fget/fput) very frequently; this results in false
> > > sharing between the two fields.
> > > 
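> > > To illustrate the effect outside the kernel, a userspace sketch (not kernel
> > > code; the struct just mimics the f_ref/f_mode adjacency, and the iteration
> > > count is arbitrary):
> > > 
> > > #include <pthread.h>
> > > #include <stdatomic.h>
> > > #include <stdio.h>
> > > 
> > > /* Both fields sit in the same 64-byte cache line, like f_ref and f_mode. */
> > > struct fake_file {
> > > 	atomic_long f_ref;
> > > 	unsigned int f_mode;
> > > };
> > > 
> > > static struct fake_file f = { .f_mode = 1 };
> > > 
> > > static void *ref_hammer(void *arg)
> > > {
> > > 	(void)arg;
> > > 	/* Analogue of the fget()/fput() traffic from the mmap loop. */
> > > 	for (long i = 0; i < 100000000L; i++) {
> > > 		atomic_fetch_add(&f.f_ref, 1);
> > > 		atomic_fetch_sub(&f.f_ref, 1);
> > > 	}
> > > 	return NULL;
> > > }
> > > 
> > > int main(void)
> > > {
> > > 	pthread_t t;
> > > 	unsigned long sum = 0;
> > > 
> > > 	pthread_create(&t, NULL, ref_hammer, NULL);
> > > 	/* The reader never writes, yet each load of f_mode can miss because
> > > 	 * the writer keeps stealing the cache line, i.e. the
> > > 	 * vma_has_recency() situation above. */
> > > 	for (long i = 0; i < 100000000L; i++)
> > > 		sum += f.f_mode;
> > > 	pthread_join(t, NULL);
> > > 	printf("%lu\n", sum);
> > > 	return 0;
> > > }
> > > 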
> > > So I tried the below patch on top of the alternative patch:
> > > 
> > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > > index f9157a0c42a5..ba11dc0b1c7c 100644
> > > --- a/include/linux/mm_inline.h
> > > +++ b/include/linux/mm_inline.h
> > > @@ -608,6 +608,9 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
> > >          if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
> > >                  return false;
> > > 
> > > +       if (vma_is_anonymous(vma))
> > > +               return true;
> > > +
> > >          if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
> > >                  return false;
> > > 
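> > > For reference, vma_is_anonymous() only dereferences the VMA itself, so the
> > > early return above never touches struct file. Its definition (quoted from
> > > include/linux/mm.h in current kernels, for convenience) is just:
> > > 
> > > static inline bool vma_is_anonymous(struct vm_area_struct *vma)
> > > {
> > > 	return !vma->vm_ops;
> > > }
> > > 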
> > > With the above change, the page fault profile looked normal:
> > > 
> > >                          - 1.90% do_translation_fault
> > >                             - 1.87% do_page_fault
> > >                                - 1.49% handle_mm_fault
> > >                                   - 1.36% __handle_mm_fault
> > > 
> > > Please try this in your test.
> > > 
> > > But AFAICT I have never seen a performance issue reported due to false
> > > sharing between the refcount and other fields in struct file. This benchmark
> > > stresses it quite badly.
> > I applied your above patch on top of the alternative patch last time, and saw
> > more improvement (+445.2% vs a68d3cbfad), but still not as big as in our
> > original report.
> 
> Thanks for the update. It looks like the problem is still in page faults. I
> did my test on an arm64 machine. I also noticed struct file has
> "__randomize_layout", so it may have a different layout on x86 than on arm64?
> 
> The page fault handler may also access other fields of struct file that can
> cause false sharing, for example accessing f_mapping to read the gfp flags.
> This may not be a problem on my machine, but it may be more costly on yours,
> depending on the actual layout of struct file on each machine.
> 
> Can you please try the below patch on top of the current patches? Thank you
> so much for your patience.

you are welcome!

Now there are more improvements. I just list "a68d3cbfad + 3 patches so far" vs
a68d3cbfad below; if you want more data, please let me know.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit:
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  edc84ea79f  <--- a68d3cbfad + 3 patches so far
  
a68d3cbfade64392 edc84ea79f8dc11853076b96ad5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  14364828 ±  4%    +685.6%  1.129e+08 ±  5%  vm-scalability.throughput

full data is as below [1] FYI.

> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..1fa9dbce0f66 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3214,6 +3214,9 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>  {
>         struct file *vm_file = vma->vm_file;
> 
> +       if (vma_is_anonymous(vma))
> +               return GFP_KERNEL;
> +
>         if (vm_file)
>                 return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS |
>         if (vm_file)
>                 return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;

[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability

commit:
  a68d3cbfad ("memstick: core: fix kernel-doc notation")
  edc84ea79f  <--- a68d3cbfad + 3 patches so far
  
a68d3cbfade64392 edc84ea79f8dc11853076b96ad5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 5.262e+09 ±  3%     -59.8%  2.114e+09 ±  2%  cpuidle..time
   7924008 ±  3%     -83.9%    1275131 ±  5%  cpuidle..usage
   1871164 ±  4%     -16.8%    1557233 ±  8%  numa-numastat.node3.local_node
   1952164 ±  3%     -14.8%    1663189 ±  7%  numa-numastat.node3.numa_hit
    399.52           -75.0%      99.77 ±  2%  uptime.boot
     14507           -22.1%      11296        uptime.idle
      3408 ±  5%     -99.8%       7.25 ± 46%  perf-c2c.DRAM.local
     18076 ±  3%     -99.8%      43.00 ±100%  perf-c2c.DRAM.remote
      8082 ±  5%     -99.8%      12.50 ± 63%  perf-c2c.HITM.local
      6544 ±  6%     -99.7%      22.88 ±151%  perf-c2c.HITM.remote
     14627 ±  4%     -99.8%      35.38 ±114%  perf-c2c.HITM.total
      6.99 ±  3%    +177.6%      19.41 ±  3%  vmstat.cpu.id
     91.35           -28.5%      65.31        vmstat.cpu.sy
      1.71          +793.1%      15.25 ±  4%  vmstat.cpu.us
     34204 ±  5%     -64.1%      12271 ±  9%  vmstat.system.cs
    266575           -21.2%     210049        vmstat.system.in
      6.49 ±  3%     +10.0       16.46 ±  3%  mpstat.cpu.all.idle%
      0.63            -0.3        0.34 ±  3%  mpstat.cpu.all.irq%
      0.03 ±  2%      +0.3        0.31 ±  4%  mpstat.cpu.all.soft%
     91.17           -24.1       67.09        mpstat.cpu.all.sys%
      1.68 ±  2%     +14.1       15.80 ±  4%  mpstat.cpu.all.usr%
    337.33           -98.7%       4.25 ± 10%  mpstat.max_utilization.seconds
    352.76           -84.7%      53.95 ±  4%  time.elapsed_time
    352.76           -84.7%      53.95 ±  4%  time.elapsed_time.max
    225965 ±  7%     -17.1%     187329 ± 12%  time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09        time.minor_page_faults
     20852           -10.0%      18761        time.percent_of_cpu_this_job_got
     72302           -88.6%       8227 ±  6%  time.system_time
      1260 ±  3%     +50.7%       1899        time.user_time
   5393707 ±  5%     -98.8%      66895 ± 21%  time.voluntary_context_switches
   1609925           -50.7%     793216        meminfo.Active
   1609925           -50.7%     793216        meminfo.Active(anon)
    160837 ± 33%     -72.5%      44155 ±  9%  meminfo.AnonHugePages
   4435665           -18.7%    3608195        meminfo.Cached
   1775547           -44.2%     990889        meminfo.Committed_AS
    148539           -47.4%      78096        meminfo.Mapped
   4245538 ±  4%     -24.6%    3202495        meminfo.PageTables
    929777           -88.9%     102759        meminfo.Shmem
  25676018 ±  3%     +14.3%   29335678        meminfo.max_used_kB
     64129 ±  4%    +706.8%     517389 ±  7%  vm-scalability.median
     45.40 ±  5%   +2248.9        2294 ±  2%  vm-scalability.stddev%
  14364828 ±  4%    +685.6%  1.129e+08 ±  5%  vm-scalability.throughput
    352.76           -84.7%      53.95 ±  4%  vm-scalability.time.elapsed_time
    352.76           -84.7%      53.95 ±  4%  vm-scalability.time.elapsed_time.max
    225965 ±  7%     -17.1%     187329 ± 12%  vm-scalability.time.involuntary_context_switches
 9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
     20852           -10.0%      18761        vm-scalability.time.percent_of_cpu_this_job_got
     72302           -88.6%       8227 ±  6%  vm-scalability.time.system_time
      1260 ±  3%     +50.7%       1899        vm-scalability.time.user_time
   5393707 ±  5%     -98.8%      66895 ± 21%  vm-scalability.time.voluntary_context_switches
 4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
   1063552 ±  4%     -24.9%     799008 ±  3%  numa-meminfo.node0.PageTables
    125455 ±106%     -85.5%      18164 ±165%  numa-meminfo.node0.Shmem
   1062709 ±  4%     -25.7%     789746 ±  4%  numa-meminfo.node1.PageTables
    176171 ± 71%     -92.4%      13303 ±230%  numa-meminfo.node1.Shmem
     35515 ± 91%     -97.3%     976.55 ± 59%  numa-meminfo.node2.Mapped
   1058901 ±  4%     -25.3%     791392 ±  4%  numa-meminfo.node2.PageTables
    770405 ± 30%     -79.2%     160245 ±101%  numa-meminfo.node3.Active
    770405 ± 30%     -79.2%     160245 ±101%  numa-meminfo.node3.Active(anon)
    380096 ± 50%     -62.5%     142513 ± 98%  numa-meminfo.node3.AnonPages.max
   1146977 ±108%     -92.8%      82894 ± 60%  numa-meminfo.node3.FilePages
     52663 ± 47%     -97.2%       1488 ± 39%  numa-meminfo.node3.Mapped
   1058539 ±  4%     -22.3%     821992 ±  3%  numa-meminfo.node3.PageTables
    558943 ± 14%     -93.7%      35227 ±124%  numa-meminfo.node3.Shmem
    265763 ±  4%     -24.9%     199601 ±  3%  numa-vmstat.node0.nr_page_table_pages
     31364 ±106%     -85.5%       4539 ±165%  numa-vmstat.node0.nr_shmem
    265546 ±  4%     -25.5%     197854 ±  5%  numa-vmstat.node1.nr_page_table_pages
     44052 ± 71%     -92.5%       3323 ±230%  numa-vmstat.node1.nr_shmem
      8961 ± 91%     -97.3%     244.02 ± 59%  numa-vmstat.node2.nr_mapped
    264589 ±  4%     -25.2%     197920 ±  3%  numa-vmstat.node2.nr_page_table_pages
    192683 ± 30%     -79.2%      40126 ±101%  numa-vmstat.node3.nr_active_anon
    286819 ±108%     -92.8%      20761 ± 60%  numa-vmstat.node3.nr_file_pages
     13124 ± 49%     -97.2%     372.02 ± 39%  numa-vmstat.node3.nr_mapped
    264499 ±  4%     -22.4%     205376 ±  3%  numa-vmstat.node3.nr_page_table_pages
    139810 ± 14%     -93.7%       8844 ±124%  numa-vmstat.node3.nr_shmem
    192683 ± 30%     -79.2%      40126 ±101%  numa-vmstat.node3.nr_zone_active_anon
   1951359 ±  3%     -14.9%    1661427 ±  7%  numa-vmstat.node3.numa_hit
   1870359 ±  4%     -16.8%    1555470 ±  8%  numa-vmstat.node3.numa_local
    402515           -50.7%     198246        proc-vmstat.nr_active_anon
    170568            +1.8%     173591        proc-vmstat.nr_anon_pages
   1109246           -18.7%     902238        proc-vmstat.nr_file_pages
     37525           -47.3%      19768        proc-vmstat.nr_mapped
   1059932 ±  4%     -24.2%     803105 ±  2%  proc-vmstat.nr_page_table_pages
    232507           -89.0%      25623        proc-vmstat.nr_shmem
     37297            -5.4%      35299        proc-vmstat.nr_slab_reclaimable
    402515           -50.7%     198246        proc-vmstat.nr_zone_active_anon
     61931 ±  8%     -83.9%       9948 ± 59%  proc-vmstat.numa_hint_faults
     15755 ± 21%     -96.6%     541.38 ± 36%  proc-vmstat.numa_hint_faults_local
   6916516 ±  3%      -8.0%    6360040        proc-vmstat.numa_hit
   6568542 ±  3%      -8.5%    6012265        proc-vmstat.numa_local
    293942 ±  3%     -68.8%      91724 ± 48%  proc-vmstat.numa_pte_updates
 9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
     55981 ±  2%     -68.7%      17541 ±  2%  proc-vmstat.pgreuse
      0.82 ±  4%     -51.0%       0.40 ±  8%  perf-stat.i.MPKI
 2.714e+10 ±  2%    +378.3%  1.298e+11 ±  9%  perf-stat.i.branch-instructions
      0.11 ±  3%      +0.1        0.24 ±  8%  perf-stat.i.branch-miss-rate%
  24932893          +306.8%  1.014e+08 ±  9%  perf-stat.i.branch-misses
     64.93            -7.5       57.48        perf-stat.i.cache-miss-rate%
  88563288 ±  3%     +35.0%  1.196e+08 ±  7%  perf-stat.i.cache-misses
 1.369e+08 ±  3%     +43.7%  1.968e+08 ±  7%  perf-stat.i.cache-references
     34508 ±  4%     -47.3%      18199 ±  9%  perf-stat.i.context-switches
      7.67           -75.7%       1.87 ±  3%  perf-stat.i.cpi
    224605           +22.5%     275084 ±  6%  perf-stat.i.cpu-clock
    696.35 ±  2%     -53.5%     323.77 ±  2%  perf-stat.i.cpu-migrations
     10834 ±  4%     -24.1%       8224 ± 11%  perf-stat.i.cycles-between-cache-misses
 1.102e+11          +282.2%  4.212e+11 ±  9%  perf-stat.i.instructions
      0.14          +334.6%       0.62 ±  5%  perf-stat.i.ipc
     24.25 ±  3%    +626.9%     176.25 ±  4%  perf-stat.i.metric.K/sec
   2722043 ±  3%    +803.8%   24600740 ±  9%  perf-stat.i.minor-faults
   2722043 ±  3%    +803.8%   24600739 ±  9%  perf-stat.i.page-faults
    224605           +22.5%     275084 ±  6%  perf-stat.i.task-clock
      0.81 ±  3%     -62.2%       0.31 ± 11%  perf-stat.overall.MPKI
      0.09            -0.0        0.08 ±  2%  perf-stat.overall.branch-miss-rate%
     64.81            -2.4       62.37        perf-stat.overall.cache-miss-rate%
      7.24           -70.7%       2.12 ±  5%  perf-stat.overall.cpi
      8933 ±  4%     -21.9%       6978 ±  7%  perf-stat.overall.cycles-between-cache-misses
      0.14          +242.2%       0.47 ±  5%  perf-stat.overall.ipc
      9012 ±  2%     -57.8%       3806        perf-stat.overall.path-length
 2.701e+10 ±  2%    +285.4%  1.041e+11 ±  5%  perf-stat.ps.branch-instructions
  24708939          +215.8%   78042343 ±  4%  perf-stat.ps.branch-misses
  89032538 ±  3%     +15.9%  1.032e+08 ±  8%  perf-stat.ps.cache-misses
 1.374e+08 ±  3%     +20.6%  1.656e+08 ±  9%  perf-stat.ps.cache-references
     34266 ±  5%     -66.2%      11570 ± 10%  perf-stat.ps.context-switches
    223334            -1.6%     219861        perf-stat.ps.cpu-clock
 7.941e+11            -9.9%  7.157e+11        perf-stat.ps.cpu-cycles
    693.54 ±  2%     -67.2%     227.38 ±  4%  perf-stat.ps.cpu-migrations
 1.097e+11          +208.3%  3.381e+11 ±  5%  perf-stat.ps.instructions
   2710577 ±  3%    +626.7%   19698901 ±  5%  perf-stat.ps.minor-faults
   2710577 ±  3%    +626.7%   19698902 ±  5%  perf-stat.ps.page-faults
    223334            -1.6%     219861        perf-stat.ps.task-clock
 3.886e+13 ±  2%     -52.7%  1.839e+13        perf-stat.total.instructions
  64052898 ±  5%     -99.9%      81213 ± 23%  sched_debug.cfs_rq:/.avg_vruntime.avg
  95701822 ±  7%     -96.4%    3425672 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.max
  43098762 ±  6%    -100.0%     153.42 ± 36%  sched_debug.cfs_rq:/.avg_vruntime.min
   9223270 ±  9%     -95.9%     380347 ± 16%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.00 ± 22%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.avg
      0.69 ±  8%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.max
      0.05 ± 12%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.stddev
      0.78 ±  2%     -94.5%       0.04 ± 21%  sched_debug.cfs_rq:/.h_nr_running.avg
      1.97 ±  5%     -49.3%       1.00        sched_debug.cfs_rq:/.h_nr_running.max
      0.28 ±  7%     -29.1%       0.20 ± 10%  sched_debug.cfs_rq:/.h_nr_running.stddev
    411536 ± 58%    -100.0%       1.15 ±182%  sched_debug.cfs_rq:/.left_deadline.avg
  43049468 ± 22%    -100.0%     258.27 ±182%  sched_debug.cfs_rq:/.left_deadline.max
   3836405 ± 37%    -100.0%      17.22 ±182%  sched_debug.cfs_rq:/.left_deadline.stddev
    411536 ± 58%    -100.0%       1.06 ±191%  sched_debug.cfs_rq:/.left_vruntime.avg
  43049467 ± 22%    -100.0%     236.56 ±191%  sched_debug.cfs_rq:/.left_vruntime.max
   3836405 ± 37%    -100.0%      15.77 ±191%  sched_debug.cfs_rq:/.left_vruntime.stddev
  64052901 ±  5%     -99.9%      81213 ± 23%  sched_debug.cfs_rq:/.min_vruntime.avg
  95701822 ±  7%     -96.4%    3425672 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
  43098762 ±  6%    -100.0%     153.42 ± 36%  sched_debug.cfs_rq:/.min_vruntime.min
   9223270 ±  9%     -95.9%     380347 ± 16%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.77 ±  2%     -94.4%       0.04 ± 21%  sched_debug.cfs_rq:/.nr_running.avg
      1.50 ±  9%     -33.3%       1.00        sched_debug.cfs_rq:/.nr_running.max
      0.26 ± 10%     -22.7%       0.20 ± 10%  sched_debug.cfs_rq:/.nr_running.stddev
      1.61 ± 24%    +413.4%       8.24 ± 60%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     86.69          +508.6%     527.62 ±  4%  sched_debug.cfs_rq:/.removed.runnable_avg.max
     11.14 ± 13%    +428.4%      58.87 ± 32%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      1.61 ± 24%    +413.3%       8.24 ± 60%  sched_debug.cfs_rq:/.removed.util_avg.avg
     86.69          +508.6%     527.62 ±  4%  sched_debug.cfs_rq:/.removed.util_avg.max
     11.14 ± 13%    +428.4%      58.87 ± 32%  sched_debug.cfs_rq:/.removed.util_avg.stddev
    411536 ± 58%    -100.0%       1.06 ±191%  sched_debug.cfs_rq:/.right_vruntime.avg
  43049467 ± 22%    -100.0%     236.56 ±191%  sched_debug.cfs_rq:/.right_vruntime.max
   3836405 ± 37%    -100.0%      15.77 ±191%  sched_debug.cfs_rq:/.right_vruntime.stddev
    769.03           -84.7%     117.79 ±  3%  sched_debug.cfs_rq:/.util_avg.avg
      1621 ±  5%     -32.7%       1092 ± 16%  sched_debug.cfs_rq:/.util_avg.max
    159.12 ±  8%     +33.2%     211.88 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
    724.17 ±  2%     -98.6%      10.41 ± 32%  sched_debug.cfs_rq:/.util_est.avg
      1360 ± 15%     -51.5%     659.38 ± 10%  sched_debug.cfs_rq:/.util_est.max
    234.34 ±  9%     -68.2%      74.43 ± 18%  sched_debug.cfs_rq:/.util_est.stddev
    766944 ±  3%     +18.9%     912012        sched_debug.cpu.avg_idle.avg
   1067639 ±  5%     +25.5%    1339736 ±  9%  sched_debug.cpu.avg_idle.max
      3799 ±  7%     -38.3%       2346 ± 23%  sched_debug.cpu.avg_idle.min
    321459 ±  2%     -36.6%     203909 ±  7%  sched_debug.cpu.avg_idle.stddev
    195573           -76.9%      45144        sched_debug.cpu.clock.avg
    195596           -76.9%      45160        sched_debug.cpu.clock.max
    195548           -76.9%      45123        sched_debug.cpu.clock.min
     13.79 ±  3%     -36.0%       8.83 ±  2%  sched_debug.cpu.clock.stddev
    194424           -76.8%      45019        sched_debug.cpu.clock_task.avg
    194608           -76.8%      45145        sched_debug.cpu.clock_task.max
    181834           -82.1%      32559        sched_debug.cpu.clock_task.min
      4241 ±  2%     -96.8%     136.38 ± 21%  sched_debug.cpu.curr->pid.avg
      9799 ±  2%     -59.8%       3934        sched_debug.cpu.curr->pid.max
      1365 ± 10%     -49.1%     695.11 ± 10%  sched_debug.cpu.curr->pid.stddev
    537665 ±  4%     +28.3%     690006 ±  6%  sched_debug.cpu.max_idle_balance_cost.max
      3119 ± 56%    +479.5%      18078 ± 29%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 12%     -68.3%       0.00 ± 17%  sched_debug.cpu.next_balance.stddev
      0.78 ±  2%     -95.3%       0.04 ± 20%  sched_debug.cpu.nr_running.avg
      2.17 ±  8%     -53.8%       1.00        sched_debug.cpu.nr_running.max
      0.29 ±  8%     -35.4%       0.19 ±  9%  sched_debug.cpu.nr_running.stddev
     25773 ±  5%     -97.0%     764.82 ±  3%  sched_debug.cpu.nr_switches.avg
     48669 ± 10%     -77.2%      11080 ± 12%  sched_debug.cpu.nr_switches.max
     19006 ±  7%     -99.2%     151.12 ± 15%  sched_debug.cpu.nr_switches.min
      4142 ±  8%     -69.5%       1264 ±  6%  sched_debug.cpu.nr_switches.stddev
      0.07 ± 23%     -93.3%       0.01 ± 53%  sched_debug.cpu.nr_uninterruptible.avg
    240.19 ± 16%     -80.2%      47.50 ± 44%  sched_debug.cpu.nr_uninterruptible.max
    -77.92           -88.1%      -9.25        sched_debug.cpu.nr_uninterruptible.min
     37.87 ±  5%     -84.7%       5.78 ± 13%  sched_debug.cpu.nr_uninterruptible.stddev
    195549           -76.9%      45130        sched_debug.cpu_clk
    194699           -77.3%      44280        sched_debug.ktime
      0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
      0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
      0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
    196368           -76.6%      45975        sched_debug.sched_clk
     95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
     94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.46           -94.4        0.07 ±264%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
     94.45           -94.0        0.41 ±158%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     94.14           -93.9        0.29 ±134%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     94.25           -93.8        0.41 ±158%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     93.79           -93.7        0.07 ±264%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
     93.44           -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
     93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
     92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
      0.00            +1.7        1.69 ± 65%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      0.00            +1.9        1.90 ± 55%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      0.00            +1.9        1.90 ± 55%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.00            +1.9        1.93 ± 53%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.9        1.93 ± 53%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
      0.00            +2.0        1.99 ± 53%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.0        2.02 ± 64%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
      0.00            +2.4        2.45 ± 53%  perf-profile.calltrace.cycles-pp._Fork
      0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +3.2        3.17 ± 42%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +3.3        3.28 ± 52%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
      0.00            +3.3        3.28 ± 52%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
      0.00            +4.1        4.10 ± 45%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      0.00            +4.1        4.10 ± 45%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.00            +4.8        4.80 ± 61%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
      0.00            +5.0        4.98 ± 69%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
      0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
      0.00            +5.1        5.11 ± 47%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.00            +5.1        5.12 ± 70%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
      0.00            +6.1        6.08 ± 50%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +7.8        7.84 ± 21%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.read
      0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
      0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.main
      0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
      0.00           +11.2       11.18 ± 73%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.00           +15.9       15.94 ± 41%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +15.9       15.94 ± 41%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00           +19.5       19.54 ± 41%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      1.21 ±  3%     +36.7       37.86 ±  7%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.21 ±  3%     +36.7       37.86 ±  7%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.21 ±  3%     +37.0       38.24 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.21 ±  3%     +37.2       38.41 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.21 ±  3%     +37.4       38.57 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.calltrace.cycles-pp.common_startup_64
      2.19 ±  3%     +45.2       47.41 ± 14%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
     95.60           -95.4        0.22 ±135%  perf-profile.children.cycles-pp.__mmap
     94.55           -93.9        0.60 ±103%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
     94.14           -93.7        0.44 ±112%  perf-profile.children.cycles-pp.__mmap_new_vma
     93.79           -93.7        0.10 ±264%  perf-profile.children.cycles-pp.vma_link_file
     94.46           -93.5        0.96 ± 76%  perf-profile.children.cycles-pp.vm_mmap_pgoff
     94.45           -93.5        0.96 ± 76%  perf-profile.children.cycles-pp.do_mmap
     94.25           -93.4        0.86 ± 87%  perf-profile.children.cycles-pp.__mmap_region
     93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
     93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
     93.44           -93.2        0.22 ±149%  perf-profile.children.cycles-pp.down_write
     92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
     95.58           -45.4       50.16 ±  8%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     95.58           -45.4       50.16 ±  8%  perf-profile.children.cycles-pp.do_syscall_64
      0.00            +1.1        1.12 ± 74%  perf-profile.children.cycles-pp.filemap_map_pages
      0.00            +1.1        1.12 ± 76%  perf-profile.children.cycles-pp.vfs_fstatat
      0.00            +1.2        1.19 ± 35%  perf-profile.children.cycles-pp.vsnprintf
      0.00            +1.2        1.20 ± 46%  perf-profile.children.cycles-pp.seq_printf
      0.00            +1.3        1.28 ± 78%  perf-profile.children.cycles-pp.__do_sys_newfstatat
      0.00            +1.5        1.54 ± 75%  perf-profile.children.cycles-pp.folios_put_refs
      0.00            +1.6        1.56 ± 52%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +1.6        1.60 ± 32%  perf-profile.children.cycles-pp.sched_balance_newidle
      0.00            +1.7        1.69 ± 65%  perf-profile.children.cycles-pp.dup_mm
      0.00            +1.9        1.93 ± 53%  perf-profile.children.cycles-pp.proc_reg_read_iter
      0.00            +2.0        1.99 ± 53%  perf-profile.children.cycles-pp.copy_process
      0.00            +2.1        2.06 ± 51%  perf-profile.children.cycles-pp.__x64_sys_ioctl
      0.00            +2.1        2.08 ± 45%  perf-profile.children.cycles-pp.proc_single_show
      0.00            +2.1        2.14 ± 45%  perf-profile.children.cycles-pp.seq_read
      0.00            +2.2        2.16 ± 47%  perf-profile.children.cycles-pp.ioctl
      0.00            +2.2        2.17 ± 33%  perf-profile.children.cycles-pp.schedule
      0.00            +2.2        2.20 ± 28%  perf-profile.children.cycles-pp.__pick_next_task
      0.00            +2.2        2.21 ± 47%  perf-profile.children.cycles-pp.perf_evsel__run_ioctl
      0.00            +2.3        2.26 ± 58%  perf-profile.children.cycles-pp.do_read_fault
      0.00            +2.3        2.27 ± 56%  perf-profile.children.cycles-pp.__do_sys_clone
      0.00            +2.3        2.27 ± 56%  perf-profile.children.cycles-pp.kernel_clone
      0.00            +2.4        2.37 ± 58%  perf-profile.children.cycles-pp.zap_present_ptes
      0.00            +2.4        2.45 ± 53%  perf-profile.children.cycles-pp._Fork
      0.00            +2.6        2.59 ± 53%  perf-profile.children.cycles-pp.__x64_sys_exit_group
      0.00            +2.6        2.59 ± 53%  perf-profile.children.cycles-pp.x64_sys_call
      0.00            +2.6        2.64 ± 44%  perf-profile.children.cycles-pp.do_pte_missing
      0.00            +3.1        3.13 ± 59%  perf-profile.children.cycles-pp.zap_pte_range
      0.00            +3.2        3.21 ± 58%  perf-profile.children.cycles-pp.zap_pmd_range
      0.00            +3.4        3.40 ± 56%  perf-profile.children.cycles-pp.unmap_page_range
      0.00            +3.4        3.43 ± 55%  perf-profile.children.cycles-pp.unmap_vmas
      0.19 ± 23%      +3.9        4.06 ± 45%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.51 ±  6%      +4.0        4.49 ± 38%  perf-profile.children.cycles-pp.handle_mm_fault
      0.04 ± 44%      +4.0        4.04 ± 28%  perf-profile.children.cycles-pp.__schedule
      0.77 ±  3%      +4.4        5.18 ± 39%  perf-profile.children.cycles-pp.exc_page_fault
      0.76 ±  3%      +4.4        5.18 ± 39%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.58 ±  2%      +4.7        5.26 ± 53%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.00            +5.1        5.07 ± 71%  perf-profile.children.cycles-pp.writen
      0.00            +5.1        5.07 ± 69%  perf-profile.children.cycles-pp.generic_perform_write
      0.00            +5.1        5.12 ± 47%  perf-profile.children.cycles-pp.exit_mm
      0.00            +5.1        5.12 ± 70%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +5.1        5.12 ± 70%  perf-profile.children.cycles-pp.shmem_file_write_iter
      1.18            +5.5        6.69 ± 33%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.00            +6.2        6.24 ± 43%  perf-profile.children.cycles-pp.__mmput
      0.00            +6.2        6.24 ± 43%  perf-profile.children.cycles-pp.exit_mmap
      0.00            +7.0        7.00 ± 51%  perf-profile.children.cycles-pp.perf_mmap__push
      0.00            +7.0        7.00 ± 51%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.00            +7.2        7.25 ± 52%  perf-profile.children.cycles-pp.__fput
      0.00            +7.3        7.35 ± 20%  perf-profile.children.cycles-pp.seq_read_iter
      0.00            +7.8        7.84 ± 21%  perf-profile.children.cycles-pp.vfs_read
      0.00            +7.9        7.88 ± 20%  perf-profile.children.cycles-pp.ksys_read
      0.00            +7.9        7.88 ± 20%  perf-profile.children.cycles-pp.read
      0.00            +9.9        9.93 ± 41%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.__cmd_record
      0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.cmd_record
      0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.handle_internal_command
      0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.main
      0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.run_builtin
      0.00           +11.2       11.18 ± 73%  perf-profile.children.cycles-pp.vfs_write
      0.00           +11.2       11.23 ± 73%  perf-profile.children.cycles-pp.ksys_write
      0.00           +11.2       11.23 ± 73%  perf-profile.children.cycles-pp.write
      0.00           +13.6       13.61 ± 44%  perf-profile.children.cycles-pp.do_exit
      0.00           +13.6       13.61 ± 44%  perf-profile.children.cycles-pp.do_group_exit
      1.70 ±  2%     +25.0       26.72 ± 15%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.21 ±  3%     +36.6       37.81 ±  7%  perf-profile.children.cycles-pp.acpi_safe_halt
      1.21 ±  3%     +36.6       37.86 ±  7%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      1.21 ±  3%     +36.6       37.86 ±  7%  perf-profile.children.cycles-pp.acpi_idle_enter
      1.21 ±  3%     +37.4       38.57 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter_state
      1.21 ±  3%     +37.4       38.66 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter
      1.22 ±  3%     +37.6       38.82 ±  6%  perf-profile.children.cycles-pp.cpuidle_idle_call
      1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.children.cycles-pp.start_secondary
      1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.common_startup_64
      1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.do_idle
     92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
      1.19 ±  3%     +30.7       31.90 ±  7%  perf-profile.self.cycles-pp.acpi_safe_halt
      0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      0.19 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      0.14 ± 73%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      0.10 ± 66%     -99.9%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.07 ±101%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.02 ±143%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      0.10 ± 44%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.12 ±145%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.04 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.25 ± 41%     -98.5%       0.00 ±105%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.11 ± 59%     -97.1%       0.00 ± 61%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.40 ± 50%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.32 ±104%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ± 12%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.08 ± 28%     -99.5%       0.00 ±264%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.18 ± 57%     -96.8%       0.01 ±193%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.03 ± 83%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.02 ± 65%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.32 ± 47%     -98.2%       0.01 ± 42%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.19 ±185%     -96.5%       0.01 ± 33%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      0.26 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.02 ± 60%     -94.2%       0.00 ±264%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.01 ±128%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     25.45 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      3.55 ± 97%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      2.13 ± 67%    -100.0%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.61 ±100%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.20 ±182%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      3.51 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.83 ±160%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.09 ± 31%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.59 ± 11%     -99.9%       0.00 ±105%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1.60 ± 69%     -99.6%       0.01 ±129%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.81 ± 43%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.02 ± 88%    -100.0%       0.00        perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±  7%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      9.68 ± 32%    -100.0%       0.00 ±264%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     12.26 ±109%    -100.0%       0.01 ±193%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      5.60 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      2.11 ± 61%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      3.67 ± 25%     -99.8%       0.01 ± 16%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1.65 ±187%     -99.3%       0.01 ± 23%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      4.68 ± 36%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.21 ±169%     -99.6%       0.00 ±264%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      7.92 ±131%     -99.2%       0.06 ± 92%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.36 ±186%    -100.0%       0.00        perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     33.45 ±  3%     -91.6%       2.81 ± 90%  perf-sched.total_wait_and_delay.average.ms
     97903 ±  4%     -98.2%       1776 ± 28%  perf-sched.total_wait_and_delay.count.ms
      2942 ± 23%     -95.2%     141.09 ± 36%  perf-sched.total_wait_and_delay.max.ms
     33.37 ±  3%     -91.9%       2.69 ± 95%  perf-sched.total_wait_time.average.ms
      2942 ± 23%     -96.7%      97.14 ± 19%  perf-sched.total_wait_time.max.ms
      3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      3.08 ±  4%     -94.3%       0.18 ± 92%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    119.91 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    433.73 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    302.41 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.48 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     23.24 ± 25%     -96.7%       0.76 ± 27%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    327.16 ±  9%     -99.8%       0.76 ±188%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.37 ±  2%     -98.9%       4.03 ±204%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    453.60          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    187.66           -96.7%       6.11 ±109%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.37 ± 29%     -99.6%       0.01 ±264%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    750.07           -99.3%       5.10 ± 84%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1269 ±  8%     -45.8%     688.12 ± 21%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      6.17 ± 45%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      5.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     14.33 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    810.00 ± 10%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      3112 ± 24%     -97.9%      65.75 ±106%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
     40.50 ±  8%     -98.8%       0.50 ±173%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     40.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
      1122           -99.0%      10.88 ± 98%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     11323 ±  3%     -93.6%     722.25 ± 20%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1887 ± 45%    -100.0%       0.88 ±264%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1238           -93.9%      75.62 ± 79%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1002           -91.0%      89.82 ± 93%  perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    318.48 ± 65%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    966.90 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     20.79 ± 19%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1043           -98.4%      16.64 ±214%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1240 ± 20%     -99.9%       1.52 ±188%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.34           -96.9%      15.38 ±232%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
    505.17          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     19.77 ± 55%     -62.8%       7.36 ± 85%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1237 ± 34%     -91.7%     102.88 ± 33%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1001          -100.0%       0.05 ±264%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2794 ± 24%     -97.9%      59.20 ± 61%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     49.27 ±119%    -100.0%       0.01 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
     58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
      3.78 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      2.99 ±  4%     -97.0%       0.09 ± 91%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
      4.71 ±  8%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
      1.67 ± 20%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.67 ± 21%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     67.14 ± 73%     -99.5%       0.32 ±177%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      1.65 ± 67%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
      2.30 ± 14%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
     42.44 ±200%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    152.73 ±152%    -100.0%       0.06 ±249%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
    119.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.80 ± 18%     -99.9%       0.00 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
    433.32 ± 41%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    250.23 ±107%    -100.0%       0.00        perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     29.19 ±  5%     -99.2%       0.25 ± 24%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    302.40 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.40 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      4.03 ±  8%     -99.9%       0.01 ±193%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     35.38 ±192%    -100.0%       0.00 ±264%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.05 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      1.00 ±120%     -99.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
     23.07 ± 24%     -97.1%       0.67 ± 10%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    326.84 ±  9%     -99.6%       1.19 ±108%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    369.18 ±  2%     -98.7%       4.72 ±167%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
      1.17 ± 16%     -99.7%       0.00 ±264%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    453.58          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      4.42           -25.4%       3.30 ± 17%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    187.58           -96.8%       6.05 ±110%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.36 ± 29%     -99.1%       0.02 ± 84%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±156%    -100.0%       0.00        perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    750.01           -99.5%       3.45 ±141%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    340.69 ±135%    -100.0%       0.01 ±264%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
     22.04 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      1001           -95.5%      44.91 ± 93%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
     13.54 ± 10%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
     10.17 ± 19%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
     11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
     10.62 ±  9%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      1559 ± 64%    -100.0%       0.44 ±167%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.93 ± 53%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
     14.42 ± 22%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
    159.10 ±148%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    391.02 ±171%    -100.0%       0.12 ±256%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
    318.43 ± 65%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     13.14 ± 21%    -100.0%       0.00 ±105%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1000          -100.0%       0.00        perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    500.84 ± 99%    -100.0%       0.00        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
    641.50 ± 23%     -99.2%       5.27 ± 76%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     10.75 ± 98%     -89.8%       1.10 ± 78%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    966.89 ±  7%    -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     15.80 ±  8%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     16.69 ± 10%    -100.0%       0.01 ±193%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     41.71 ±158%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     11.64 ± 61%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    175.70 ±210%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      1043           -99.6%       4.46 ±105%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1240 ± 20%     -99.8%       2.37 ±108%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    500.11           -96.5%      17.32 ±201%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
     22.94 ± 56%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    505.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     12.20 ± 43%     -59.2%       4.98        perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1237 ± 34%     -92.5%      92.94 ± 20%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1000          -100.0%       0.09 ±111%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.36 ±190%    -100.0%       0.00        perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      2794 ± 24%     -98.9%      30.12 ±114%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] /dev/zero: make private mapping full anonymous mapping
  2025-02-18  6:30             ` Oliver Sang
@ 2025-02-19  1:12               ` Yang Shi
  0 siblings, 0 replies; 35+ messages in thread
From: Yang Shi @ 2025-02-19  1:12 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, arnd, gregkh, Liam.Howlett,
	lorenzo.stoakes, vbabka, jannh, willy, liushixin2, akpm,
	linux-mm




On 2/17/25 10:30 PM, Oliver Sang wrote:
> hi, Yang Shi,
>
> On Fri, Feb 14, 2025 at 02:53:37PM -0800, Yang Shi wrote:
>> On 2/12/25 6:04 PM, Oliver Sang wrote:
>>> hi, Yang Shi,
>>>
>>> On Fri, Feb 07, 2025 at 10:10:37AM -0800, Yang Shi wrote:
>>>> On 2/6/25 12:02 AM, Oliver Sang wrote:
>>> [...]
>>>
>>>>> since we applied your "/dev/zero: make private mapping full anonymous mapping"
>>>>> patch on top of a68d3cbfad as below:
>>>>>
>>>>> * 7143ee2391f1e /dev/zero: make private mapping full anonymous mapping
>>>>> * a68d3cbfade64 memstick: core: fix kernel-doc notation
>>>>>
>>>>> so I applied the below patch on top of a68d3cbfad as well.
>>>>>
>>>>> we saw a big improvement, but not that big.
>>>>>
>>>>> =========================================================================================
>>>>> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>>>>>      gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>>>>>
>>>>> commit:
>>>>>      a68d3cbfad ("memstick: core: fix kernel-doc notation")
>>>>>      52ec85cb99  <--- your patch
>>>>>
>>>>>
>>>>> a68d3cbfade64392 52ec85cb99e9b31dc304eae965a
>>>>> ---------------- ---------------------------
>>>>>             %stddev     %change         %stddev
>>>>>                 \          |                \
>>>>>      14364828 ±  4%    +410.6%   73349239 ±  3%  vm-scalability.throughput
>>>>>
>>>>> full comparison as below [1] just FYI.
>>>> Thanks for the update. I stared at the profiling report for a whole day, but
>>>> I didn't figure out where that 400% was lost. I just saw that the number of
>>>> page faults was lower, and the reduction in page faults seems to match the
>>>> 400% loss. So I did more tracing and profiling.
>>>>
>>>> The test case did the below stuff in a tight loop:
>>>>     mmap 40K memory from /dev/zero (read only)
>>>>     read the area
>>>>
>>>> So there are two major factors in the performance: mmap and page faults. The
>>>> alternative patch did reduce the mmap overhead to the same level as the
>>>> original patch.
>>>>
>>>> Further perf profiling showed the cost of page faults is higher than with the
>>>> original patch. But the page fault profile was interesting:
>>>>
>>>> -   44.87%     0.01%  usemem [kernel.kallsyms]                   [k] do_translation_fault
>>>>      - 44.86% do_translation_fault
>>>>         - 44.83% do_page_fault
>>>>            - 44.53% handle_mm_fault
>>>>                 9.04% __handle_mm_fault
>>>>
>>>> Page faults consumed 40% of CPU time in handle_mm_fault, but
>>>> __handle_mm_fault consumed just 9%; I expected it to be the major
>>>> consumer.
>>>>
>>>> So I annotated handle_mm_fault, and found that most of the time was consumed
>>>> by lru_gen_enter_fault() -> vma_has_recency() (my kernel has multi-gen LRU
>>>> enabled):
>>>>
>>>>         │     if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>>>>          │     ↓ cbz     x1, b4
>>>>     0.00 │       ldr     w0, [x1, #12]
>>>>    99.59 │       eor     x0, x0, #0x800000
>>>>     0.00 │       ubfx    w0, w0, #23, #1
>>>>          │     current->in_lru_fault = vma_has_recency(vma);
>>>>     0.00 │ b4:   ldrh    w1, [x2, #1992]
>>>>     0.01 │       bfi     w1, w0, #5, #1
>>>>     0.00 │       strh    w1, [x2, #1992]
>>>>
>>>>
>>>> vma_has_recency() reads vma->vm_file->f_mode if vma->vm_file is not NULL,
>>>> but that load took a long time. So I inspected struct file and saw:
>>>>
>>>> struct file {
>>>>       file_ref_t            f_ref;
>>>>       spinlock_t            f_lock;
>>>>       fmode_t                f_mode;
>>>>       const struct file_operations    *f_op;
>>>>       ...
>>>> }
>>>>
>>>> The f_mode is in the same cache line as f_ref (my kernel does NOT have
>>>> spinlock debugging enabled). The test case mmaps /dev/zero in a tight loop,
>>>> so the refcount is modified (fget/fput) very frequently, which resulted in
>>>> false sharing.
>>>>
>>>> So I tried the below patch on top of the alternative patch:
>>>>
>>>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>>>> index f9157a0c42a5..ba11dc0b1c7c 100644
>>>> --- a/include/linux/mm_inline.h
>>>> +++ b/include/linux/mm_inline.h
>>>> @@ -608,6 +608,9 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
>>>>           if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
>>>>                   return false;
>>>>
>>>> +       if (vma_is_anonymous(vma))
>>>> +               return true;
>>>> +
>>>>           if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
>>>>                   return false;
>>>>
>>>> This made the page fault profile look normal:
>>>>
>>>>                           - 1.90% do_translation_fault
>>>>                              - 1.87% do_page_fault
>>>>                                 - 1.49% handle_mm_fault
>>>>                                    - 1.36% __handle_mm_fault
>>>>
>>>> Please try this in your test.
>>>>
>>>> But AFAICT I have never seen a performance issue reported due to false
>>>> sharing between the refcount and other fields in struct file. This benchmark
>>>> stressed it quite badly.
>>> I applied your patch above on top of the alternative patch from last time, and
>>> saw more improvement (+445.2% vs a68d3cbfad), but still not as big as in our
>>> original report.
>> Thanks for the update. It looks like the problem is still in page faults. I
>> did my test on an arm64 machine. I also noticed struct file has
>> "__randomize_layout", so it may have a different layout on x86 than on arm64?
>>
>> The page fault handler may also access other fields of struct file that may
>> cause false sharing, for example, accessing f_mapping to read the gfp flags.
>> This may not be a problem on my machine, but may be more costly on yours,
>> depending on the real layout of struct file on each machine.
>>
>> Can you please try the below patch on top of the current patches? Thank you
>> so much for your patience.
> you are welcome!
>
> now there are more improvements. I just list "a68d3cbfad + 3 patches so far" vs
> a68d3cbfad below; if you want more data, please let me know.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    edc84ea79f  <--- a68d3cbfad + 3 patches so far
>    
> a68d3cbfade64392 edc84ea79f8dc11853076b96ad5
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>    14364828 ±  4%    +685.6%  1.129e+08 ±  5%  vm-scalability.throughput
>
> full data is as below [1] FYI.

Thank you for the update. It is close to the 800% target, and it looks 
like there may still be some overhead in the page fault handler due to 
the false sharing, for example, the vma_is_dax() call in 
__thp_vma_allowable_orders(), which is made when the pmd is null. I'm 
not sure how big that impact could be. However, I'm also not sure 
whether we should keep chasing it, because the false sharing in struct 
file should be very rare for real-life workloads: the workload has to 
map the same file and then page fault again and again in a tight loop, 
with the struct file shared by multiple processes. Such behavior should 
be rare in real life.

And changing the layout of struct file to avoid the false sharing sounds 
better than adding a vma_is_anonymous() call in all the possible places, 
but it may introduce new false sharing. Having the refcount in a 
dedicated cache line is doable too; however, it would increase the size 
of struct file (from 192 bytes to 256 bytes). So neither seems worth it.

We can split all the patches into two parts: the first part avoids the 
i_mmap_rwsem contention, and the second part addresses the struct file 
false sharing. IMHO the first part is more real. I can come up with a 
formal patch and send it to the mailing list.

Thanks,
Yang

>> diff --git a/mm/memory.c b/mm/memory.c
>> index 539c0f7c6d54..1fa9dbce0f66 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3214,6 +3214,9 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>>   {
>>          struct file *vm_file = vma->vm_file;
>>
>> +       if (vma_is_anonymous(vma))
>> +               return GFP_KERNEL;
>> +
>>          if (vm_file)
>>          if (vm_file)
>>                  return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
>>
> [1]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/small-allocs/vm-scalability
>
> commit:
>    a68d3cbfad ("memstick: core: fix kernel-doc notation")
>    edc84ea79f  <--- a68d3cbfad + 3 patches so far
>    
> a68d3cbfade64392 edc84ea79f8dc11853076b96ad5
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>   5.262e+09 ±  3%     -59.8%  2.114e+09 ±  2%  cpuidle..time
>     7924008 ±  3%     -83.9%    1275131 ±  5%  cpuidle..usage
>     1871164 ±  4%     -16.8%    1557233 ±  8%  numa-numastat.node3.local_node
>     1952164 ±  3%     -14.8%    1663189 ±  7%  numa-numastat.node3.numa_hit
>      399.52           -75.0%      99.77 ±  2%  uptime.boot
>       14507           -22.1%      11296        uptime.idle
>        3408 ±  5%     -99.8%       7.25 ± 46%  perf-c2c.DRAM.local
>       18076 ±  3%     -99.8%      43.00 ±100%  perf-c2c.DRAM.remote
>        8082 ±  5%     -99.8%      12.50 ± 63%  perf-c2c.HITM.local
>        6544 ±  6%     -99.7%      22.88 ±151%  perf-c2c.HITM.remote
>       14627 ±  4%     -99.8%      35.38 ±114%  perf-c2c.HITM.total
>        6.99 ±  3%    +177.6%      19.41 ±  3%  vmstat.cpu.id
>       91.35           -28.5%      65.31        vmstat.cpu.sy
>        1.71          +793.1%      15.25 ±  4%  vmstat.cpu.us
>       34204 ±  5%     -64.1%      12271 ±  9%  vmstat.system.cs
>      266575           -21.2%     210049        vmstat.system.in
>        6.49 ±  3%     +10.0       16.46 ±  3%  mpstat.cpu.all.idle%
>        0.63            -0.3        0.34 ±  3%  mpstat.cpu.all.irq%
>        0.03 ±  2%      +0.3        0.31 ±  4%  mpstat.cpu.all.soft%
>       91.17           -24.1       67.09        mpstat.cpu.all.sys%
>        1.68 ±  2%     +14.1       15.80 ±  4%  mpstat.cpu.all.usr%
>      337.33           -98.7%       4.25 ± 10%  mpstat.max_utilization.seconds
>      352.76           -84.7%      53.95 ±  4%  time.elapsed_time
>      352.76           -84.7%      53.95 ±  4%  time.elapsed_time.max
>      225965 ±  7%     -17.1%     187329 ± 12%  time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09        time.minor_page_faults
>       20852           -10.0%      18761        time.percent_of_cpu_this_job_got
>       72302           -88.6%       8227 ±  6%  time.system_time
>        1260 ±  3%     +50.7%       1899        time.user_time
>     5393707 ±  5%     -98.8%      66895 ± 21%  time.voluntary_context_switches
>     1609925           -50.7%     793216        meminfo.Active
>     1609925           -50.7%     793216        meminfo.Active(anon)
>      160837 ± 33%     -72.5%      44155 ±  9%  meminfo.AnonHugePages
>     4435665           -18.7%    3608195        meminfo.Cached
>     1775547           -44.2%     990889        meminfo.Committed_AS
>      148539           -47.4%      78096        meminfo.Mapped
>     4245538 ±  4%     -24.6%    3202495        meminfo.PageTables
>      929777           -88.9%     102759        meminfo.Shmem
>    25676018 ±  3%     +14.3%   29335678        meminfo.max_used_kB
>       64129 ±  4%    +706.8%     517389 ±  7%  vm-scalability.median
>       45.40 ±  5%   +2248.9        2294 ±  2%  vm-scalability.stddev%
>    14364828 ±  4%    +685.6%  1.129e+08 ±  5%  vm-scalability.throughput
>      352.76           -84.7%      53.95 ±  4%  vm-scalability.time.elapsed_time
>      352.76           -84.7%      53.95 ±  4%  vm-scalability.time.elapsed_time.max
>      225965 ±  7%     -17.1%     187329 ± 12%  vm-scalability.time.involuntary_context_switches
>   9.592e+08 ±  4%     +11.9%  1.074e+09        vm-scalability.time.minor_page_faults
>       20852           -10.0%      18761        vm-scalability.time.percent_of_cpu_this_job_got
>       72302           -88.6%       8227 ±  6%  vm-scalability.time.system_time
>        1260 ±  3%     +50.7%       1899        vm-scalability.time.user_time
>     5393707 ±  5%     -98.8%      66895 ± 21%  vm-scalability.time.voluntary_context_switches
>   4.316e+09 ±  4%     +11.9%  4.832e+09        vm-scalability.workload
>     1063552 ±  4%     -24.9%     799008 ±  3%  numa-meminfo.node0.PageTables
>      125455 ±106%     -85.5%      18164 ±165%  numa-meminfo.node0.Shmem
>     1062709 ±  4%     -25.7%     789746 ±  4%  numa-meminfo.node1.PageTables
>      176171 ± 71%     -92.4%      13303 ±230%  numa-meminfo.node1.Shmem
>       35515 ± 91%     -97.3%     976.55 ± 59%  numa-meminfo.node2.Mapped
>     1058901 ±  4%     -25.3%     791392 ±  4%  numa-meminfo.node2.PageTables
>      770405 ± 30%     -79.2%     160245 ±101%  numa-meminfo.node3.Active
>      770405 ± 30%     -79.2%     160245 ±101%  numa-meminfo.node3.Active(anon)
>      380096 ± 50%     -62.5%     142513 ± 98%  numa-meminfo.node3.AnonPages.max
>     1146977 ±108%     -92.8%      82894 ± 60%  numa-meminfo.node3.FilePages
>       52663 ± 47%     -97.2%       1488 ± 39%  numa-meminfo.node3.Mapped
>     1058539 ±  4%     -22.3%     821992 ±  3%  numa-meminfo.node3.PageTables
>      558943 ± 14%     -93.7%      35227 ±124%  numa-meminfo.node3.Shmem
>      265763 ±  4%     -24.9%     199601 ±  3%  numa-vmstat.node0.nr_page_table_pages
>       31364 ±106%     -85.5%       4539 ±165%  numa-vmstat.node0.nr_shmem
>      265546 ±  4%     -25.5%     197854 ±  5%  numa-vmstat.node1.nr_page_table_pages
>       44052 ± 71%     -92.5%       3323 ±230%  numa-vmstat.node1.nr_shmem
>        8961 ± 91%     -97.3%     244.02 ± 59%  numa-vmstat.node2.nr_mapped
>      264589 ±  4%     -25.2%     197920 ±  3%  numa-vmstat.node2.nr_page_table_pages
>      192683 ± 30%     -79.2%      40126 ±101%  numa-vmstat.node3.nr_active_anon
>      286819 ±108%     -92.8%      20761 ± 60%  numa-vmstat.node3.nr_file_pages
>       13124 ± 49%     -97.2%     372.02 ± 39%  numa-vmstat.node3.nr_mapped
>      264499 ±  4%     -22.4%     205376 ±  3%  numa-vmstat.node3.nr_page_table_pages
>      139810 ± 14%     -93.7%       8844 ±124%  numa-vmstat.node3.nr_shmem
>      192683 ± 30%     -79.2%      40126 ±101%  numa-vmstat.node3.nr_zone_active_anon
>     1951359 ±  3%     -14.9%    1661427 ±  7%  numa-vmstat.node3.numa_hit
>     1870359 ±  4%     -16.8%    1555470 ±  8%  numa-vmstat.node3.numa_local
>      402515           -50.7%     198246        proc-vmstat.nr_active_anon
>      170568            +1.8%     173591        proc-vmstat.nr_anon_pages
>     1109246           -18.7%     902238        proc-vmstat.nr_file_pages
>       37525           -47.3%      19768        proc-vmstat.nr_mapped
>     1059932 ±  4%     -24.2%     803105 ±  2%  proc-vmstat.nr_page_table_pages
>      232507           -89.0%      25623        proc-vmstat.nr_shmem
>       37297            -5.4%      35299        proc-vmstat.nr_slab_reclaimable
>      402515           -50.7%     198246        proc-vmstat.nr_zone_active_anon
>       61931 ±  8%     -83.9%       9948 ± 59%  proc-vmstat.numa_hint_faults
>       15755 ± 21%     -96.6%     541.38 ± 36%  proc-vmstat.numa_hint_faults_local
>     6916516 ±  3%      -8.0%    6360040        proc-vmstat.numa_hit
>     6568542 ±  3%      -8.5%    6012265        proc-vmstat.numa_local
>      293942 ±  3%     -68.8%      91724 ± 48%  proc-vmstat.numa_pte_updates
>   9.608e+08 ±  4%     +11.8%  1.074e+09        proc-vmstat.pgfault
>       55981 ±  2%     -68.7%      17541 ±  2%  proc-vmstat.pgreuse
>        0.82 ±  4%     -51.0%       0.40 ±  8%  perf-stat.i.MPKI
>   2.714e+10 ±  2%    +378.3%  1.298e+11 ±  9%  perf-stat.i.branch-instructions
>        0.11 ±  3%      +0.1        0.24 ±  8%  perf-stat.i.branch-miss-rate%
>    24932893          +306.8%  1.014e+08 ±  9%  perf-stat.i.branch-misses
>       64.93            -7.5       57.48        perf-stat.i.cache-miss-rate%
>    88563288 ±  3%     +35.0%  1.196e+08 ±  7%  perf-stat.i.cache-misses
>   1.369e+08 ±  3%     +43.7%  1.968e+08 ±  7%  perf-stat.i.cache-references
>       34508 ±  4%     -47.3%      18199 ±  9%  perf-stat.i.context-switches
>        7.67           -75.7%       1.87 ±  3%  perf-stat.i.cpi
>      224605           +22.5%     275084 ±  6%  perf-stat.i.cpu-clock
>      696.35 ±  2%     -53.5%     323.77 ±  2%  perf-stat.i.cpu-migrations
>       10834 ±  4%     -24.1%       8224 ± 11%  perf-stat.i.cycles-between-cache-misses
>   1.102e+11          +282.2%  4.212e+11 ±  9%  perf-stat.i.instructions
>        0.14          +334.6%       0.62 ±  5%  perf-stat.i.ipc
>       24.25 ±  3%    +626.9%     176.25 ±  4%  perf-stat.i.metric.K/sec
>     2722043 ±  3%    +803.8%   24600740 ±  9%  perf-stat.i.minor-faults
>     2722043 ±  3%    +803.8%   24600739 ±  9%  perf-stat.i.page-faults
>      224605           +22.5%     275084 ±  6%  perf-stat.i.task-clock
>        0.81 ±  3%     -62.2%       0.31 ± 11%  perf-stat.overall.MPKI
>        0.09            -0.0        0.08 ±  2%  perf-stat.overall.branch-miss-rate%
>       64.81            -2.4       62.37        perf-stat.overall.cache-miss-rate%
>        7.24           -70.7%       2.12 ±  5%  perf-stat.overall.cpi
>        8933 ±  4%     -21.9%       6978 ±  7%  perf-stat.overall.cycles-between-cache-misses
>        0.14          +242.2%       0.47 ±  5%  perf-stat.overall.ipc
>        9012 ±  2%     -57.8%       3806        perf-stat.overall.path-length
>   2.701e+10 ±  2%    +285.4%  1.041e+11 ±  5%  perf-stat.ps.branch-instructions
>    24708939          +215.8%   78042343 ±  4%  perf-stat.ps.branch-misses
>    89032538 ±  3%     +15.9%  1.032e+08 ±  8%  perf-stat.ps.cache-misses
>   1.374e+08 ±  3%     +20.6%  1.656e+08 ±  9%  perf-stat.ps.cache-references
>       34266 ±  5%     -66.2%      11570 ± 10%  perf-stat.ps.context-switches
>      223334            -1.6%     219861        perf-stat.ps.cpu-clock
>   7.941e+11            -9.9%  7.157e+11        perf-stat.ps.cpu-cycles
>      693.54 ±  2%     -67.2%     227.38 ±  4%  perf-stat.ps.cpu-migrations
>   1.097e+11          +208.3%  3.381e+11 ±  5%  perf-stat.ps.instructions
>     2710577 ±  3%    +626.7%   19698901 ±  5%  perf-stat.ps.minor-faults
>     2710577 ±  3%    +626.7%   19698902 ±  5%  perf-stat.ps.page-faults
>      223334            -1.6%     219861        perf-stat.ps.task-clock
>   3.886e+13 ±  2%     -52.7%  1.839e+13        perf-stat.total.instructions
>    64052898 ±  5%     -99.9%      81213 ± 23%  sched_debug.cfs_rq:/.avg_vruntime.avg
>    95701822 ±  7%     -96.4%    3425672 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.max
>    43098762 ±  6%    -100.0%     153.42 ± 36%  sched_debug.cfs_rq:/.avg_vruntime.min
>     9223270 ±  9%     -95.9%     380347 ± 16%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>        0.00 ± 22%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.avg
>        0.69 ±  8%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.max
>        0.05 ± 12%    -100.0%       0.00        sched_debug.cfs_rq:/.h_nr_delayed.stddev
>        0.78 ±  2%     -94.5%       0.04 ± 21%  sched_debug.cfs_rq:/.h_nr_running.avg
>        1.97 ±  5%     -49.3%       1.00        sched_debug.cfs_rq:/.h_nr_running.max
>        0.28 ±  7%     -29.1%       0.20 ± 10%  sched_debug.cfs_rq:/.h_nr_running.stddev
>      411536 ± 58%    -100.0%       1.15 ±182%  sched_debug.cfs_rq:/.left_deadline.avg
>    43049468 ± 22%    -100.0%     258.27 ±182%  sched_debug.cfs_rq:/.left_deadline.max
>     3836405 ± 37%    -100.0%      17.22 ±182%  sched_debug.cfs_rq:/.left_deadline.stddev
>      411536 ± 58%    -100.0%       1.06 ±191%  sched_debug.cfs_rq:/.left_vruntime.avg
>    43049467 ± 22%    -100.0%     236.56 ±191%  sched_debug.cfs_rq:/.left_vruntime.max
>     3836405 ± 37%    -100.0%      15.77 ±191%  sched_debug.cfs_rq:/.left_vruntime.stddev
>    64052901 ±  5%     -99.9%      81213 ± 23%  sched_debug.cfs_rq:/.min_vruntime.avg
>    95701822 ±  7%     -96.4%    3425672 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
>    43098762 ±  6%    -100.0%     153.42 ± 36%  sched_debug.cfs_rq:/.min_vruntime.min
>     9223270 ±  9%     -95.9%     380347 ± 16%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.77 ±  2%     -94.4%       0.04 ± 21%  sched_debug.cfs_rq:/.nr_running.avg
>        1.50 ±  9%     -33.3%       1.00        sched_debug.cfs_rq:/.nr_running.max
>        0.26 ± 10%     -22.7%       0.20 ± 10%  sched_debug.cfs_rq:/.nr_running.stddev
>        1.61 ± 24%    +413.4%       8.24 ± 60%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
>       86.69          +508.6%     527.62 ±  4%  sched_debug.cfs_rq:/.removed.runnable_avg.max
>       11.14 ± 13%    +428.4%      58.87 ± 32%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>        1.61 ± 24%    +413.3%       8.24 ± 60%  sched_debug.cfs_rq:/.removed.util_avg.avg
>       86.69          +508.6%     527.62 ±  4%  sched_debug.cfs_rq:/.removed.util_avg.max
>       11.14 ± 13%    +428.4%      58.87 ± 32%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>      411536 ± 58%    -100.0%       1.06 ±191%  sched_debug.cfs_rq:/.right_vruntime.avg
>    43049467 ± 22%    -100.0%     236.56 ±191%  sched_debug.cfs_rq:/.right_vruntime.max
>     3836405 ± 37%    -100.0%      15.77 ±191%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      769.03           -84.7%     117.79 ±  3%  sched_debug.cfs_rq:/.util_avg.avg
>        1621 ±  5%     -32.7%       1092 ± 16%  sched_debug.cfs_rq:/.util_avg.max
>      159.12 ±  8%     +33.2%     211.88 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
>      724.17 ±  2%     -98.6%      10.41 ± 32%  sched_debug.cfs_rq:/.util_est.avg
>        1360 ± 15%     -51.5%     659.38 ± 10%  sched_debug.cfs_rq:/.util_est.max
>      234.34 ±  9%     -68.2%      74.43 ± 18%  sched_debug.cfs_rq:/.util_est.stddev
>      766944 ±  3%     +18.9%     912012        sched_debug.cpu.avg_idle.avg
>     1067639 ±  5%     +25.5%    1339736 ±  9%  sched_debug.cpu.avg_idle.max
>        3799 ±  7%     -38.3%       2346 ± 23%  sched_debug.cpu.avg_idle.min
>      321459 ±  2%     -36.6%     203909 ±  7%  sched_debug.cpu.avg_idle.stddev
>      195573           -76.9%      45144        sched_debug.cpu.clock.avg
>      195596           -76.9%      45160        sched_debug.cpu.clock.max
>      195548           -76.9%      45123        sched_debug.cpu.clock.min
>       13.79 ±  3%     -36.0%       8.83 ±  2%  sched_debug.cpu.clock.stddev
>      194424           -76.8%      45019        sched_debug.cpu.clock_task.avg
>      194608           -76.8%      45145        sched_debug.cpu.clock_task.max
>      181834           -82.1%      32559        sched_debug.cpu.clock_task.min
>        4241 ±  2%     -96.8%     136.38 ± 21%  sched_debug.cpu.curr->pid.avg
>        9799 ±  2%     -59.8%       3934        sched_debug.cpu.curr->pid.max
>        1365 ± 10%     -49.1%     695.11 ± 10%  sched_debug.cpu.curr->pid.stddev
>      537665 ±  4%     +28.3%     690006 ±  6%  sched_debug.cpu.max_idle_balance_cost.max
>        3119 ± 56%    +479.5%      18078 ± 29%  sched_debug.cpu.max_idle_balance_cost.stddev
>        0.00 ± 12%     -68.3%       0.00 ± 17%  sched_debug.cpu.next_balance.stddev
>        0.78 ±  2%     -95.3%       0.04 ± 20%  sched_debug.cpu.nr_running.avg
>        2.17 ±  8%     -53.8%       1.00        sched_debug.cpu.nr_running.max
>        0.29 ±  8%     -35.4%       0.19 ±  9%  sched_debug.cpu.nr_running.stddev
>       25773 ±  5%     -97.0%     764.82 ±  3%  sched_debug.cpu.nr_switches.avg
>       48669 ± 10%     -77.2%      11080 ± 12%  sched_debug.cpu.nr_switches.max
>       19006 ±  7%     -99.2%     151.12 ± 15%  sched_debug.cpu.nr_switches.min
>        4142 ±  8%     -69.5%       1264 ±  6%  sched_debug.cpu.nr_switches.stddev
>        0.07 ± 23%     -93.3%       0.01 ± 53%  sched_debug.cpu.nr_uninterruptible.avg
>      240.19 ± 16%     -80.2%      47.50 ± 44%  sched_debug.cpu.nr_uninterruptible.max
>      -77.92           -88.1%      -9.25        sched_debug.cpu.nr_uninterruptible.min
>       37.87 ±  5%     -84.7%       5.78 ± 13%  sched_debug.cpu.nr_uninterruptible.stddev
>      195549           -76.9%      45130        sched_debug.cpu_clk
>      194699           -77.3%      44280        sched_debug.ktime
>        0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
>        0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
>        0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
>      196368           -76.6%      45975        sched_debug.sched_clk
>       95.59           -95.6        0.00        perf-profile.calltrace.cycles-pp.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       95.54           -95.5        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
>       94.54           -94.5        0.00        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.46           -94.4        0.07 ±264%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       94.45           -94.0        0.41 ±158%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       94.14           -93.9        0.29 ±134%  perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       94.25           -93.8        0.41 ±158%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       93.79           -93.7        0.07 ±264%  perf-profile.calltrace.cycles-pp.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       93.44           -93.4        0.00        perf-profile.calltrace.cycles-pp.down_write.vma_link_file.__mmap_new_vma.__mmap_region.do_mmap
>       93.40           -93.4        0.00        perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>       93.33           -93.3        0.00        perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file.__mmap_new_vma
>       92.89           -92.9        0.00        perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.00            +1.7        1.69 ± 65%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>        0.00            +1.9        1.90 ± 55%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
>        0.00            +1.9        1.90 ± 55%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        0.00            +1.9        1.93 ± 53%  perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +1.9        1.93 ± 53%  perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
>        0.00            +2.0        1.99 ± 53%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.0        2.02 ± 64%  perf-profile.calltrace.cycles-pp.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>        0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.3        2.27 ± 56%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
>        0.00            +2.4        2.45 ± 53%  perf-profile.calltrace.cycles-pp._Fork
>        0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +2.5        2.51 ± 52%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +3.2        3.17 ± 42%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>        0.00            +3.3        3.28 ± 52%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
>        0.00            +3.3        3.28 ± 52%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.handle_internal_command
>        0.00            +4.1        4.10 ± 45%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +4.1        4.10 ± 45%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
>        0.00            +4.8        4.80 ± 61%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
>        0.00            +5.0        4.98 ± 69%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
>        0.00            +5.1        5.07 ± 71%  perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>        0.00            +5.1        5.11 ± 47%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        0.00            +5.1        5.12 ± 70%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>        0.00            +6.1        6.08 ± 50%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>        0.00            +7.8        7.84 ± 21%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
>        0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.00            +7.9        7.88 ± 20%  perf-profile.calltrace.cycles-pp.read
>        0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.handle_internal_command.main
>        0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.main
>        0.00           +11.1       11.10 ± 41%  perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main
>        0.00           +11.2       11.18 ± 73%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>        0.00           +15.9       15.94 ± 41%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.00           +15.9       15.94 ± 41%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        0.00           +19.5       19.54 ± 41%  perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>        1.21 ±  3%     +36.7       37.86 ±  7%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>        1.21 ±  3%     +36.7       37.86 ±  7%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>        1.21 ±  3%     +37.0       38.24 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>        1.21 ±  3%     +37.2       38.41 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.21 ±  3%     +37.4       38.57 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>        1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.calltrace.cycles-pp.common_startup_64
>        2.19 ±  3%     +45.2       47.41 ± 14%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>       95.60           -95.4        0.22 ±135%  perf-profile.children.cycles-pp.__mmap
>       94.55           -93.9        0.60 ±103%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       94.14           -93.7        0.44 ±112%  perf-profile.children.cycles-pp.__mmap_new_vma
>       93.79           -93.7        0.10 ±264%  perf-profile.children.cycles-pp.vma_link_file
>       94.46           -93.5        0.96 ± 76%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       94.45           -93.5        0.96 ± 76%  perf-profile.children.cycles-pp.do_mmap
>       94.25           -93.4        0.86 ± 87%  perf-profile.children.cycles-pp.__mmap_region
>       93.40           -93.4        0.00        perf-profile.children.cycles-pp.rwsem_down_write_slowpath
>       93.33           -93.3        0.00        perf-profile.children.cycles-pp.rwsem_optimistic_spin
>       93.44           -93.2        0.22 ±149%  perf-profile.children.cycles-pp.down_write
>       92.91           -92.9        0.00        perf-profile.children.cycles-pp.osq_lock
>       95.58           -45.4       50.16 ±  8%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       95.58           -45.4       50.16 ±  8%  perf-profile.children.cycles-pp.do_syscall_64
>        0.00            +1.1        1.12 ± 74%  perf-profile.children.cycles-pp.filemap_map_pages
>        0.00            +1.1        1.12 ± 76%  perf-profile.children.cycles-pp.vfs_fstatat
>        0.00            +1.2        1.19 ± 35%  perf-profile.children.cycles-pp.vsnprintf
>        0.00            +1.2        1.20 ± 46%  perf-profile.children.cycles-pp.seq_printf
>        0.00            +1.3        1.28 ± 78%  perf-profile.children.cycles-pp.__do_sys_newfstatat
>        0.00            +1.5        1.54 ± 75%  perf-profile.children.cycles-pp.folios_put_refs
>        0.00            +1.6        1.56 ± 52%  perf-profile.children.cycles-pp.__cond_resched
>        0.00            +1.6        1.60 ± 32%  perf-profile.children.cycles-pp.sched_balance_newidle
>        0.00            +1.7        1.69 ± 65%  perf-profile.children.cycles-pp.dup_mm
>        0.00            +1.9        1.93 ± 53%  perf-profile.children.cycles-pp.proc_reg_read_iter
>        0.00            +2.0        1.99 ± 53%  perf-profile.children.cycles-pp.copy_process
>        0.00            +2.1        2.06 ± 51%  perf-profile.children.cycles-pp.__x64_sys_ioctl
>        0.00            +2.1        2.08 ± 45%  perf-profile.children.cycles-pp.proc_single_show
>        0.00            +2.1        2.14 ± 45%  perf-profile.children.cycles-pp.seq_read
>        0.00            +2.2        2.16 ± 47%  perf-profile.children.cycles-pp.ioctl
>        0.00            +2.2        2.17 ± 33%  perf-profile.children.cycles-pp.schedule
>        0.00            +2.2        2.20 ± 28%  perf-profile.children.cycles-pp.__pick_next_task
>        0.00            +2.2        2.21 ± 47%  perf-profile.children.cycles-pp.perf_evsel__run_ioctl
>        0.00            +2.3        2.26 ± 58%  perf-profile.children.cycles-pp.do_read_fault
>        0.00            +2.3        2.27 ± 56%  perf-profile.children.cycles-pp.__do_sys_clone
>        0.00            +2.3        2.27 ± 56%  perf-profile.children.cycles-pp.kernel_clone
>        0.00            +2.4        2.37 ± 58%  perf-profile.children.cycles-pp.zap_present_ptes
>        0.00            +2.4        2.45 ± 53%  perf-profile.children.cycles-pp._Fork
>        0.00            +2.6        2.59 ± 53%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>        0.00            +2.6        2.59 ± 53%  perf-profile.children.cycles-pp.x64_sys_call
>        0.00            +2.6        2.64 ± 44%  perf-profile.children.cycles-pp.do_pte_missing
>        0.00            +3.1        3.13 ± 59%  perf-profile.children.cycles-pp.zap_pte_range
>        0.00            +3.2        3.21 ± 58%  perf-profile.children.cycles-pp.zap_pmd_range
>        0.00            +3.4        3.40 ± 56%  perf-profile.children.cycles-pp.unmap_page_range
>        0.00            +3.4        3.43 ± 55%  perf-profile.children.cycles-pp.unmap_vmas
>        0.19 ± 23%      +3.9        4.06 ± 45%  perf-profile.children.cycles-pp.__handle_mm_fault
>        0.51 ±  6%      +4.0        4.49 ± 38%  perf-profile.children.cycles-pp.handle_mm_fault
>        0.04 ± 44%      +4.0        4.04 ± 28%  perf-profile.children.cycles-pp.__schedule
>        0.77 ±  3%      +4.4        5.18 ± 39%  perf-profile.children.cycles-pp.exc_page_fault
>        0.76 ±  3%      +4.4        5.18 ± 39%  perf-profile.children.cycles-pp.do_user_addr_fault
>        0.58 ±  2%      +4.7        5.26 ± 53%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>        0.00            +5.1        5.07 ± 71%  perf-profile.children.cycles-pp.writen
>        0.00            +5.1        5.07 ± 69%  perf-profile.children.cycles-pp.generic_perform_write
>        0.00            +5.1        5.12 ± 47%  perf-profile.children.cycles-pp.exit_mm
>        0.00            +5.1        5.12 ± 70%  perf-profile.children.cycles-pp.record__pushfn
>        0.00            +5.1        5.12 ± 70%  perf-profile.children.cycles-pp.shmem_file_write_iter
>        1.18            +5.5        6.69 ± 33%  perf-profile.children.cycles-pp.asm_exc_page_fault
>        0.00            +6.2        6.24 ± 43%  perf-profile.children.cycles-pp.__mmput
>        0.00            +6.2        6.24 ± 43%  perf-profile.children.cycles-pp.exit_mmap
>        0.00            +7.0        7.00 ± 51%  perf-profile.children.cycles-pp.perf_mmap__push
>        0.00            +7.0        7.00 ± 51%  perf-profile.children.cycles-pp.record__mmap_read_evlist
>        0.00            +7.2        7.25 ± 52%  perf-profile.children.cycles-pp.__fput
>        0.00            +7.3        7.35 ± 20%  perf-profile.children.cycles-pp.seq_read_iter
>        0.00            +7.8        7.84 ± 21%  perf-profile.children.cycles-pp.vfs_read
>        0.00            +7.9        7.88 ± 20%  perf-profile.children.cycles-pp.ksys_read
>        0.00            +7.9        7.88 ± 20%  perf-profile.children.cycles-pp.read
>        0.00            +9.9        9.93 ± 41%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>        0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.__cmd_record
>        0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.cmd_record
>        0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.handle_internal_command
>        0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.main
>        0.02 ±141%     +11.1       11.10 ± 41%  perf-profile.children.cycles-pp.run_builtin
>        0.00           +11.2       11.18 ± 73%  perf-profile.children.cycles-pp.vfs_write
>        0.00           +11.2       11.23 ± 73%  perf-profile.children.cycles-pp.ksys_write
>        0.00           +11.2       11.23 ± 73%  perf-profile.children.cycles-pp.write
>        0.00           +13.6       13.61 ± 44%  perf-profile.children.cycles-pp.do_exit
>        0.00           +13.6       13.61 ± 44%  perf-profile.children.cycles-pp.do_group_exit
>        1.70 ±  2%     +25.0       26.72 ± 15%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>        1.21 ±  3%     +36.6       37.81 ±  7%  perf-profile.children.cycles-pp.acpi_safe_halt
>        1.21 ±  3%     +36.6       37.86 ±  7%  perf-profile.children.cycles-pp.acpi_idle_do_entry
>        1.21 ±  3%     +36.6       37.86 ±  7%  perf-profile.children.cycles-pp.acpi_idle_enter
>        1.21 ±  3%     +37.4       38.57 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter_state
>        1.21 ±  3%     +37.4       38.66 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter
>        1.22 ±  3%     +37.6       38.82 ±  6%  perf-profile.children.cycles-pp.cpuidle_idle_call
>        1.22 ±  3%     +38.5       39.67 ±  7%  perf-profile.children.cycles-pp.start_secondary
>        1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.common_startup_64
>        1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.cpu_startup_entry
>        1.22 ±  3%     +38.9       40.09 ±  6%  perf-profile.children.cycles-pp.do_idle
>       92.37           -92.4        0.00        perf-profile.self.cycles-pp.osq_lock
>        1.19 ±  3%     +30.7       31.90 ±  7%  perf-profile.self.cycles-pp.acpi_safe_halt
>        0.17 ±142%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        0.19 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        0.14 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        0.14 ± 73%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        0.10 ± 66%     -99.9%       0.00 ±264%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        0.11 ± 59%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.04 ±132%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        0.07 ±101%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.02 ± 31%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.02 ±143%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        0.10 ± 44%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        0.12 ±145%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.04 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        0.25 ± 41%     -98.5%       0.00 ±105%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        0.11 ± 59%     -97.1%       0.00 ± 61%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.40 ± 50%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.32 ±104%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.01 ± 12%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.08 ± 28%     -99.5%       0.00 ±264%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        0.18 ± 57%     -96.8%       0.01 ±193%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        0.03 ± 83%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.01 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        0.02 ± 65%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        0.32 ± 47%     -98.2%       0.01 ± 42%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>        0.19 ±185%     -96.5%       0.01 ± 33%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.07 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        0.26 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.02 ± 60%     -94.2%       0.00 ±264%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        0.01 ±128%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        1.00 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       25.45 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        4.56 ± 67%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        3.55 ± 97%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        2.13 ± 67%    -100.0%       0.00 ±264%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        3.16 ± 78%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.30 ±159%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.61 ±100%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.03 ± 86%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.20 ±182%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        3.51 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>        0.83 ±160%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>        0.09 ± 31%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        3.59 ± 11%     -99.9%       0.00 ±105%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1.60 ± 69%     -99.6%       0.01 ±129%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
>        0.81 ± 43%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1.02 ± 88%    -100.0%       0.00        perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>        0.02 ±  7%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        9.68 ± 32%    -100.0%       0.00 ±264%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       12.26 ±109%    -100.0%       0.01 ±193%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        5.60 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.03 ±106%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        2.11 ± 61%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        3.67 ± 25%     -99.8%       0.01 ± 16%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>        1.65 ±187%     -99.3%       0.01 ± 23%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       37.84 ± 47%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        4.68 ± 36%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.21 ±169%     -99.6%       0.00 ±264%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        7.92 ±131%     -99.2%       0.06 ± 92%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.36 ±186%    -100.0%       0.00        perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       33.45 ±  3%     -91.6%       2.81 ± 90%  perf-sched.total_wait_and_delay.average.ms
>       97903 ±  4%     -98.2%       1776 ± 28%  perf-sched.total_wait_and_delay.count.ms
>        2942 ± 23%     -95.2%     141.09 ± 36%  perf-sched.total_wait_and_delay.max.ms
>       33.37 ±  3%     -91.9%       2.69 ± 95%  perf-sched.total_wait_time.average.ms
>        2942 ± 23%     -96.7%      97.14 ± 19%  perf-sched.total_wait_time.max.ms
>        3.97 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        3.08 ±  4%     -94.3%       0.18 ± 92%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>      119.91 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      433.73 ± 41%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      302.41 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.48 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       23.24 ± 25%     -96.7%       0.76 ± 27%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>      327.16 ±  9%     -99.8%       0.76 ±188%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.37 ±  2%     -98.9%       4.03 ±204%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.96 ±  6%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      453.60          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>      187.66           -96.7%       6.11 ±109%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        2.37 ± 29%     -99.6%       0.01 ±264%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      750.07           -99.3%       5.10 ± 84%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1831 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1269 ±  8%     -45.8%     688.12 ± 21%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        6.17 ± 45%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        5.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       14.33 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>      810.00 ± 10%    -100.0%       0.00        perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        3112 ± 24%     -97.9%      65.75 ±106%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
>       40.50 ±  8%     -98.8%       0.50 ±173%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>       73021 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       40.00          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
>        1122           -99.0%      10.88 ± 98%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       11323 ±  3%     -93.6%     722.25 ± 20%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1887 ± 45%    -100.0%       0.88 ±264%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1238           -93.9%      75.62 ± 79%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       35.19 ± 57%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1002           -91.0%      89.82 ± 93%  perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>      318.48 ± 65%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      966.90 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       20.79 ± 19%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        1043           -98.4%      16.64 ±214%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>        1240 ± 20%     -99.9%       1.52 ±188%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.34           -96.9%      15.38 ±232%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       58.83 ± 39%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>      505.17          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       19.77 ± 55%     -62.8%       7.36 ± 85%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        1237 ± 34%     -91.7%     102.88 ± 33%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1001          -100.0%       0.05 ±264%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        2794 ± 24%     -97.9%      59.20 ± 61%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       49.27 ±119%    -100.0%       0.01 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>       58.17 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>        3.78 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        2.99 ±  4%     -97.0%       0.09 ± 91%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        3.92 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>        4.71 ±  8%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>        1.67 ± 20%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>        2.10 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 44%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>        1.67 ± 21%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.04 ±133%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       67.14 ± 73%     -99.5%       0.32 ±177%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        1.65 ± 67%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>        2.30 ± 14%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>       42.44 ±200%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      152.73 ±152%    -100.0%       0.06 ±249%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      119.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        3.80 ± 18%     -99.9%       0.00 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>      433.32 ± 41%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      250.23 ±107%    -100.0%       0.00        perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       29.19 ±  5%     -99.2%       0.25 ± 24%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>      302.40 ±  5%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1.40 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>        4.03 ±  8%     -99.9%       0.01 ±193%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       35.38 ±192%    -100.0%       0.00 ±264%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        0.05 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.72 ±220%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        1.00 ±120%     -99.9%       0.00 ±264%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>       23.07 ± 24%     -97.1%       0.67 ± 10%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>      326.84 ±  9%     -99.6%       1.19 ±108%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      369.18 ±  2%     -98.7%       4.72 ±167%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>        0.89 ±  6%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>        1.17 ± 16%     -99.7%       0.00 ±264%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      453.58          -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>        4.42           -25.4%       3.30 ± 17%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      187.58           -96.8%       6.05 ±110%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        2.36 ± 29%     -99.1%       0.02 ± 84%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.01 ±156%    -100.0%       0.00        perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>      750.01           -99.5%       3.45 ±141%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      340.69 ±135%    -100.0%       0.01 ±264%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
>      535.09 ±128%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__do_fault.do_read_fault.do_pte_missing.__handle_mm_fault
>       22.04 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>        1001           -95.5%      44.91 ± 93%  perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       13.57 ± 17%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
>       13.54 ± 10%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
>       10.17 ± 19%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
>       11.35 ± 25%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
>        0.01 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
>       10.62 ±  9%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.20 ±199%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        1559 ± 64%    -100.0%       0.44 ±167%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        6.93 ± 53%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
>       14.42 ± 22%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
>      159.10 ±148%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>      391.02 ±171%    -100.0%       0.12 ±256%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      318.43 ± 65%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       13.14 ± 21%    -100.0%       0.00 ±105%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
>        1000          -100.0%       0.00        perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      500.84 ± 99%    -100.0%       0.00        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>      641.50 ± 23%     -99.2%       5.27 ± 76%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>       10.75 ± 98%     -89.8%       1.10 ± 78%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      966.89 ±  7%    -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       15.80 ±  8%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
>       16.69 ± 10%    -100.0%       0.01 ±193%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       41.71 ±158%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       11.64 ± 61%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        2.94 ±213%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      175.70 ±210%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        1043           -99.6%       4.46 ±105%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>        1240 ± 20%     -99.8%       2.37 ±108%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
>      500.11           -96.5%      17.32 ±201%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       32.65 ± 33%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
>       22.94 ± 56%    -100.0%       0.00 ±264%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      505.00          -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>       12.20 ± 43%     -59.2%       4.98        perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        1237 ± 34%     -92.5%      92.94 ± 20%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>        1000          -100.0%       0.09 ±111%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.36 ±190%    -100.0%       0.00        perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>        2794 ± 24%     -98.9%      30.12 ±114%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
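
For reference, figures like the perf-sched wait/delay numbers above come from
perf's scheduler tooling. A minimal sketch of how such a comparison can be
collected by hand (the workload command below is a placeholder, not the lkp
harness itself):

  # trace scheduler events while the workload runs
  perf sched record -- ./workload

  # per-task summary of scheduling latency (average/maximum delay)
  perf sched latency --sort max

  # per-event timeline, including wait time and sch delay columns
  perf sched timehist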





Thread overview: 35+ messages
2025-01-13 22:30 [PATCH] /dev/zero: make private mapping full anonymous mapping Yang Shi
2025-01-14 12:05 ` Lorenzo Stoakes
2025-01-14 16:53   ` Yang Shi
2025-01-14 18:14     ` Lorenzo Stoakes
2025-01-14 18:19       ` Lorenzo Stoakes
2025-01-14 18:21         ` Lorenzo Stoakes
2025-01-14 18:22         ` Matthew Wilcox
2025-01-14 18:26           ` Lorenzo Stoakes
2025-01-14 18:32       ` Jann Horn
2025-01-14 18:38         ` Lorenzo Stoakes
2025-01-14 19:03       ` Yang Shi
2025-01-14 19:13         ` Lorenzo Stoakes
2025-01-14 21:24           ` Yang Shi
2025-01-15 12:10             ` Lorenzo Stoakes
2025-01-15 21:29               ` Yang Shi
2025-01-15 22:05                 ` Christoph Lameter (Ampere)
2025-01-14 13:01 ` David Hildenbrand
2025-01-14 14:52   ` Lorenzo Stoakes
2025-01-14 15:06     ` David Hildenbrand
2025-01-14 17:01       ` Yang Shi
2025-01-14 17:23         ` David Hildenbrand
2025-01-14 17:38           ` Yang Shi
2025-01-14 17:46             ` David Hildenbrand
2025-01-14 18:05               ` Yang Shi
2025-01-14 17:02       ` David Hildenbrand
2025-01-14 17:20         ` Yang Shi
2025-01-14 17:24           ` David Hildenbrand
2025-01-28  3:14 ` kernel test robot
2025-01-31 18:38   ` Yang Shi
2025-02-06  8:02     ` Oliver Sang
2025-02-07 18:10       ` Yang Shi
2025-02-13  2:04         ` Oliver Sang
2025-02-14 22:53           ` Yang Shi
2025-02-18  6:30             ` Oliver Sang
2025-02-19  1:12               ` Yang Shi
