From: Peter Xu <peterx@redhat.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()
Date: Fri, 19 Apr 2024 11:17:27 -0400 [thread overview]
Message-ID: <ZiKLBxS8D_jzD6F9@x1n> (raw)
In-Reply-To: <fe4c5e73-7242-4d35-ad76-1ee55cae6155@huawei.com>
On Fri, Apr 19, 2024 at 11:00:46AM +0800, Kefeng Wang wrote:
>
>
> On 2024/4/19 0:32, Peter Xu wrote:
> > Hi, Kefeng,
> >
> > On Thu, Apr 18, 2024 at 08:06:41PM +0800, Kefeng Wang wrote:
> > > Add userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the
> > > unnecessary pte_marker_entry_uffd_wp() in most pagefault, difference
> > > as shows below from perf data of lat_pagefault, note, the function
> > > vmf_orig_pte_uffd_wp() is not inlined in the two kernel versions.
> > >
> > > perf report -i perf.data.before | grep vmf
> > > 0.17% 0.13% lat_pagefault [kernel.kallsyms] [k] vmf_orig_pte_uffd_wp.part.0.isra.0
> > > perf report -i perf.data.after | grep vmf
> >
> > Any real number to share too besides the perf greps? I meant, even if perf
> > report will not report such function anymore, it doesn't mean it'll be
> > faster, and how much it improves?
>
> dd if=/dev/zero of=/tmp/XXX bs=512M count=1
> ./lat_pagefault -W 5 -N 5 /tmp/XXX
>
> before after
> 1 0.2623 0.2605
> 2 0.2622 0.2598
> 3 0.2621 0.2595
> 4 0.2622 0.2600
> 5 0.2651 0.2598
> 6 0.2624 0.2594
> 7 0.2624 0.2605
> 8 0.2627 0.2608
> average 0.262675 0.2600375 -0.0026375
>
> The lat_pagefault does show some improvement(also I reboot and retest,
> the results are same).
Thanks. Could you replace the perf report with these real data? Or just
append to it.
I had a look at the asm and indeed the current code will generate two
jumps when without this patch, and I don't know why..
0x0000000000006ac4 <+52>: test $0x8,%ah <---- check FAULT_FLAG_ORIG_PTE_VALID
0x0000000000006ac7 <+55>: jne 0x6bcf <set_pte_range+319>
0x0000000000006acd <+61>: mov 0x18(%rbp),%rsi
...
0x0000000000006bcf <+319>: mov 0x40(%rdi),%rdi
0x0000000000006bd3 <+323>: test $0xffffffffffffff9f,%rdi <---- pte_none() check
0x0000000000006bda <+330>: je 0x6acd <set_pte_range+61>
0x0000000000006be0 <+336>: test $0x101,%edi <---- pte_present() check
0x0000000000006be6 <+342>: jne 0x6acd <set_pte_range+61>
0x0000000000006bec <+348>: call 0x1c50 <pte_to_swp_entry>
0x0000000000006bf1 <+353>: mov 0x0(%rip),%rdx # 0x6bf8 <set_pte_range+360>
0x0000000000006bf8 <+360>: mov %rax,%r15
0x0000000000006bfb <+363>: shr $0x3a,%rax
0x0000000000006bff <+367>: cmp $0x1f,%rax
0x0000000000006c03 <+371>: mov $0x0,%eax
0x0000000000006c08 <+376>: cmovne %rax,%r15
0x0000000000006c0c <+380>: mov 0x28(%rbx),%eax
0x0000000000006c0f <+383>: and $0x1,%r15d
0x0000000000006c13 <+387>: jmp 0x6acd <set_pte_range+61>
I also don't know why the compiler cannot already merge the none+present
check into one shot, I thought it could. Also surprised me that
pte_to_swp_entry() is a function call.. but not involved in this context.
So I think I was right it should bypass this when seeing it pte_none,
however that includes two jumps.
And with your patch applied the two jumps are not there:
0x0000000000006b0c <+124>: testb $0x8,0x29(%r14) <--- FAULT_FLAG_ORIG_PTE_VALID
0x0000000000006b11 <+129>: je 0x6b6a <set_pte_range+218>
0x0000000000006b13 <+131>: mov (%r14),%rax
0x0000000000006b16 <+134>: testb $0x10,0x21(%rax) <--- userfaultfd_wp(vmf->vma) check
0x0000000000006b1a <+138>: je 0x6b6a <set_pte_range+218>
Maybe that's what contributes to that 0.x% extra time of a fault.
So if we do care about this 0.x% and we're doing this anyway, perhaps move
the vma check upper? Because afaict FAULT_FLAG_ORIG_PTE_VALID should
always hit in set_pte_range(), so we can avoid two more insts in the common
paths.
I'll leave that to you too if you want to mention some details in above and
add that into the commit log.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-04-19 15:17 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-18 12:06 Kefeng Wang
2024-04-18 16:32 ` Peter Xu
2024-04-19 3:00 ` Kefeng Wang
2024-04-19 15:17 ` Peter Xu [this message]
2024-04-20 4:05 ` Kefeng Wang
2024-04-21 13:53 ` Peter Xu
2024-04-22 2:13 ` Kefeng Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZiKLBxS8D_jzD6F9@x1n \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox