Message-ID: <073f3d13-5656-4de0-a62b-cee96f2b0eaa@huawei.com>
Date: Sat, 20 Apr 2024 12:05:04 +0800
Subject: Re: [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()
To: Peter Xu
CC: Andrew Morton ,
References: <20240418120641.2653165-1-wangkefeng.wang@huawei.com>
From: Kefeng Wang

On 2024/4/19 23:17, Peter Xu wrote:
> On Fri, Apr 19, 2024 at 11:00:46AM +0800, Kefeng Wang wrote:
>>
>>
>> On 2024/4/19 0:32, Peter Xu wrote:
>>> Hi, Kefeng,
>>>
>>> On Thu, Apr 18, 2024 at 08:06:41PM +0800, Kefeng Wang wrote:
>>>> Add userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the
>>>> unnecessary pte_marker_entry_uffd_wp() in most pagefault, difference
>>>> as shows below from perf data of lat_pagefault, note, the function
>>>> vmf_orig_pte_uffd_wp() is not inlined in the two kernel
>>>> versions.
>>>>
>>>> perf report -i perf.data.before | grep vmf
>>>> 0.17% 0.13% lat_pagefault [kernel.kallsyms] [k] vmf_orig_pte_uffd_wp.part.0.isra.0
>>>> perf report -i perf.data.after | grep vmf
>>>
>>> Any real number to share too besides the perf greps? I meant, even if perf
>>> report will not report such function anymore, it doesn't mean it'll be
>>> faster, and how much it improves?
>>
>> dd if=/dev/zero of=/tmp/XXX bs=512M count=1
>> ./lat_pagefault -W 5 -N 5 /tmp/XXX
>>
>>            before      after
>> 1          0.2623      0.2605
>> 2          0.2622      0.2598
>> 3          0.2621      0.2595
>> 4          0.2622      0.2600
>> 5          0.2651      0.2598
>> 6          0.2624      0.2594
>> 7          0.2624      0.2605
>> 8          0.2627      0.2608
>> average    0.262675    0.2600375    -0.0026375
>>
>> The lat_pagefault does show some improvement(also I reboot and retest,
>> the results are same).
>
> Thanks. Could you replace the perf report with these real data? Or just
> append to it.

Sure, will append it.

>
> I had a look at the asm and indeed the current code will generate two
> jumps when without this patch, and I don't know why..
>
>    0x0000000000006ac4 <+52>:    test   $0x8,%ah      <---- check FAULT_FLAG_ORIG_PTE_VALID
>    0x0000000000006ac7 <+55>:    jne    0x6bcf
>    0x0000000000006acd <+61>:    mov    0x18(%rbp),%rsi
>
>    ...
>
>    0x0000000000006bcf <+319>:   mov    0x40(%rdi),%rdi
>    0x0000000000006bd3 <+323>:   test   $0xffffffffffffff9f,%rdi   <---- pte_none() check
>    0x0000000000006bda <+330>:   je     0x6acd
>    0x0000000000006be0 <+336>:   test   $0x101,%edi   <---- pte_present() check
>    0x0000000000006be6 <+342>:   jne    0x6acd
>    0x0000000000006bec <+348>:   call   0x1c50
>    0x0000000000006bf1 <+353>:   mov    0x0(%rip),%rdx        # 0x6bf8
>    0x0000000000006bf8 <+360>:   mov    %rax,%r15
>    0x0000000000006bfb <+363>:   shr    $0x3a,%rax
>    0x0000000000006bff <+367>:   cmp    $0x1f,%rax
>    0x0000000000006c03 <+371>:   mov    $0x0,%eax
>    0x0000000000006c08 <+376>:   cmovne %rax,%r15
>    0x0000000000006c0c <+380>:   mov    0x28(%rbx),%eax
>    0x0000000000006c0f <+383>:   and    $0x1,%r15d
>    0x0000000000006c13 <+387>:   jmp    0x6acd
>
> I also don't know why the compiler cannot already merge the none+present
> check into one shot, I thought it could. Also surprised me that
> pte_to_swp_entry() is a function call.. but not involved in this context.
>
> So I think I was right it should bypass this when seeing it pte_none,
> however that includes two jumps.
>
> And with your patch applied the two jumps are not there:
>
>    0x0000000000006b0c <+124>:   testb  $0x8,0x29(%r14)   <--- FAULT_FLAG_ORIG_PTE_VALID
>    0x0000000000006b11 <+129>:   je     0x6b6a
>    0x0000000000006b13 <+131>:   mov    (%r14),%rax
>    0x0000000000006b16 <+134>:   testb  $0x10,0x21(%rax)  <--- userfaultfd_wp(vmf->vma) check
>    0x0000000000006b1a <+138>:   je     0x6b6a
>
> Maybe that's what contributes to that 0.x% extra time of a fault.
>
> So if we do care about this 0.x% and we're doing this anyway, perhaps move

The lat_pagefault latency is considerably higher than on the old kernel
(5.10). Apart from the mm counter updating, the other obvious difference
shown in the perf graph is the new vmf_orig_pte_uffd_wp().

> the vma check upper? Because afaict FAULT_FLAG_ORIG_PTE_VALID should
> always hit in set_pte_range(), so we can avoid two more insts in the common
> paths.

Moving it up is better. Maybe also add __always_inline to
vmf_orig_pte_uffd_wp(), so that set_pte_range() only checks VM_UFFD_WP in
vm_flags?

>
> I'll leave that to you too if you want to mention some details in above and
> add that into the commit log.

Will update the changelog, thanks.

>
> Thanks,
>
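
As a rough illustration of the reordering discussed above, here is a small
userspace C model (not kernel code). All model_/MODEL_ names, the flag
values, and the marker_decodes counter are hypothetical stand-ins. Only the
order of the checks reflects the proposal: test the VMA's uffd-wp flag
first, then FAULT_FLAG_ORIG_PTE_VALID, and decode the marker only when both
hold. The always_inline attribute is the GCC/Clang spelling that the
kernel's __always_inline expands to.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model; the kernel's real bits differ. */
#define MODEL_VM_UFFD_WP                 (1u << 0)
#define MODEL_FAULT_FLAG_ORIG_PTE_VALID  (1u << 1)

/* Minimal stand-in for the VMA: only the flags word matters here. */
struct model_vma {
	unsigned int vm_flags;
};

/* Minimal stand-in for struct vm_fault. */
struct model_vmf {
	const struct model_vma *vma;
	unsigned int flags;
	bool orig_pte_is_uffd_wp_marker; /* result the marker decode would give */
};

/* Counts how often the (comparatively expensive) marker decode runs. */
static int marker_decodes;

/* Stand-in for the swap-entry/marker decode done on vmf->orig_pte. */
static bool model_pte_marker_uffd_wp(const struct model_vmf *vmf)
{
	marker_decodes++;
	return vmf->orig_pte_is_uffd_wp_marker;
}

/*
 * Proposed ordering: VMA flag check first, so faults on VMAs without
 * uffd-wp return after a single test and never reach the decode.
 */
static inline __attribute__((always_inline)) bool
model_vmf_orig_pte_uffd_wp(const struct model_vmf *vmf)
{
	if (!(vmf->vma->vm_flags & MODEL_VM_UFFD_WP))
		return false;
	if (!(vmf->flags & MODEL_FAULT_FLAG_ORIG_PTE_VALID))
		return false;
	return model_pte_marker_uffd_wp(vmf);
}
```

In this model, a fault on a VMA without MODEL_VM_UFFD_WP leaves
marker_decodes untouched, which mirrors the common path the thread wants
to keep short.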