Message-ID: <472a7a09-387f-480d-b66c-761e0b6192ef@huawei.com>
Date: Tue, 29 Oct 2024 17:36:43 +0800
Subject: Re: [PATCH net-next v23 0/7] Replace page_frag with page_frag_cache (Part-1)
From: Yunsheng Lin
To: Alexander Duyck
CC: Shuah Khan, Andrew Morton, Linux-MM
References: <20241028115343.3405838-1-linyunsheng@huawei.com>

On 2024/10/28 23:30, Alexander Duyck wrote:

...

>>
>>
>
> Is this actually the numbers for this patch set? Seems like you have
> been using the same numbers for the last several releases. I can

Yes. The recent refactoring didn't seem big enough to change the results,
so the perf data was reused for the last several releases.

> understand the "before" being mostly the same, but since we have

As the series was rebased onto the latest net-next tree, even the 'before'
numbers might not be the same, because the test seems sensitive to other
changes, like binary size and page allocator changes between versions.
So a fair comparison might need the same kernel and config for both
'before' and 'after'.

> factored out the refactor portion of it the numbers for the "after"
> should have deviated as I find it highly unlikely the numbers are
> exactly the same down to the nanosecond. from the previous patch set.

Below is the performance data for Part-1 with the latest net-next:

Before this patchset:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):

         17.990790      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.19% )
                 8      context-switches          #    0.444 K/sec                    ( +-  0.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                81      page-faults               #    0.004 M/sec                    ( +-  0.09% )
          46712295      cycles                    #    2.596 GHz                      ( +-  0.19% )
          34466157      instructions              #    0.74  insn per cycle           ( +-  0.01% )
           8011755      branches                  #  445.325 M/sec                    ( +-  0.01% )
             39913      branch-misses             #    0.50% of all branches          ( +-  0.07% )

       6.382252558 seconds time elapsed                                          ( +-  0.07% )

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):

         17.638466      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.01% )
                 8      context-switches          #    0.451 K/sec                    ( +-  0.20% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +- 70.53% )
                81      page-faults               #    0.005 M/sec                    ( +-  0.08% )
          45794305      cycles                    #    2.596 GHz                      ( +-  0.01% )
          34435077      instructions              #    0.75  insn per cycle           ( +-  0.00% )
           8004416      branches                  #  453.805 M/sec                    ( +-  0.00% )
             39758      branch-misses             #    0.50% of all branches          ( +-  0.06% )

       5.328976590 seconds time elapsed                                          ( +-  0.60% )

After this patchset:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):

         18.647432      task-clock (msec)         #    0.003 CPUs utilized            ( +-  1.11% )
                 8      context-switches          #    0.422 K/sec                    ( +-  0.36% )
                 0      cpu-migrations            #    0.005 K/sec                    ( +- 22.54% )
                81      page-faults               #    0.004 M/sec                    ( +-  0.08% )
          48418108      cycles                    #    2.597 GHz                      ( +-  1.11% )
          35889299      instructions              #    0.74  insn per cycle           ( +-  0.11% )
           8318363      branches                  #  446.086 M/sec                    ( +-  0.11% )
             19263      branch-misses             #    0.23% of all branches          ( +-  0.13% )

       5.624666079 seconds time elapsed                                          ( +-  0.07% )

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):

         18.466768      task-clock (msec)         #    0.007 CPUs utilized            ( +-  1.23% )
                 8      context-switches          #    0.428 K/sec                    ( +-  0.26% )
                 0      cpu-migrations            #    0.002 K/sec                    ( +- 34.73% )
                81      page-faults               #    0.004 M/sec                    ( +-  0.09% )
          47949220      cycles                    #    2.597 GHz                      ( +-  1.23% )
          35859039      instructions              #    0.75  insn per cycle           ( +-  0.12% )
           8309086      branches                  #  449.948 M/sec                    ( +-  0.11% )
             19246      branch-misses             #    0.23% of all branches          ( +-  0.08% )

       2.573546035 seconds time elapsed                                          ( +-  0.04% )

>
> Also it wouldn't hurt to have an explanation for the 3.4->0.9 second
> performance change as it seems like the samples don't seem to match up
> with the elapsed time data.

As there is also a 4.6->3.4 second performance change for the 'before'
part, I did not think much about that. I am guessing that some timing
effect in the ptr_ring implementation or in the CPU cache causes the
above performance change?

When I use the same CPU for both the push and pop threads, the performance
change doesn't seem to exist anymore, and the performance improvement
doesn't seem to exist anymore either:

After this patchset:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):

         13.293402      task-clock (msec)         #    0.002 CPUs utilized            ( +-  5.05% )
                 7      context-switches          #    0.534 K/sec                    ( +-  1.41% )
                 0      cpu-migrations            #    0.015 K/sec                    ( +-100.00% )
                80      page-faults               #    0.006 M/sec                    ( +-  0.38% )
          34494793      cycles                    #    2.595 GHz                      ( +-  5.05% )
           9663299      instructions              #    0.28  insn per cycle           ( +-  1.45% )
           1767284      branches                  #  132.944 M/sec                    ( +-  1.70% )
             19798      branch-misses             #    1.12% of all branches          ( +-  1.18% )

       8.119681413 seconds time elapsed                                          ( +-  0.01% )

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):

         12.289096      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.07% )
                 7      context-switches          #    0.570 K/sec                    ( +-  2.13% )
                 0      cpu-migrations            #    0.033 K/sec                    ( +- 66.67% )
                81      page-faults               #    0.007 M/sec                    ( +-  0.43% )
          31886319      cycles                    #    2.595 GHz                      ( +-  0.07% )
           9468850      instructions              #    0.30  insn per cycle           ( +-  0.06% )
           1723487      branches                  #  140.245 M/sec                    ( +-  0.05% )
             19263      branch-misses             #    1.12% of all branches          ( +-  0.47% )

       8.119686950 seconds time elapsed                                          ( +-  0.01% )

Before this patchset:
 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):

         13.320328      task-clock (msec)         #    0.002 CPUs utilized            ( +-  5.00% )
                 7      context-switches          #    0.541 K/sec                    ( +-  1.85% )
                 0      cpu-migrations            #    0.008 K/sec                    ( +-100.00% )
                80      page-faults               #    0.006 M/sec                    ( +-  0.36% )
          34572091      cycles                    #    2.595 GHz                      ( +-  5.01% )
           9664910      instructions              #    0.28  insn per cycle           ( +-  1.51% )
           1768276      branches                  #  132.750 M/sec                    ( +-  1.80% )
             19592      branch-misses             #    1.11% of all branches          ( +-  1.33% )

       8.119686381 seconds time elapsed                                          ( +-  0.01% )

 Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):

         12.306471      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.08% )
                 7      context-switches          #    0.585 K/sec                    ( +-  1.85% )
                 0      cpu-migrations            #    0.000 K/sec
                80      page-faults               #    0.007 M/sec                    ( +-  0.28% )
          31937686      cycles                    #    2.595 GHz                      ( +-  0.08% )
           9462218      instructions              #    0.30  insn per cycle           ( +-  0.08% )
           1721989      branches                  #  139.925 M/sec                    ( +-  0.07% )
             19114      branch-misses             #    1.11% of all branches          ( +-  0.31% )

       8.118897296 seconds time elapsed                                          ( +-  0.00% )
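
To make the before/after comparison easier to read, the relative change in elapsed time can be computed from the figures quoted in this mail. The following is only a minimal sketch and is not part of the patchset or the test module: the function names `rel_change` and `summarize` and the dictionary labels are made up for illustration, and the values are copied from the perf stat output above.

```python
# Illustrative sketch only: names and labels are hypothetical.
# All values are copied from the perf stat output in this mail.

def rel_change(before: float, after: float) -> float:
    """Return the relative change (after - before) / before as a fraction."""
    return (after - before) / before


# Elapsed time in seconds, stored as (before patchset, after patchset).
ELAPSED = {
    "push_cpu=16 pop_cpu=17": (6.382252558, 5.624666079),
    "push_cpu=16 pop_cpu=17 test_align=1": (5.328976590, 2.573546035),
    "push_cpu=0 pop_cpu=0": (8.119686381, 8.119681413),
    "push_cpu=0 pop_cpu=0 test_align=1": (8.118897296, 8.119686950),
}

# Cycle counts for the two-CPU runs, stored as (before, after).
CYCLES = {
    "push_cpu=16 pop_cpu=17": (46712295, 48418108),
    "push_cpu=16 pop_cpu=17 test_align=1": (45794305, 47949220),
}


def summarize(table: dict) -> dict:
    """Map each configuration label to its relative change."""
    return {label: rel_change(b, a) for label, (b, a) in table.items()}


for title, table in (("elapsed time", ELAPSED), ("cycles", CYCLES)):
    print(title)
    for label, change in summarize(table).items():
        # A negative percentage means the 'after' value is smaller.
        print(f"  {label:40s} {change * 100:+8.3f}%")
```

A negative value means the 'after' run finished faster. Note that the two-CPU runs show a lower elapsed time while the cycle count of the insmod task itself goes slightly up, which is the mismatch between samples and elapsed time discussed above.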