From: Alexander Duyck
Date: Tue, 29 Oct 2024 08:45:11 -0700
Subject: Re: [PATCH net-next v23 0/7] Replace page_frag with page_frag_cache (Part-1)
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Shuah Khan, Andrew Morton, Linux-MM
References: <20241028115343.3405838-1-linyunsheng@huawei.com> <472a7a09-387f-480d-b66c-761e0b6192ef@huawei.com>
In-Reply-To: <472a7a09-387f-480d-b66c-761e0b6192ef@huawei.com>

On Tue, Oct 29, 2024 at 2:36 AM Yunsheng Lin wrote:
>
> On 2024/10/28 23:30, Alexander Duyck wrote:
>
> ...
>
> >>
> >>
> >
> > Is this actually the numbers for this patch set? Seems like you have
> > been using the same numbers for the last several releases.
> > I can
>
> Yes, as recent refactoring doesn't seems big enough that the perf data is
> reused for the last several releases.
>
> > understand the "before" being mostly the same, but since we have
>
> As there is rebasing for the latest net-next tree, even the 'before'
> might not be the same as the testing seems sensitive to other changing,
> like binary size changing and page allocator changing during different
> version.
>
> So it might need both the same kernel and config for 'before' and 'after'.
>
> > factored out the refactor portion of it the numbers for the "after"
> > should have deviated as I find it highly unlikely the numbers are
> > exactly the same down to the nanosecond. from the previous patch set.
> Below is the the performance data for Part-1 with the latest net-next:
>
> Before this patchset:
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
>          17.990790      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.19% )
>                  8      context-switches          #    0.444 K/sec                    ( +-  0.09% )
>                  0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>                 81      page-faults               #    0.004 M/sec                    ( +-  0.09% )
>           46712295      cycles                    #    2.596 GHz                      ( +-  0.19% )
>           34466157      instructions              #    0.74  insn per cycle           ( +-  0.01% )
>            8011755      branches                  #  445.325 M/sec                    ( +-  0.01% )
>              39913      branch-misses             #    0.50% of all branches          ( +-  0.07% )
>
>        6.382252558 seconds time elapsed                                          ( +-  0.07% )
>
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
>          17.638466      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.01% )
>                  8      context-switches          #    0.451 K/sec                    ( +-  0.20% )
>                  0      cpu-migrations            #    0.001 K/sec                    ( +- 70.53% )
>                 81      page-faults               #    0.005 M/sec                    ( +-  0.08% )
>           45794305      cycles                    #    2.596 GHz                      ( +-  0.01% )
>           34435077      instructions              #    0.75  insn per cycle           ( +-  0.00% )
>            8004416      branches                  #  453.805 M/sec                    ( +-  0.00% )
>              39758      branch-misses             #    0.50% of all branches          ( +-  0.06% )
>
>        5.328976590 seconds time elapsed                                          ( +-  0.60% )
>
>
> After this patchset:
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000' (200 runs):
>
>          18.647432      task-clock (msec)         #    0.003 CPUs utilized            ( +-  1.11% )
>                  8      context-switches          #    0.422 K/sec                    ( +-  0.36% )
>                  0      cpu-migrations            #    0.005 K/sec                    ( +- 22.54% )
>                 81      page-faults               #    0.004 M/sec                    ( +-  0.08% )
>           48418108      cycles                    #    2.597 GHz                      ( +-  1.11% )
>           35889299      instructions              #    0.74  insn per cycle           ( +-  0.11% )
>            8318363      branches                  #  446.086 M/sec                    ( +-  0.11% )
>              19263      branch-misses             #    0.23% of all branches          ( +-  0.13% )
>
>        5.624666079 seconds time elapsed                                          ( +-  0.07% )
>
>
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_alloc_len=12 nr_test=51200000 test_align=1' (200 runs):
>
>          18.466768      task-clock (msec)         #    0.007 CPUs utilized            ( +-  1.23% )
>                  8      context-switches          #    0.428 K/sec                    ( +-  0.26% )
>                  0      cpu-migrations            #    0.002 K/sec                    ( +- 34.73% )
>                 81      page-faults               #    0.004 M/sec                    ( +-  0.09% )
>           47949220      cycles                    #    2.597 GHz                      ( +-  1.23% )
>           35859039      instructions              #    0.75  insn per cycle           ( +-  0.12% )
>            8309086      branches                  #  449.948 M/sec                    ( +-  0.11% )
>              19246      branch-misses             #    0.23% of all branches          ( +-  0.08% )
>
>        2.573546035 seconds time elapsed                                          ( +-  0.04% )
>

Interesting. It doesn't look like too much changed in terms of most of
the metrics other than the fact that we reduced the number of branch
misses by just over half.

> >
> > Also it wouldn't hurt to have an explanation for the 3.4->0.9 second
> > performance change as it seems like the samples don't seem to match up
> > with the elapsed time data.
>
> As there is also a 4.6->3.4 second performance change for the 'before'
> part, I am not really thinking much at that.
>
> I am guessing some timing for implementation of ptr_ring or cpu cache
> cause the above performance change?
>
> I used the same cpu for both pop and push thread, the performance change
> doesn't seems to exist anymore, and the performance improvement doesn't
> seems to exist anymore either:
>
> After this patchset:
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):
>
>          13.293402      task-clock (msec)         #    0.002 CPUs utilized            ( +-  5.05% )
>                  7      context-switches          #    0.534 K/sec                    ( +-  1.41% )
>                  0      cpu-migrations            #    0.015 K/sec                    ( +-100.00% )
>                 80      page-faults               #    0.006 M/sec                    ( +-  0.38% )
>           34494793      cycles                    #    2.595 GHz                      ( +-  5.05% )
>            9663299      instructions              #    0.28  insn per cycle           ( +-  1.45% )
>            1767284      branches                  #  132.944 M/sec                    ( +-  1.70% )
>              19798      branch-misses             #    1.12% of all branches          ( +-  1.18% )
>
>        8.119681413 seconds time elapsed                                          ( +-  0.01% )
>
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):
>
>          12.289096      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.07% )
>                  7      context-switches          #    0.570 K/sec                    ( +-  2.13% )
>                  0      cpu-migrations            #    0.033 K/sec                    ( +- 66.67% )
>                 81      page-faults               #    0.007 M/sec                    ( +-  0.43% )
>           31886319      cycles                    #    2.595 GHz                      ( +-  0.07% )
>            9468850      instructions              #    0.30  insn per cycle           ( +-  0.06% )
>            1723487      branches                  #  140.245 M/sec                    ( +-  0.05% )
>              19263      branch-misses             #    1.12% of all branches          ( +-  0.47% )
>
>        8.119686950 seconds time elapsed                                          ( +-  0.01% )
>
> Before this patchset:
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000' (10 runs):
>
>          13.320328      task-clock (msec)         #    0.002 CPUs utilized            ( +-  5.00% )
>                  7      context-switches          #    0.541 K/sec                    ( +-  1.85% )
>                  0      cpu-migrations            #    0.008 K/sec                    ( +-100.00% )
>                 80      page-faults               #    0.006 M/sec                    ( +-  0.36% )
>           34572091      cycles                    #    2.595 GHz                      ( +-  5.01% )
>            9664910      instructions              #    0.28  insn per cycle           ( +-  1.51% )
>            1768276      branches                  #  132.750 M/sec                    ( +-  1.80% )
>              19592      branch-misses             #    1.11% of all branches          ( +-  1.33% )
>
>        8.119686381 seconds time elapsed                                          ( +-  0.01% )
>
>  Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=0 test_pop_cpu=0 test_alloc_len=12 nr_test=512000 test_align=1' (10 runs):
>
>          12.306471      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.08% )
>                  7      context-switches          #    0.585 K/sec                    ( +-  1.85% )
>                  0      cpu-migrations            #    0.000 K/sec
>                 80      page-faults               #    0.007 M/sec                    ( +-  0.28% )
>           31937686      cycles                    #    2.595 GHz                      ( +-  0.08% )
>            9462218      instructions              #    0.30  insn per cycle           ( +-  0.08% )
>            1721989      branches                  #  139.925 M/sec                    ( +-  0.07% )
>              19114      branch-misses             #    1.11% of all branches          ( +-  0.31% )
>
>        8.118897296 seconds time elapsed                                          ( +-  0.00% )

That isn't too surprising. Most likely you are at the mercy of the
scheduler and you are just waiting for it to cycle back and forth from
producer to consumer and back in order to allow you to complete the
test.
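
For anyone who wants to double-check the branch-miss comparison, here is a rough sketch that computes the relative change from the counts quoted in this thread. The dictionary layout, the scenario labels, and the helper names (relative_change, summarize) are only illustrative assumptions and not part of the test module; the numbers themselves are copied from the perf stat output above.

```python
# Branch-miss counts copied from the perf stat output quoted in this thread.
# The keys and overall structure are illustrative assumptions only.
BRANCH_MISSES = {
    # (scenario, test_align): (before, after)
    ("push_cpu=16/pop_cpu=17", 0): (39913, 19263),
    ("push_cpu=16/pop_cpu=17", 1): (39758, 19246),
    ("push_cpu=0/pop_cpu=0", 0): (19592, 19798),
    ("push_cpu=0/pop_cpu=0", 1): (19114, 19263),
}


def relative_change(before: int, after: int) -> float:
    """Return (after - before) / before; a negative value means fewer misses."""
    return (after - before) / before


def summarize(data=BRANCH_MISSES):
    """Map each (scenario, test_align) key to its relative branch-miss change."""
    return {key: relative_change(b, a) for key, (b, a) in data.items()}


if __name__ == "__main__":
    for (scenario, align), change in summarize().items():
        print(f"{scenario} test_align={align}: {change:+.1%}")
```

With these counts the split-CPU runs show a drop of just over half, while the same-CPU runs move by only about 1%, which is within the reported run-to-run variation for those runs.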