Message-ID: <67930dc6-e9e4-44f2-8f10-74325a21b1d5@linux.alibaba.com>
Date: Thu, 20 Jun 2024 16:44:26 +0800
Subject: Re: [linus:master] [mm] d2136d749d: vm-scalability.throughput -7.1% regression
To: "Huang, Ying"
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com,
 linux-kernel@vger.kernel.org, Andrew Morton, David Hildenbrand,
 John Hubbard, Kefeng Wang, Mel Gorman, Ryan Roberts, linux-mm@kvack.org,
 feng.tang@intel.com, fengwei.yin@intel.com
References: <202406201010.a1344783-oliver.sang@intel.com>
 <24a985cf-1bbf-41f9-b234-24f000164fa6@linux.alibaba.com>
 <87bk3w2he5.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Baolin Wang
In-Reply-To: <87bk3w2he5.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 2024/6/20 15:38, Huang, Ying wrote:
> Baolin Wang writes:
>
>> On 2024/6/20 10:39, kernel test robot wrote:
>>> Hello,
>>>
>>> kernel test robot noticed a -7.1% regression of
>>> vm-scalability.throughput on:
>>>
>>> commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support
>>> multi-size THP numa balancing")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> [still regression on linus/master
>>> 92e5605a199efbaee59fb19e15d6cc2103a04ec2]
>>>
>>> testcase: vm-scalability
>>> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
>>> parameters:
>>>
>>> 	runtime: 300s
>>> 	size: 512G
>>> 	test: anon-cow-rand-hugetlb
>>> 	cpufreq_governor: performance
>>
>> Thanks for reporting.
>> IIUC, NUMA balancing will not scan hugetlb VMAs, so
>> I'm not sure how this patch affects the performance of hugetlb COW,
>> but let me try to reproduce it.
>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot
>>> | Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@intel.com
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>> The kernel config and materials to reproduce are available at:
>>> https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@intel.com
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>>>   gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability
>>>
>>> commit:
>>>   6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
>>>   d2136d749d ("mm: support multi-size THP numa balancing")
>>>
>>> 6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
>>> ---------------- ---------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>>>      12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
>>>    1228757            +3.0%    1265679        proc-vmstat.pgfault
>
> Also from other proc-vmstat stats,
>
>      21770 ± 36%      +6.1%      23098 ± 28%  proc-vmstat.numa_hint_faults
>       6168 ± 107%    +48.8%       9180 ± 18%  proc-vmstat.numa_hint_faults_local
>     154537 ± 15%     +23.5%     190883 ± 17%  proc-vmstat.numa_pte_updates
>
> After your patch, more hint page faults occur, which I think is
> expected.

This is exactly my confusion: why are there more NUMA hint faults? The
hugetlb VMAs will be skipped from scanning, so do the other VMAs of the
application use mTHP or large folios?

> Then, tasks may be moved between sockets because of that, so that some
> hugetlb page access becomes remote?

Yes, that is possible if the application uses some large folios.
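
For reference, hugetlb VMAs never receive NUMA hint PTEs because the
scanner filters them out up front. If I remember correctly, the VMA
filter in kernel/sched/fair.c:task_numa_work() looks roughly like the
sketch below (simplified from memory, not verbatim source; the exact
condition differs across kernel versions):

	/*
	 * Scan loop in task_numa_work(): VMAs that NUMA balancing
	 * cannot or should not migrate are skipped entirely, so no
	 * PROT_NONE hint PTEs are ever installed in hugetlb mappings.
	 */
	for_each_vma(vmi, vma) {
		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
		    is_vm_hugetlb_page(vma) ||
		    (vma->vm_flags & VM_MIXEDMAP))
			continue;
		/* ... change_prot_numa() on the remaining ranges ... */
	}

Given that skip, the extra numa_hint_faults can only come from the
non-hugetlb VMAs of the workload, which is why I suspect some other
mapping is being scanned at large-folio granularity after the patch.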
>>>    7392513            -7.1%    6865649        vm-scalability.throughput
>>>      17356            +9.4%      18986        vm-scalability.time.user_time
>>>       0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
>>>      28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
>>>       0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
>>>     299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
>>>     284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
>>>       0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
>>>  1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
>>>   11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
>>>       4.90             +3.5%       5.07        perf-stat.i.cpi
>>>     333.40             +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
>>>  6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
>>>       0.25             -6.2%       0.24        perf-stat.i.ipc
>>>       4.19             +7.5%       4.51        perf-stat.overall.cpi
>>>     323.02             +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
>>>       0.24             -7.0%       0.22        perf-stat.overall.ipc
>>>  1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
>>>  6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
>>>      17.33 ± 77%     -10.6        6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>>      17.27 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
>>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
>>>      13.23 ± 76%      -8.1        5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
>>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>>      17.35 ± 77%     -10.6        6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
>>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
>>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
>>>      17.30 ± 77%     -10.6        6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
>>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
>>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
>>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
>>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
>>>      13.34 ± 76%      -8.2        5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
>>>      13.24 ± 76%      -8.1        5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes
>>> only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>
> --
> Best Regards,
> Huang, Ying
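
As a side note for the reproduction attempt: the three numa_* counters
quoted above can be sampled directly around a benchmark run, without
the lkp harness. A minimal userspace helper might look like this (a
hypothetical sketch; it assumes only the numa_pte_updates,
numa_hint_faults and numa_hint_faults_local fields that /proc/vmstat
exposes when CONFIG_NUMA_BALANCING is enabled):

#include <stdio.h>
#include <string.h>

/* Counters to snapshot from /proc/vmstat. */
static const char *keys[] = {
	"numa_pte_updates",
	"numa_hint_faults",
	"numa_hint_faults_local",
};

/* Return one named counter from /proc/vmstat, or -1 if not found. */
static long long read_vmstat(const char *key)
{
	char name[64];
	long long val;
	FILE *fp = fopen("/proc/vmstat", "r");

	if (!fp)
		return -1;
	while (fscanf(fp, "%63s %lld", name, &val) == 2) {
		if (!strcmp(name, key)) {
			fclose(fp);
			return val;
		}
	}
	fclose(fp);
	return -1;
}

int main(void)
{
	unsigned int i;

	/* Run once before and once after the benchmark, then diff. */
	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		printf("%s %lld\n", keys[i], read_vmstat(keys[i]));
	return 0;
}

Running it before and after the vm-scalability run and diffing the two
snapshots gives the per-run deltas, which should make it easier to see
which side of the bisected commit produces the extra hint faults.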