From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Baolin Wang
Cc: kernel test robot, Andrew Morton, David Hildenbrand, John Hubbard,
 Kefeng Wang, Mel Gorman, Ryan Roberts
Subject: Re: [linus:master] [mm] d2136d749d: vm-scalability.throughput -7.1% regression
In-Reply-To: <24a985cf-1bbf-41f9-b234-24f000164fa6@linux.alibaba.com>
 (Baolin Wang's message of "Thu, 20 Jun 2024 14:07:45 +0800")
References: <202406201010.a1344783-oliver.sang@intel.com>
 <24a985cf-1bbf-41f9-b234-24f000164fa6@linux.alibaba.com>
Date: Thu, 20 Jun 2024 15:38:26 +0800
Message-ID: <87bk3w2he5.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Baolin Wang writes:

> On 2024/6/20 10:39, kernel test robot wrote:
>> Hello,
>>
>> kernel test robot noticed a -7.1% regression of
>> vm-scalability.throughput on:
>>
>> commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support
>> multi-size THP numa balancing")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> [still regression on linus/master
>> 92e5605a199efbaee59fb19e15d6cc2103a04ec2]
>>
>> testcase: vm-scalability
>> test machine: 128 threads 2 sockets Intel(R)
>> Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
>>
>> parameters:
>>
>> 	runtime: 300s
>> 	size: 512G
>> 	test: anon-cow-rand-hugetlb
>> 	cpufreq_governor: performance
>
> Thanks for reporting. IIUC, numa balancing will not scan hugetlb VMAs,
> and I'm not sure how this patch affects the performance of hugetlb COW,
> but let me try to reproduce it.
>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot
>> | Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@intel.com
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@intel.com
>>
>> =========================================================================================
>>
>> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>>   gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability
>>
>> commit:
>>   6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
>>   d2136d749d ("mm: support multi-size THP numa balancing")
>>
>> 6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
>> ---------------- ---------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>      12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
>>    1228757            +3.0%    1265679        proc-vmstat.pgfault

Also from other proc-vmstat stats,

     21770 ± 36%      +6.1%      23098 ± 28%  proc-vmstat.numa_hint_faults
      6168 ±107%     +48.8%       9180 ± 18%  proc-vmstat.numa_hint_faults_local
    154537 ± 15%     +23.5%     190883 ± 17%  proc-vmstat.numa_pte_updates

After your patch, more hint page faults occur; I think this is
expected.  Then, tasks may be moved between sockets because of that,
so that some hugetlb page access becomes remote?

>>    7392513            -7.1%    6865649        vm-scalability.throughput
>>      17356            +9.4%      18986        vm-scalability.time.user_time
>>       0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
>>      28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
>>       0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
>>     299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
>>     284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
>>       0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
>>  1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
>>   11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
>>       4.90            +3.5%       5.07        perf-stat.i.cpi
>>     333.40            +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
>>  6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
>>       0.25            -6.2%       0.24        perf-stat.i.ipc
>>       4.19            +7.5%       4.51        perf-stat.overall.cpi
>>     323.02            +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
>>       0.24            -7.0%       0.22        perf-stat.overall.ipc
>>  1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
>>  6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
>>      17.33 ± 77%     -10.6        6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>      17.27 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
>>      13.23 ± 76%      -8.1        5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>      17.35 ± 77%     -10.6        6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
>>      17.30 ± 77%     -10.6        6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
>>      13.34 ± 76%      -8.2        5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
>>      13.24 ± 76%      -8.1        5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>

--
Best Regards,
Huang, Ying