From: "Huang, Ying" <ying.huang@intel.com>
To: Mel Gorman
Cc: linux-mm@kvack.org, Arjan Van De Ven, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
	Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 04/10] mm: restrict the pcp batch scale factor to avoid too long latency
References: <20230920061856.257597-1-ying.huang@intel.com>
	<20230920061856.257597-5-ying.huang@intel.com>
	<20231011125219.kuoluyuwxzva5q5w@techsingularity.net>
Date: Thu, 12 Oct 2023 20:15:42 +0800
In-Reply-To: <20231011125219.kuoluyuwxzva5q5w@techsingularity.net> (Mel Gorman's message of "Wed, 11 Oct 2023 13:52:19 +0100")
Message-ID: <878r88f34h.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Mel Gorman writes:

> On Wed, Sep 20, 2023 at 02:18:50PM +0800, Huang Ying wrote:
>> In the page allocator, the PCP (Per-CPU Pageset) is refilled and
>> drained in batches to increase page allocation throughput, reduce
>> page allocation/freeing latency per page, and reduce zone lock
>> contention. But too large a batch size causes too long a maximal
>> allocation/freeing latency, which may punish arbitrary users. So the
>> default batch size is chosen carefully (in zone_batchsize(); the
>> value is 63 for zones > 1GB) to avoid that.
>>
>> In commit 3b12e7e97938 ("mm/page_alloc: scale the number of pages
>> that are batch freed"), the batch size is scaled up when large
>> numbers of pages are freed, to improve page freeing performance and
>> reduce zone lock contention.
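FWIW, the doubling-with-cap behavior under discussion can be modeled in
userspace roughly as follows; this is a simplified illustration with
made-up names, not the actual kernel code:

    #include <stdio.h>

    /* Cap on the left shift, i.e. the "max batch scale factor". */
    #define BATCH_SCALE_MAX 5

    /*
     * Consecutive bulk frees double the effective batch size; capping
     * the shift bounds the worst-case time spent under the zone lock.
     */
    static int effective_batch(int base_batch, int free_factor)
    {
            if (free_factor > BATCH_SCALE_MAX)
                    free_factor = BATCH_SCALE_MAX;
            return base_batch << free_factor;
    }

    int main(void)
    {
            /* 63 is the default batch from zone_batchsize() for zones > 1GB. */
            for (int factor = 0; factor <= 7; factor++)
                    printf("factor=%d -> batch=%d\n",
                           factor, effective_batch(63, factor));
            return 0;       /* factor >= 5 saturates at 63 << 5 == 2016 */
    }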
>> A similar optimization can be used for allocating large numbers of
>> pages too.
>>
>> To find a suitable max batch scale factor (that is, the max effective
>> batch size), some tests and measurements were done on several
>> machines as follows.
>>
>> A set of debug patches was implemented as follows,
>>
>> - Set PCP high to be 2 * batch to reduce the effect of PCP high.
>>
>> - Disable free batch size scaling to get the raw performance.
>>
>> - The code run with the zone lock held is extracted from
>>   rmqueue_bulk() and free_pcppages_bulk() into 2 separate functions
>>   to make it easy to measure the function run time with the ftrace
>>   function_graph tracer.
>>
>> - The batch size is hard coded to be 63 (default), 127, 255, 511,
>>   1023, 2047, 4095.
>>
>> Then will-it-scale/page_fault1 is used to generate the page
>> allocation/freeing workload. The page allocation/freeing throughput
>> (page/s) is measured via will-it-scale. The page allocation/freeing
>> average latency (alloc/free latency avg, in us) and allocation/freeing
>> latency at the 99th percentile (alloc/free latency 99%, in us) are
>> measured with the ftrace function_graph tracer.
>>
>> The test results are as follows,
>>
>> Sapphire Rapids Server
>> ======================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    513633.4          2.33          3.57           2.67           6.83
>>   127    517616.7          4.35          6.65           4.22          13.03
>>   255    520822.8          8.29         13.32           7.52          25.24
>>   511    524122.0         15.79         23.42          14.02          49.35
>>  1023    525980.5         30.25         44.19          25.36          94.88
>>  2047    526793.6         59.39         84.50          45.22         140.81
>>
>> Ice Lake Server
>> ===============
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    620210.3          2.21          3.68           2.02           4.35
>>   127    627003.0          4.09          6.86           3.51           8.28
>>   255    630777.5          7.70         13.50           6.17          15.97
>>   511    633651.5         14.85         22.62          11.66          31.08
>>  1023    637071.1         28.55         42.02          20.81          54.36
>>  2047    638089.7         56.54         84.06          39.28          91.68
>>
>> Cascade Lake Server
>> ===================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    404706.7          3.29          5.03           3.53           4.75
>>   127    422475.2          6.12          9.09           6.36           8.76
>>   255    411522.2         11.68         16.97          10.90          16.39
>>   511    428124.1         22.54         31.28          19.86          32.25
>>  1023    414718.4         43.39         62.52          40.00          66.33
>>  2047    429848.7         86.64        120.34          71.14         106.08
>>
>> Comet Lake Desktop
>> ==================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63   795183.13          2.18          3.55           2.03           3.05
>>   127   803067.85          3.91          6.56           3.85           5.52
>>   255   812771.10          7.35         10.80           7.14          10.20
>>   511   817723.48         14.17         27.54          13.43          30.31
>>  1023   818870.19         27.72         40.10          27.89          46.28
>>
>> Coffee Lake Desktop
>> ===================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    510542.8          3.13          4.40           2.48           3.43
>>   127    514288.6          5.97          7.89           4.65           6.04
>>   255    516889.7         11.86         15.58           8.96          12.55
>>   511    519802.4         23.10         28.81          16.95          26.19
>>  1023    520802.7         45.30         52.51          33.19          45.95
>>  2047    519997.1         90.63        104.00          65.26          81.74
>>
>> From the above data, to restrict the allocation/freeing latency to be
>> less than 100 us in most cases, the max batch scale factor needs to
>> be less than or equal to 5.
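(For scale: with the default batch of 63, a max scale factor of 5 means
a worst-case effective batch of 63 << 5 = 2016 pages, i.e. roughly the
2047 rows above, where most of the measured latencies are still at or
below about 100 us.)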
>>
>> So, in this patch, the batch scale factor is restricted to be less
>> than or equal to 5.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>
> Acked-by: Mel Gorman
>
> However, it's worth noting that the time to free depends on the CPU,
> and while the CPUs you tested are reasonable, there are also slower
> CPUs out there, and I have at least one account that the time is
> excessive. While this patch is fine, there may be a patch on top that
> makes this runtime configurable, a Kconfig default, or both.

Sure. Will add a Kconfig option first in a follow-on patch.
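Roughly what such an option could look like -- a sketch only, with the
option name, range, and exact hook point tentative rather than final:

    config PCP_BATCH_SCALE_MAX
            int "Maximum scale factor of PCP (Per-CPU pageset) batch allocate/free"
            default 5
            range 0 6
            help
              The effective PCP batch size is scaled up by left-shifting
              the base batch; this caps the shift so that the worst-case
              time spent under the zone lock stays bounded.

and then capping the scaling where free_factor is bumped, e.g.,

    if (batch < max_nr_free && pcp->free_factor < CONFIG_PCP_BATCH_SCALE_MAX)
            pcp->free_factor++;

--
Best Regards,
Huang, Ying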