From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, Matthew Wilcox, David Rientjes
Subject: Re: [PATCH v3 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
In-Reply-To: (Yafang Shao's message of "Mon, 5 Aug 2024 12:48:50 +0800")
References: <20240804080107.21094-1-laoar.shao@gmail.com> <20240804080107.21094-4-laoar.shao@gmail.com> <87r0b3g35e.fsf@yhuang6-desk2.ccr.corp.intel.com> <87mslrfz9i.fsf@yhuang6-desk2.ccr.corp.intel.com> <87cymnfv3b.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 05 Aug 2024 13:00:42 +0800
Message-ID: <878qxbfts5.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yafang Shao writes:

> On Mon, Aug 5, 2024 at 12:36 PM Huang, Ying wrote:
>>
>> Yafang Shao writes:
>>
>> > On Mon, Aug 5, 2024 at 11:05 AM Huang, Ying wrote:
>> >>
>> >> Yafang Shao writes:
>> >>
>> >> > On Mon, Aug 5, 2024 at 9:41 AM Huang, Ying wrote:
>> >> >>
>> >> >> Yafang Shao writes:
>> >> >>
>> >> >> [snip]
>> >> >>
>> >> >> >
>> >> >> > Why introduce a sysctl knob?
>> >> >> > ============================
>> >> >> >
>> >> >> > From the above data, it's clear that different CPU types have varying
>> >> >> > allocation latencies with respect to zone->lock contention. Typically,
>> >> >> > people don't release individual kernel packages for each type of
>> >> >> > x86_64 CPU.
>> >> >> >
>> >> >> > Furthermore, for latency-insensitive applications, we can keep the
>> >> >> > default setting for better throughput.
>> >> >>
>> >> >> Do you have any data to prove that the default setting is better for
>> >> >> throughput?  If so, that will be strong support for your patch.
>> >> >
>> >> > No, I don't.  The primary reason we can't change the default value from
>> >> > 5 to 0 across our fleet of servers is that you initially set it to 5.
>> >> > The sysadmins believe you had a strong reason for setting it to 5 by
>> >> > default; otherwise, it would be considered careless for the upstream
>> >> > kernel.  I also believe you must have had a solid justification for
>> >> > setting the default value to 5; otherwise, why would you have
>> >> > submitted your patches?
>> >>
>> >> In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
>> >> avoid too long latency"), I tried my best to run tests on the machines
>> >> available with a micro-benchmark (will-it-scale/page_fault1) which
>> >> exercises the kernel page allocator heavily.  From the data in that
>> >> commit, a larger CONFIG_PCP_BATCH_SCALE_MAX helps throughput a little,
>> >> but not much.  The 99% alloc/free latency can be kept within about
>> >> 100us with CONFIG_PCP_BATCH_SCALE_MAX == 5.  So, we chose 5 as the
>> >> default value.
>> >>
>> >> But, we can always improve the default value with more data, on more
>> >> types of machines and with more types of benchmarks, etc.
>> >>
>> >> Your data suggest a smaller default value, because you have data to
>> >> show that a larger default value has a latency spike issue (as large
>> >> as tens of ms) for some practical workloads which weren't tested
>> >> previously.  In contrast, we don't have strong data to show the
>> >> throughput advantages of a larger CONFIG_PCP_BATCH_SCALE_MAX value.
>> >>
>> >> So, I suggest using a smaller default value for
>> >> CONFIG_PCP_BATCH_SCALE_MAX.  But, we may need more tests to check the
>> >> data for 1, 2, 3, and 4, in addition to 0 and 5, to determine the best
>> >> choice.
>> >
>> > Which smaller default value would be better?
>>
>> This depends on further test results.
>
> I believe you agree with me that you can't test all workloads.
>
>> > How can we ensure that other workloads, which we haven't tested, will
>> > work well with this new default value?
>>
>> We cannot.  We can only depend on the data available.  If there are
>> new data available in the future, we can make the change accordingly.
>
> So, your solution is to change the hardcoded value for untested
> workloads and then release the kernel package again?
>
>> > If you have a better default value in mind, would you consider sending
>> > a patch for it?  I would be happy to test it with my test case.
>>
>> If you can test the values 1, 2, 3, and 4 with your workload, that will
>> be very helpful!  Both allocation latency and total free time (if
>> possible) are valuable.
>
> You know I can't verify it with all workloads, right?
> You have so much data to verify, which indicates uncertainty about any
> default value.  Why not make it tunable and let the user choose the
> value they prefer?
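For anyone following along, here is what the knob controls, modeled as a
small standalone program.  The constants and function names below are
illustrative only, not the kernel's; the real logic lives in
mm/page_alloc.c and differs in detail across kernel versions.  The idea:
consecutive frees escalate the per-CPU drain batch by doubling, and the
scale max caps that escalation, which bounds how many pages are freed
(and thus how long zone->lock is held) per drain.

#include <stdio.h>

/* Illustrative stand-ins, not the kernel's actual values. */
#define PCP_BATCH 63	/* base per-CPU batch size */
#define PCP_HIGH  3000	/* example per-CPU high watermark */

/*
 * Sketch of the capped batch escalation: consecutive drains double the
 * batch (batch << free_factor) until free_factor hits scale_max, so at
 * most roughly batch << scale_max pages are freed per zone->lock hold.
 */
static int nr_to_free(int batch, int high, int *free_factor, int scale_max)
{
	int max_nr_free = high - batch;	/* keep at least one batch cached */
	int n = batch << *free_factor;

	if (n <= max_nr_free && *free_factor < scale_max)
		(*free_factor)++;	/* escalate for the next drain */

	if (n > max_nr_free)
		n = max_nr_free;
	if (n < batch)
		n = batch;
	return n;
}

int main(void)
{
	for (int scale_max = 0; scale_max <= 5; scale_max++) {
		int free_factor = 0, n = 0;

		/* Simulate a burst of drains; report the steady-state batch. */
		for (int i = 0; i < 8; i++)
			n = nr_to_free(PCP_BATCH, PCP_HIGH, &free_factor,
				       scale_max);
		printf("scale_max=%d -> up to %4d pages per zone->lock hold\n",
		       scale_max, n);
	}
	return 0;
}

With these illustrative numbers, the modeled per-drain cap grows from 63
pages at scale_max=0 to about 2000 at scale_max=5; that is the
throughput-versus-worst-case-latency trade-off being debated.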
We can only make decisions based on the data available.  In theory, we
cannot test all workloads, because there will be new workloads in the
future.  If we have data showing that a smaller value causes performance
regressions for some reasonable workloads, we can make it user tunable.

--
Best Regards,
Huang, Ying