From: "Huang, Ying"
To: Yafang Shao
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, Matthew Wilcox, David Rientjes
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
In-Reply-To: (Yafang Shao's message of "Thu, 11 Jul 2024 17:51:38 +0800")
References: <20240707094956.94654-1-laoar.shao@gmail.com> <20240707094956.94654-4-laoar.shao@gmail.com> <878qyaarm6.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o774a0pv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87frsg9waa.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 11 Jul 2024 18:49:41 +0800
Message-ID: <877cds9pa2.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yafang Shao writes:

> On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
>>
>> Yafang Shao writes:
>>
>> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying wrote:
>> >>
>> >> Yafang Shao writes:
>> >>
>> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
>> >> >>
>> >> >> Yafang Shao writes:
>> >> >>
>> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
>> >> >> > quickly experimenting with specific workloads in a production environment,
>> >> >> > particularly when monitoring latency spikes caused by contention on the
>> >> >> > zone->lock. To address this, a new sysctl parameter vm.pcp_batch_scale_max
>> >> >> > is introduced as a more practical alternative.
>> >> >>
>> >> >> In general, I'm neutral to the change.  I can understand that kernel
>> >> >> configuration isn't as flexible as sysctl knob.  But, sysctl knob is ABI
>> >> >> too.
>> >> >>
>> >> >> > To ultimately mitigate the zone->lock contention issue, several suggestions
>> >> >> > have been proposed. One approach involves dividing large zones into multi
>> >> >> > smaller zones, as suggested by Matthew[0], while another entails splitting
>> >> >> > the zone->lock using a mechanism similar to memory arenas and shifting away
>> >> >> > from relying solely on zone_id to identify the range of free lists a
>> >> >> > particular page belongs to[1]. However, implementing these solutions is
>> >> >> > likely to necessitate a more extended development effort.
>> >> >>
>> >> >> Per my understanding, the change will hurt instead of improve zone->lock
>> >> >> contention.  Instead, it will reduce page allocation/freeing latency.
>> >> >
>> >> > I'm quite perplexed by your recent comment. You introduced a
>> >> > configuration that has proven to be difficult to use, and you have
>> >> > been resistant to suggestions for modifying it to a more user-friendly
>> >> > and practical tuning approach. May I inquire about the rationale
>> >> > behind introducing this configuration in the beginning?
>> >>
>> >> Sorry, I don't understand your words.  Do you need me to explain what is
>> >> "neutral"?
>> >
>> > No, thanks.
>> > After consulting with ChatGPT, I received a clear and comprehensive
>> > explanation of what "neutral" means, providing me with a better
>> > understanding of the concept.
>> >
>> > So, can you explain why you introduced it as a config in the beginning?
>>
>> I think that I have explained it in the commit log of commit
>> 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too long
>> latency").  Which introduces the config.
>
> What specifically are your expectations for how users should utilize
> this config in real production workload?
>
>>
>> Sysctl knob is ABI, which needs to be maintained forever.  Can you
>> explain why you need it?  Why cannot you use a fixed value after initial
>> experiments.
>
> Given the extensive scale of our production environment, with hundreds
> of thousands of servers, it begs the question: how do you propose we
> efficiently manage the various workloads that remain unaffected by the
> sysctl change implemented on just a few thousand servers? Is it
> feasible to expect us to recompile and release a new kernel for every
> instance where the default value falls short? Surely, there must be
> more practical and efficient approaches we can explore together to
> ensure optimal performance across all workloads.
>
> When making improvements or modifications, kindly ensure that they are
> not solely confined to a test or lab environment. It's vital to also
> consider the needs and requirements of our actual users, along with
> the diverse workloads they encounter in their daily operations.

Have you already found that your different systems require different
CONFIG_PCP_BATCH_SCALE_MAX values?  If not, I think that it's better for
you to keep this patch in your downstream kernel for now.  When you find
that it is a common requirement, we can evaluate whether to make it a
sysctl knob.

--
Best Regards,
Huang, Ying