From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao
Cc: akpm@linux-foundation.org, Matthew Wilcox, mgorman@techsingularity.net,
 linux-mm@kvack.org, David Rientjes
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
In-Reply-To: (Yafang Shao's message of "Fri, 12 Jul 2024 16:49:44 +0800")
References: <20240707094956.94654-1-laoar.shao@gmail.com>
 <878qyaarm6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o774a0pv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87frsg9waa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <877cds9pa2.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y1678l0f.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87plrj8g42.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87h6cv89n4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87cynj878z.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <874j8v851a.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87zfqn6mr5.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 12 Jul 2024 17:10:50 +0800
Message-ID: <87v81b6kmd.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Yafang Shao writes:

> On Fri, Jul 12, 2024 at 4:26 PM Huang, Ying wrote:
>>
>> Yafang Shao writes:
>>
>> > On Fri, Jul 12, 2024 at 3:06 PM Huang, Ying wrote:
>> >>
>> >> Yafang Shao writes:
>> >>
>> >> > On Fri, Jul 12, 2024 at 2:18 PM Huang, Ying wrote:
>> >> >>
>> >> >> Yafang Shao writes:
>> >> >>
>> >> >> > On Fri, Jul 12, 2024 at 1:26 PM Huang, Ying wrote:
>> >> >> >>
>> >> >> >> Yafang Shao writes:
>> >> >> >>
>> >> >> >> > On Fri, Jul 12, 2024 at 11:07 AM Huang, Ying wrote:
>> >> >> >> >>
>> >> >> >> >> Yafang Shao writes:
>> >> >> >> >>
>> >> >> >> >> > On Fri, Jul 12, 2024 at 9:21 AM Huang, Ying wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >>
>> >> >> >> >> >> > On Thu, Jul 11, 2024 at 6:51 PM Huang, Ying wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying wrote:
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
>> >> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
>> >> >> >> >> >> >> >> >> >> > quickly experimenting with specific workloads in a production environment,
>> >> >> >> >> >> >> >> >> >> > particularly when monitoring latency spikes caused by contention on the
>> >> >> >> >> >> >> >> >> >> > zone->lock. To address this, a new sysctl parameter vm.pcp_batch_scale_max
>> >> >> >> >> >> >> >> >> >> > is introduced as a more practical alternative.
>> >> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> >> In general, I'm neutral to the change. I can understand that kernel
>> >> >> >> >> >> >> >> >> >> configuration isn't as flexible as a sysctl knob. But a sysctl knob is
>> >> >> >> >> >> >> >> >> >> ABI too.
>> >> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> >> > To ultimately mitigate the zone->lock contention issue, several suggestions
>> >> >> >> >> >> >> >> >> >> > have been proposed.
>> >> >> >> >> >> >> >> >> >> > One approach involves dividing large zones into
>> >> >> >> >> >> >> >> >> >> > multiple smaller zones, as suggested by Matthew[0], while another entails
>> >> >> >> >> >> >> >> >> >> > splitting the zone->lock using a mechanism similar to memory arenas and
>> >> >> >> >> >> >> >> >> >> > shifting away from relying solely on zone_id to identify the range of
>> >> >> >> >> >> >> >> >> >> > free lists a particular page belongs to[1]. However, implementing these
>> >> >> >> >> >> >> >> >> >> > solutions is likely to necessitate a more extended development effort.
>> >> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> >> Per my understanding, the change will hurt rather than improve zone->lock
>> >> >> >> >> >> >> >> >> >> contention. Instead, it will reduce page allocation/freeing latency.
>> >> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> >> > I'm quite perplexed by your recent comment. You introduced a
>> >> >> >> >> >> >> >> >> > configuration that has proven to be difficult to use, and you have
>> >> >> >> >> >> >> >> >> > been resistant to suggestions for modifying it to a more user-friendly
>> >> >> >> >> >> >> >> >> > and practical tuning approach. May I inquire about the rationale
>> >> >> >> >> >> >> >> >> > behind introducing this configuration in the beginning?
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> Sorry, I don't understand your words. Do you need me to explain what
>> >> >> >> >> >> >> >> >> "neutral" is?
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > No, thanks.
>> >> >> >> >> >> >> >> > After consulting with ChatGPT, I received a clear and comprehensive
>> >> >> >> >> >> >> >> > explanation of what "neutral" means, providing me with a better
>> >> >> >> >> >> >> >> > understanding of the concept.
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > So, can you explain why you introduced it as a config in the beginning?
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> I think that I have explained it in the commit log of commit
>> >> >> >> >> >> >> >> 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too
>> >> >> >> >> >> >> >> long latency"), which introduces the config.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > What specifically are your expectations for how users should utilize
>> >> >> >> >> >> >> > this config in a real production workload?
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> Sysctl knob is ABI, which needs to be maintained forever. Can you
>> >> >> >> >> >> >> >> explain why you need it? Why can't you use a fixed value after initial
>> >> >> >> >> >> >> >> experiments?
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Given the extensive scale of our production environment, with hundreds
>> >> >> >> >> >> >> > of thousands of servers, it begs the question: how do you propose we
>> >> >> >> >> >> >> > efficiently manage the various workloads that remain unaffected by the
>> >> >> >> >> >> >> > sysctl change implemented on just a few thousand servers? Is it
>> >> >> >> >> >> >> > feasible to expect us to recompile and release a new kernel for every
>> >> >> >> >> >> >> > instance where the default value falls short? Surely, there must be
>> >> >> >> >> >> >> > more practical and efficient approaches we can explore together to
>> >> >> >> >> >> >> > ensure optimal performance across all workloads.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > When making improvements or modifications, kindly ensure that they are
>> >> >> >> >> >> >> > not solely confined to a test or lab environment. It's vital to also
>> >> >> >> >> >> >> > consider the needs and requirements of our actual users, along with
>> >> >> >> >> >> >> > the diverse workloads they encounter in their daily operations.
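For readers skimming the thread: per commit 52166607ecc9 referenced above, the scale factor caps how many pages are processed per zone->lock acquisition at roughly batch << scale_max. A quick sketch of that arithmetic; the batch value below is an assumed example, not a figure from this thread:

```python
# Illustrative arithmetic only: how the pcp batch scale factor bounds the
# number of pages processed while zone->lock is held (roughly batch << scale).
# The batch value is assumed for illustration; the kernel derives the real
# one from the zone size.
batch = 63  # assumed per-CPU pcp->batch value

for scale_max in range(6):  # covers the 0 and 5 settings discussed here
    print(f"scale_max={scale_max}: up to {batch << scale_max} pages per lock hold")
```

With scale_max=0 the cap stays at the base batch (63 here); with 5 it grows 32-fold, which is where the long lock hold times come from.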
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Have you found that your different systems require different
>> >> >> >> >> >> >> CONFIG_PCP_BATCH_SCALE_MAX values already?
>> >> >> >> >> >> >
>> >> >> >> >> >> > For specific workloads that introduce latency, we set the value to 0.
>> >> >> >> >> >> > For other workloads, we keep it unchanged until we determine that the
>> >> >> >> >> >> > default value is also suboptimal. What is the issue with this
>> >> >> >> >> >> > approach?
>> >> >> >> >> >>
>> >> >> >> >> >> Firstly, this is a system-wide configuration, not workload specific.
>> >> >> >> >> >> So, other workloads running on the same system will be impacted too.
>> >> >> >> >> >> Will you run only one workload on one system?
>> >> >> >> >> >
>> >> >> >> >> > It seems we're living on different planets. You're happily working in
>> >> >> >> >> > your lab environment, while I'm struggling with real-world production
>> >> >> >> >> > issues.
>> >> >> >> >> >
>> >> >> >> >> > For servers:
>> >> >> >> >> >
>> >> >> >> >> > Server 1 to 10,000: vm.pcp_batch_scale_max = 0
>> >> >> >> >> > Server 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
>> >> >> >> >> > Server 1,000,001 and beyond: Happy with all values
>> >> >> >> >> >
>> >> >> >> >> > Is this hard to understand?
>> >> >> >> >> >
>> >> >> >> >> > In other words:
>> >> >> >> >> >
>> >> >> >> >> > For applications:
>> >> >> >> >> >
>> >> >> >> >> > Application 1 to 10,000: vm.pcp_batch_scale_max = 0
>> >> >> >> >> > Application 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
>> >> >> >> >> > Application 1,000,001 and beyond: Happy with all values
>> >> >> >> >>
>> >> >> >> >> Good to know this. Thanks!
>> >> >> >> >>
>> >> >> >> >> >> Secondly, we need some evidence to introduce a new system ABI. For
>> >> >> >> >> >> example, we need to use different configurations on different systems,
>> >> >> >> >> >> otherwise some workloads will be hurt.
>> >> >> >> >> >> Can you provide some evidence to support your change? IMHO, it's not
>> >> >> >> >> >> good enough to say "I don't know why, I just don't want to change
>> >> >> >> >> >> existing systems." If so, it may be better to wait until you have more
>> >> >> >> >> >> evidence.
>> >> >> >> >> >
>> >> >> >> >> > It seems the community encourages developers to experiment with their
>> >> >> >> >> > improvements in lab environments using meticulously designed test
>> >> >> >> >> > cases A, B, C, and as many others as they can imagine, ultimately
>> >> >> >> >> > obtaining perfect data. However, it discourages developers from
>> >> >> >> >> > directly addressing real-world workloads. Sigh.
>> >> >> >> >>
>> >> >> >> >> You cannot know whether your workloads benefit or are hurt by the
>> >> >> >> >> different batch number, and how, in your production environment? If you
>> >> >> >> >> cannot, how do you decide which workload deploys on which system (with
>> >> >> >> >> a different batch number configuration)? If you can, can you provide
>> >> >> >> >> such information to support your patch?
>> >> >> >> >
>> >> >> >> > We leverage a meticulous selection of network metrics, particularly
>> >> >> >> > focusing on TcpExt indicators, to keep a close eye on application
>> >> >> >> > latency. This includes metrics such as TcpExt.TCPTimeouts,
>> >> >> >> > TcpExt.RetransSegs, TcpExt.DelayedACKLost, TcpExt.TCPSlowStartRetrans,
>> >> >> >> > TcpExt.TCPFastRetrans, TcpExt.TCPOFOQueue, and more.
>> >> >> >> >
>> >> >> >> > In instances where a problematic container terminates, we've noticed a
>> >> >> >> > sharp spike in TcpExt.TCPTimeouts, reaching over 40 occurrences per
>> >> >> >> > second, which serves as a clear indication that other applications are
>> >> >> >> > experiencing latency issues.
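As an aside for readers who want to reproduce this kind of monitoring: TcpExt counters like those listed above are exposed in /proc/net/netstat, where each section is a header line of field names followed by a line of values. A minimal sketch; the helper name is made up for illustration, while the file path and layout are standard Linux procfs:

```shell
#!/bin/sh
# Illustrative sketch: pull one TcpExt counter (e.g. TCPTimeouts) out of
# netstat-style output such as /proc/net/netstat.  The TcpExt section is a
# pair of lines -- field names, then values -- matched up by position.
tcpext_counter() {  # usage: tcpext_counter NAME FILE
    awk -v name="$1" '/^TcpExt:/ {
        if (n == 0) { n = NF; split($0, hdr) }  # first line: field names
        else for (i = 2; i <= n; i++)           # second line: pick the value
            if (hdr[i] == name) print $i
    }' "$2"
}

# On a live system, sampling this once per second and diffing successive
# readings gives the timeouts-per-second rate discussed above:
#     tcpext_counter TCPTimeouts /proc/net/netstat
```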
>> >> >> >> > By fine-tuning the vm.pcp_batch_scale_max parameter to 0, we've been
>> >> >> >> > able to drastically reduce the maximum frequency of these timeouts to
>> >> >> >> > less than one per second.
>> >> >> >>
>> >> >> >> Thanks a lot for sharing this. I learned much from it!
>> >> >> >>
>> >> >> >> > At present, we're selectively applying this adjustment to clusters
>> >> >> >> > that exclusively host the identified problematic applications, and
>> >> >> >> > we're closely monitoring their performance to ensure stability. To
>> >> >> >> > date, we've observed no network latency issues as a result of this
>> >> >> >> > change. However, we remain cautious about extending this optimization
>> >> >> >> > to other clusters, as the decision ultimately depends on a variety of
>> >> >> >> > factors.
>> >> >> >> >
>> >> >> >> > It's important to note that we're not eager to implement this change
>> >> >> >> > across our entire fleet, as we recognize the potential for unforeseen
>> >> >> >> > consequences. Instead, we're taking a cautious approach by initially
>> >> >> >> > applying it to a limited number of servers. This allows us to assess
>> >> >> >> > its impact and make informed decisions about whether or not to expand
>> >> >> >> > its use in the future.
>> >> >> >>
>> >> >> >> So, you haven't observed any performance regression yet. Right?
>> >> >> >
>> >> >> > Right.
>> >> >> >
>> >> >> >> If you haven't, I suggest you keep the patch in your downstream kernel
>> >> >> >> for a while. In the future, if you find that the performance of some
>> >> >> >> workloads hurts because of the new batch number, you can repost the
>> >> >> >> patch with the supporting data. If, in the end, the performance of
>> >> >> >> more and more workloads is good with the new batch number, you may
>> >> >> >> consider making 0 the default value :-)
>> >> >> >
>> >> >> > That is not how the real world works.
>> >> >> >
>> >> >> > In the real world:
>> >> >> >
>> >> >> > - No one knows what may happen in the future.
>> >> >> >   Therefore, if possible, we should make systems flexible, unless
>> >> >> >   there is a strong justification for using a hard-coded value.
>> >> >> >
>> >> >> > - Minimize changes whenever possible.
>> >> >> >   These systems have been working fine in the past, even if with lower
>> >> >> >   performance. Why make changes just for the sake of improving
>> >> >> >   performance? Does the key metric of your performance data truly matter
>> >> >> >   for their workload?
>> >> >>
>> >> >> These are good policies in your organization and business. But they're
>> >> >> not necessarily the policies that Linux kernel upstream should take.
>> >> >
>> >> > You mean the upstream Linux kernel is only designed for the lab?
>> >> >
>> >> >> The community needs to consider long-term maintenance overhead, so it
>> >> >> adds new ABI (such as a sysfs knob) to the kernel only with the
>> >> >> necessary justification. In general, it prefers to use a good default
>> >> >> value or an automatic algorithm that works for everyone. The community
>> >> >> tries to avoid (or fix) regressions as much as possible, but this will
>> >> >> not stop the kernel from changing, even if the change is big.
>> >> >
>> >> > Please explain to me why the kernel config is not ABI, but the sysctl
>> >> > is ABI.
>> >>
>> >> The Linux kernel will not break ABI until the last user stops using it.
>> >
>> > However, you haven't given a clear reference for why the sysctl is an ABI.
>>
>> TBH, I didn't find a formal document that says it explicitly after some
>> searching.
>>
>> Hi, Andrew, Matthew,
>>
>> Can you help me on this? Is sysctl considered Linux kernel ABI, or
>> something similar?
>
> In my experience, we consistently utilize an if-statement to configure
> sysctl settings in our production environments.
>
> if [ -f "${sysctl_file}" ]; then
>     echo "${new_value}" > "${sysctl_file}"
> fi
>
> Additionally, you can incorporate this into rc.local to ensure the
> configuration is applied upon system reboot.
>
> Even if you add it to sysctl.conf without the if-statement, it
> won't break anything.
>
> The pcp-related sysctl parameter, vm.percpu_pagelist_high_fraction,
> underwent a naming change along with a functional update from its
> predecessor, vm.percpu_pagelist_fraction, in commit 74f44822097c
> ("mm/page_alloc: introduce vm.percpu_pagelist_high_fraction"). Despite
> this significant change, there have been no reported issues or
> complaints, suggesting that the renaming and functional update have
> not negatively impacted the system's functionality.

Thanks for your information. From that commit, it appears sysctl isn't
considered kernel ABI.

Even so, IMHO, we shouldn't introduce a user-tunable knob without a
real-world requirement beyond more flexibility.

--
Best Regards,
Huang, Ying