From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao, akpm@linux-foundation.org, Matthew Wilcox
Cc: mgorman@techsingularity.net, linux-mm@kvack.org, David Rientjes
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
In-Reply-To: (Yafang Shao's message of "Fri, 12 Jul 2024 15:36:15 +0800")
References: <20240707094956.94654-1-laoar.shao@gmail.com>
 <20240707094956.94654-4-laoar.shao@gmail.com>
 <878qyaarm6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o774a0pv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87frsg9waa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <877cds9pa2.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y1678l0f.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87plrj8g42.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87h6cv89n4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87cynj878z.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <874j8v851a.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 12 Jul 2024 16:24:46 +0800
Message-ID: <87zfqn6mr5.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yafang Shao writes:

> On Fri, Jul 12, 2024 at 3:06 PM Huang, Ying wrote:
>>
>> Yafang Shao writes:
>>
>> > On Fri, Jul 12, 2024 at 2:18 PM Huang, Ying wrote:
>> >>
>> >> Yafang Shao writes:
>> >>
>> >> > On Fri, Jul 12, 2024 at 1:26 PM Huang, Ying wrote:
>> >> >>
>> >> >> Yafang Shao writes:
>> >> >>
>> >> >> > On Fri, Jul 12, 2024 at 11:07 AM Huang, Ying wrote:
>> >> >> >>
>> >> >> >> Yafang Shao writes:
>> >> >> >>
>> >> >> >> > On Fri, Jul 12, 2024 at 9:21 AM Huang, Ying wrote:
>> >> >> >> >>
>> >> >> >> >> Yafang Shao writes:
>> >> >> >> >>
>> >> >> >> >> > On Thu, Jul 11, 2024 at 6:51 PM Huang, Ying wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >>
>> >> >> >> >> >> > On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> Yafang Shao writes:
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
>> >> >> >> >> >> >> >> >> > quickly experimenting with specific workloads in a production
>> >> >> >> >> >> >> >> >> > environment, particularly when monitoring latency spikes caused by
>> >> >> >> >> >> >> >> >> > contention on the zone->lock. To address this, a new sysctl parameter
>> >> >> >> >> >> >> >> >> > vm.pcp_batch_scale_max is introduced as a more practical alternative.
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> In general, I'm neutral on the change. I can understand that a kernel
>> >> >> >> >> >> >> >> >> config option isn't as flexible as a sysctl knob. But a sysctl knob is
>> >> >> >> >> >> >> >> >> ABI too.
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> > To ultimately mitigate the zone->lock contention issue, several
>> >> >> >> >> >> >> >> >> > suggestions have been proposed. One approach involves dividing large
>> >> >> >> >> >> >> >> >> > zones into multiple smaller zones, as suggested by Matthew[0], while
>> >> >> >> >> >> >> >> >> > another entails splitting the zone->lock using a mechanism similar to
>> >> >> >> >> >> >> >> >> > memory arenas and shifting away from relying solely on zone_id to
>> >> >> >> >> >> >> >> >> > identify the range of free lists a particular page belongs to[1].
>> >> >> >> >> >> >> >> >> > However, implementing these solutions is likely to necessitate a more
>> >> >> >> >> >> >> >> >> > extended development effort.
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> Per my understanding, the change will hurt rather than improve
>> >> >> >> >> >> >> >> >> zone->lock contention. Instead, it will reduce page allocation/freeing
>> >> >> >> >> >> >> >> >> latency.
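
To make that trade-off concrete: the scale factor bounds how many pages
are freed per zone->lock acquisition, roughly pcp->batch <<
pcp_batch_scale_max. A minimal sketch of the arithmetic, assuming a
typical pcp->batch of 63 for a large zone (the real value is computed
per zone by zone_batchsize()) and 4 KiB base pages:

#include <stdio.h>

int main(void)
{
	int batch = 63;	/* assumed typical pcp->batch for a large zone */

	for (int scale = 0; scale <= 5; scale++)
		printf("pcp_batch_scale_max=%d -> up to %4d pages (%5d KiB) per lock hold\n",
		       scale, batch << scale, (batch << scale) * 4);
	return 0;
}

At the default scale of 5 that is up to 2016 pages (about 8 MiB) freed
under a single zone->lock hold; at 0 it drops to 63 pages, i.e. shorter
lock holds (smaller latency spikes) at the cost of more lock
acquisitions.
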
>> >> >> >> >> >> >> >> > I'm quite perplexed by your recent comment. You introduced a
>> >> >> >> >> >> >> >> > configuration that has proven to be difficult to use, and you have
>> >> >> >> >> >> >> >> > been resistant to suggestions for modifying it to a more user-friendly
>> >> >> >> >> >> >> >> > and practical tuning approach. May I inquire about the rationale
>> >> >> >> >> >> >> >> > behind introducing this configuration in the first place?
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> Sorry, I don't understand your words. Do you need me to explain what
>> >> >> >> >> >> >> >> "neutral" means?
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > No, thanks.
>> >> >> >> >> >> >> > After consulting with ChatGPT, I received a clear and comprehensive
>> >> >> >> >> >> >> > explanation of what "neutral" means, providing me with a better
>> >> >> >> >> >> >> > understanding of the concept.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > So, can you explain why you introduced it as a config in the first place?
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I think that I have explained it in the commit log of commit
>> >> >> >> >> >> >> 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too
>> >> >> >> >> >> >> long latency"), which introduced the config.
>> >> >> >> >> >> >
>> >> >> >> >> >> > What specifically are your expectations for how users should utilize
>> >> >> >> >> >> > this config in real production workloads?
>> >> >> >> >> >> >
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> A sysctl knob is ABI, which needs to be maintained forever. Can you
>> >> >> >> >> >> >> explain why you need it? Why can't you use a fixed value after the
>> >> >> >> >> >> >> initial experiments?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Given the extensive scale of our production environment, with hundreds
>> >> >> >> >> >> > of thousands of servers, it raises the question: how do you propose we
>> >> >> >> >> >> > efficiently manage the various workloads that remain unaffected by the
>> >> >> >> >> >> > sysctl change implemented on just a few thousand servers? Is it
>> >> >> >> >> >> > feasible to expect us to recompile and release a new kernel for every
>> >> >> >> >> >> > instance where the default value falls short? Surely, there must be
>> >> >> >> >> >> > more practical and efficient approaches we can explore together to
>> >> >> >> >> >> > ensure optimal performance across all workloads.
>> >> >> >> >> >> >
>> >> >> >> >> >> > When making improvements or modifications, kindly ensure that they are
>> >> >> >> >> >> > not solely confined to a test or lab environment. It's vital to also
>> >> >> >> >> >> > consider the needs and requirements of our actual users, along with
>> >> >> >> >> >> > the diverse workloads they encounter in their daily operations.
>> >> >> >> >> >>
>> >> >> >> >> >> Have you found that your different systems require different
>> >> >> >> >> >> CONFIG_PCP_BATCH_SCALE_MAX values already?
>> >> >> >> >> >
>> >> >> >> >> > For specific workloads that introduce latency, we set the value to 0.
>> >> >> >> >> > For other workloads, we keep it unchanged until we determine that the
>> >> >> >> >> > default value is also suboptimal. What is the issue with this
>> >> >> >> >> > approach?
>> >> >> >> >>
>> >> >> >> >> Firstly, this is a system-wide configuration, not a workload-specific
>> >> >> >> >> one. So other workloads running on the same system will be impacted
>> >> >> >> >> too. Will you run only one workload on one system?
>> >> >> >> >
>> >> >> >> > It seems we're living on different planets. You're happily working in
>> >> >> >> > your lab environment, while I'm struggling with real-world production
>> >> >> >> > issues.
>> >> >> >> >
>> >> >> >> > For servers:
>> >> >> >> >
>> >> >> >> > Server 1 to 10,000: vm.pcp_batch_scale_max = 0
>> >> >> >> > Server 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
>> >> >> >> > Server 1,000,001 and beyond: Happy with all values
>> >> >> >> >
>> >> >> >> > Is this hard to understand?
>> >> >> >> >
>> >> >> >> > In other words:
>> >> >> >> >
>> >> >> >> > For applications:
>> >> >> >> >
>> >> >> >> > Application 1 to 10,000: vm.pcp_batch_scale_max = 0
>> >> >> >> > Application 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
>> >> >> >> > Application 1,000,001 and beyond: Happy with all values
>> >> >> >>
>> >> >> >> Good to know this. Thanks!
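
For reference, the per-host experiment described above is just a
runtime write to the proposed knob, e.g. "sysctl -w
vm.pcp_batch_scale_max=0", persisted through /etc/sysctl.conf. The same
can be done programmatically; a self-contained sketch, assuming the
/proc/sys/vm/pcp_batch_scale_max file this patch would create:

#include <stdio.h>

/* Hypothetical path: it exists only with this patch applied. */
#define KNOB "/proc/sys/vm/pcp_batch_scale_max"

int main(int argc, char **argv)
{
	char buf[16];
	FILE *f;

	if (argc > 1) {			/* e.g. "./pcp_knob 0" */
		f = fopen(KNOB, "w");
		if (!f || fprintf(f, "%s\n", argv[1]) < 0) {
			perror(KNOB);
			return 1;
		}
		fclose(f);
	}
	f = fopen(KNOB, "r");		/* read back the current value */
	if (!f || !fgets(buf, sizeof(buf), f)) {
		perror(KNOB);
		return 1;
	}
	printf("%s = %s", KNOB, buf);
	fclose(f);
	return 0;
}
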
>> >> >> >> >>
>> >> >> >> >> Secondly, we need some evidence to introduce a new system ABI. For
>> >> >> >> >> example, we need to use different configurations on different
>> >> >> >> >> systems, otherwise some workloads will be hurt. Can you provide some
>> >> >> >> >> evidence to support your change? IMHO, it's not good enough to say "I
>> >> >> >> >> don't know why, I just don't want to change existing systems." If so,
>> >> >> >> >> it may be better to wait until you have more evidence.
>> >> >> >> >
>> >> >> >> > It seems the community encourages developers to experiment with their
>> >> >> >> > improvements in lab environments using meticulously designed test
>> >> >> >> > cases A, B, C, and as many others as they can imagine, ultimately
>> >> >> >> > obtaining perfect data. However, it discourages developers from
>> >> >> >> > directly addressing real-world workloads. Sigh.
>> >> >> >>
>> >> >> >> Can you not tell whether, and how, your workloads benefit from or are
>> >> >> >> hurt by the different batch number in your production environment? If
>> >> >> >> you cannot, how do you decide which workload is deployed on which
>> >> >> >> system (with a different batch number configuration)? If you can, can
>> >> >> >> you provide such information to support your patch?
>> >> >> >
>> >> >> > We leverage a meticulous selection of network metrics, particularly
>> >> >> > focusing on TcpExt indicators, to keep a close eye on application
>> >> >> > latency. This includes metrics such as TcpExt.TCPTimeouts,
>> >> >> > TcpExt.RetransSegs, TcpExt.DelayedACKLost, TcpExt.TCPSlowStartRetrans,
>> >> >> > TcpExt.TCPFastRetrans, TcpExt.TCPOFOQueue, and more.
>> >> >> >
>> >> >> > In instances where a problematic container terminates, we've noticed a
>> >> >> > sharp spike in TcpExt.TCPTimeouts, reaching over 40 occurrences per
>> >> >> > second, which serves as a clear indication that other applications are
>> >> >> > experiencing latency issues. By fine-tuning the vm.pcp_batch_scale_max
>> >> >> > parameter to 0, we've been able to drastically reduce the maximum
>> >> >> > frequency of these timeouts to less than one per second.
>> >> >>
>> >> >> Thanks a lot for sharing this. I learned much from it!
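
For readers wanting to reproduce this kind of monitoring: the counters
listed above live in /proc/net/netstat as paired name/value lines, and
the nstat utility from iproute2 reports them as per-interval deltas. A
rough, self-contained watcher for the TCPTimeouts rate, assuming the
standard two-line TcpExt layout:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the current TcpExt.TCPTimeouts counter, or -1 on failure.
 * /proc/net/netstat holds pairs of lines: "TcpExt: <names...>"
 * followed by "TcpExt: <values...>".
 */
static long long tcp_timeouts(void)
{
	char names[8192], values[8192];	/* the names line is long */
	long long val = -1;
	FILE *f = fopen("/proc/net/netstat", "r");

	if (!f)
		return -1;
	while (fgets(names, sizeof(names), f) &&
	       fgets(values, sizeof(values), f)) {
		if (strncmp(names, "TcpExt:", 7))
			continue;
		int col = -1, i = 0;
		char *n = strtok(names, " \n");		/* skip "TcpExt:" */
		while ((n = strtok(NULL, " \n"))) {
			if (!strcmp(n, "TCPTimeouts")) {
				col = i;
				break;
			}
			i++;
		}
		if (col < 0)
			break;
		char *v = strtok(values, " \n");	/* skip "TcpExt:" */
		for (i = 0; (v = strtok(NULL, " \n")) && i < col; i++)
			;
		if (v)
			sscanf(v, "%lld", &val);
		break;
	}
	fclose(f);
	return val;
}

int main(void)
{
	long long prev = tcp_timeouts(), cur;

	while (prev >= 0) {
		sleep(1);
		if ((cur = tcp_timeouts()) < 0)
			break;
		/* A sustained spike (e.g. the 40/s mentioned above) points
		 * at latency induced elsewhere on the host. */
		printf("TCPTimeouts: %lld/s\n", cur - prev);
		prev = cur;
	}
	return 0;
}
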
>> >> >> > At present, we're selectively applying this adjustment to clusters
>> >> >> > that exclusively host the identified problematic applications, and
>> >> >> > we're closely monitoring their performance to ensure stability. To
>> >> >> > date, we've observed no network latency issues as a result of this
>> >> >> > change. However, we remain cautious about extending this optimization
>> >> >> > to other clusters, as the decision ultimately depends on a variety of
>> >> >> > factors.
>> >> >> >
>> >> >> > It's important to note that we're not eager to implement this change
>> >> >> > across our entire fleet, as we recognize the potential for unforeseen
>> >> >> > consequences. Instead, we're taking a cautious approach by initially
>> >> >> > applying it to a limited number of servers. This allows us to assess
>> >> >> > its impact and make informed decisions about whether or not to expand
>> >> >> > its use in the future.
>> >> >>
>> >> >> So, you haven't observed any performance regression yet, right?
>> >> >
>> >> > Right.
>> >> >
>> >> >> If you haven't, I suggest you keep the patch in your downstream kernel
>> >> >> for a while. In the future, if you find that the performance of some
>> >> >> workloads suffers because of the new batch number, you can repost the
>> >> >> patch with the supporting data. If, in the end, the performance of
>> >> >> more and more workloads is good with the new batch number, you may
>> >> >> consider making 0 the default value :-)
>> >> >
>> >> > That is not how the real world works.
>> >> >
>> >> > In the real world:
>> >> >
>> >> > - No one knows what may happen in the future.
>> >> >   Therefore, if possible, we should make systems flexible, unless
>> >> >   there is a strong justification for using a hard-coded value.
>> >> >
>> >> > - Minimize changes whenever possible.
>> >> >   These systems have been working fine in the past, even if with lower
>> >> >   performance. Why make changes just for the sake of improving
>> >> >   performance? Does the key metric of your performance data truly
>> >> >   matter for their workloads?
>> >>
>> >> Those are good policies in your organization and business. But they
>> >> are not necessarily the policies that the upstream Linux kernel should
>> >> take.
>> >
>> > You mean the upstream Linux kernel is only designed for the lab?
>> >
>> >>
>> >> The community needs to consider long-term maintenance overhead, so it
>> >> adds new ABI (such as a sysfs knob) to the kernel with the necessary
>> >> justification. In general, it prefers a good default value or an
>> >> automatic algorithm that works for everyone. The community tries to
>> >> avoid (or fix) regressions as much as possible, but this will not stop
>> >> the kernel from changing, even if the change is big.
>> >
>> > Please explain to me why the kernel config is not ABI, but the sysctl
>> > is ABI.
>>
>> The Linux kernel will not break ABI until the last user stops using it.
>
> However, you haven't given a clear reference for why the sysctl is an
> ABI.

TBH, I couldn't find a formal document that says it explicitly after
some searching.

Hi, Andrew, Matthew,

Can you help me on this? Is sysctl considered Linux kernel ABI, or
something similar?

>> This usually means tens of years, if not forever. Kernel config options
>> aren't considered ABI; they are used by developers and distributions.
>> They come and go from version to version.
>>
>> >>
>> >> IIUC, because of the different requirements, there are upstream and
>> >> downstream kernels.
>> >
>> > Downstream developers backport features from the upstream kernel, and
>> > if they find issues in the upstream kernel, they should contribute
>> > fixes back. That is how the Linux community works, right?
>>
>> Yes. If they are issues for the upstream kernel too.

--
Best Regards,
Huang, Ying