From: "Huang, Ying"
To: Yafang Shao
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, Matthew Wilcox, David Rientjes
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
In-Reply-To: (Yafang Shao's message of "Thu, 11 Jul 2024 20:45:01 +0800")
References: <20240707094956.94654-1-laoar.shao@gmail.com> <20240707094956.94654-4-laoar.shao@gmail.com> <878qyaarm6.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o774a0pv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87frsg9waa.fsf@yhuang6-desk2.ccr.corp.intel.com> <877cds9pa2.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 12 Jul 2024 09:19:28 +0800
Message-ID: <87y1678l0f.fsf@yhuang6-desk2.ccr.corp.intel.com>

Yafang Shao writes:

> On Thu, Jul 11, 2024 at 6:51 PM Huang, Ying wrote:
>>
>> Yafang Shao writes:
>>
>> > On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
>> >>
>> >> Yafang Shao writes:
>> >>
>> >> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying wrote:
>> >> >>
>> >> >> Yafang Shao writes:
>> >> >>
>> >> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
>> >> >> >>
>> >> >> >> Yafang Shao writes:
>> >> >> >>
>> >> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
>> >> >> >> > quickly experimenting with specific workloads in a production environment,
>> >> >> >> > particularly when monitoring latency spikes caused by contention on the
>> >> >> >> > zone->lock. To address this, a new sysctl parameter vm.pcp_batch_scale_max
>> >> >> >> > is introduced as a more practical alternative.
>> >> >> >>
>> >> >> >> In general, I'm neutral to the change. I can understand that kernel
>> >> >> >> configuration isn't as flexible as sysctl knob. But, sysctl knob is ABI
>> >> >> >> too.
>> >> >> >>
>> >> >> >> > To ultimately mitigate the zone->lock contention issue, several suggestions
>> >> >> >> > have been proposed. One approach involves dividing large zones into multi
>> >> >> >> > smaller zones, as suggested by Matthew[0], while another entails splitting
>> >> >> >> > the zone->lock using a mechanism similar to memory arenas and shifting away
>> >> >> >> > from relying solely on zone_id to identify the range of free lists a
>> >> >> >> > particular page belongs to[1]. However, implementing these solutions is
>> >> >> >> > likely to necessitate a more extended development effort.
>> >> >> >>
>> >> >> >> Per my understanding, the change will hurt instead of improve zone->lock
>> >> >> >> contention. Instead, it will reduce page allocation/freeing latency.
>> >> >> >
>> >> >> > I'm quite perplexed by your recent comment. You introduced a
>> >> >> > configuration that has proven to be difficult to use, and you have
>> >> >> > been resistant to suggestions for modifying it to a more user-friendly
>> >> >> > and practical tuning approach. May I inquire about the rationale
>> >> >> > behind introducing this configuration in the beginning?
>> >> >>
>> >> >> Sorry, I don't understand your words. Do you need me to explain what is
>> >> >> "neutral"?
>> >> >
>> >> > No, thanks.
>> >> > After consulting with ChatGPT, I received a clear and comprehensive
>> >> > explanation of what "neutral" means, providing me with a better
>> >> > understanding of the concept.
>> >> >
>> >> > So, can you explain why you introduced it as a config in the beginning ?
>> >>
>> >> I think that I have explained it in the commit log of commit
>> >> 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too long
>> >> latency"). Which introduces the config.
>> >
>> > What specifically are your expectations for how users should utilize
>> > this config in real production workload?
>> >
>> >>
>> >> Sysctl knob is ABI, which needs to be maintained forever. Can you
>> >> explain why you need it? Why cannot you use a fixed value after initial
>> >> experiments.
>> >
>> > Given the extensive scale of our production environment, with hundreds
>> > of thousands of servers, it begs the question: how do you propose we
>> > efficiently manage the various workloads that remain unaffected by the
>> > sysctl change implemented on just a few thousand servers? Is it
>> > feasible to expect us to recompile and release a new kernel for every
>> > instance where the default value falls short? Surely, there must be
>> > more practical and efficient approaches we can explore together to
>> > ensure optimal performance across all workloads.
>> >
>> > When making improvements or modifications, kindly ensure that they are
>> > not solely confined to a test or lab environment. It's vital to also
>> > consider the needs and requirements of our actual users, along with
>> > the diverse workloads they encounter in their daily operations.
>>
>> Have you found that your different systems requires different
>> CONFIG_PCP_BATCH_SCALE_MAX value already?
>
> For specific workloads that introduce latency, we set the value to 0.
> For other workloads, we keep it unchanged until we determine that the
> default value is also suboptimal. What is the issue with this
> approach?

Firstly, this is a system-wide configuration, not a workload-specific
one. So other workloads running on the same system will be impacted
too. Will you run only one workload on each system?

Secondly, we need some evidence to justify introducing a new system ABI.
For example, evidence that different systems need different
configurations, because otherwise some workloads will be hurt. Can you
provide such evidence to support your change? IMHO, it's not good
enough to say "I don't know why, I just don't want to change existing
systems". If so, it may be better to wait until you have more evidence.

>> If no, I think that it's
>> better for you to keep this patch in your downstream kernel for now.
>> When you find that it is a common requirement, we can evaluate whether
>> to make it a sysctl knob.

--
Best Regards,
Huang, Ying