From: Yafang Shao
Date: Fri, 12 Jul 2024 10:25:16 +0800
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, Matthew Wilcox, David Rientjes
References: <20240707094956.94654-1-laoar.shao@gmail.com>
 <20240707094956.94654-4-laoar.shao@gmail.com>
 <878qyaarm6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o774a0pv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87frsg9waa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <877cds9pa2.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87y1678l0f.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87y1678l0f.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Fri, Jul 12, 2024 at 9:21 AM Huang, Ying wrote:
>
> Yafang Shao writes:
>
> > On Thu, Jul 11, 2024 at 6:51 PM Huang, Ying wrote:
> >>
> >> Yafang Shao writes:
> >>
> >> > On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
> >> >>
> >> >> Yafang Shao writes:
> >> >>
> >> >> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying wrote:
> >> >> >>
> >> >> >> Yafang Shao writes:
> >> >> >>
> >> >> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
> >> >> >> >>
> >> >> >> >> Yafang Shao writes:
> >> >> >> >>
> >> >> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
> >> >> >> >> > quickly experimenting with specific workloads in a production environment,
> >> >> >> >> > particularly when monitoring latency spikes caused by contention on the
> >> >> >> >> > zone->lock. To address this, a new sysctl parameter vm.pcp_batch_scale_max
> >> >> >> >> > is introduced as a more practical alternative.
> >> >> >> >>
> >> >> >> >> In general, I'm neutral to the change. I can understand that kernel
> >> >> >> >> configuration isn't as flexible as sysctl knob. But, sysctl knob is ABI
> >> >> >> >> too.
> >> >> >> >>
> >> >> >> >> > To ultimately mitigate the zone->lock contention issue, several suggestions
> >> >> >> >> > have been proposed. One approach involves dividing large zones into multi
> >> >> >> >> > smaller zones, as suggested by Matthew[0], while another entails splitting
> >> >> >> >> > the zone->lock using a mechanism similar to memory arenas and shifting away
> >> >> >> >> > from relying solely on zone_id to identify the range of free lists a
> >> >> >> >> > particular page belongs to[1]. However, implementing these solutions is
> >> >> >> >> > likely to necessitate a more extended development effort.
> >> >> >> >>
> >> >> >> >> Per my understanding, the change will hurt instead of improve zone->lock
> >> >> >> >> contention. Instead, it will reduce page allocation/freeing latency.
> >> >> >> >
> >> >> >> > I'm quite perplexed by your recent comment. You introduced a
> >> >> >> > configuration that has proven to be difficult to use, and you have
> >> >> >> > been resistant to suggestions for modifying it to a more user-friendly
> >> >> >> > and practical tuning approach.
> >> >> >> > May I inquire about the rationale
> >> >> >> > behind introducing this configuration in the beginning?
> >> >> >>
> >> >> >> Sorry, I don't understand your words. Do you need me to explain what is
> >> >> >> "neutral"?
> >> >> >
> >> >> > No, thanks.
> >> >> > After consulting with ChatGPT, I received a clear and comprehensive
> >> >> > explanation of what "neutral" means, providing me with a better
> >> >> > understanding of the concept.
> >> >> >
> >> >> > So, can you explain why you introduced it as a config in the beginning ?
> >> >>
> >> >> I think that I have explained it in the commit log of commit
> >> >> 52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too long
> >> >> latency"). Which introduces the config.
> >> >
> >> > What specifically are your expectations for how users should utilize
> >> > this config in real production workload?
> >> >
> >> >>
> >> >> Sysctl knob is ABI, which needs to be maintained forever. Can you
> >> >> explain why you need it? Why cannot you use a fixed value after initial
> >> >> experiments.
> >> >
> >> > Given the extensive scale of our production environment, with hundreds
> >> > of thousands of servers, it begs the question: how do you propose we
> >> > efficiently manage the various workloads that remain unaffected by the
> >> > sysctl change implemented on just a few thousand servers? Is it
> >> > feasible to expect us to recompile and release a new kernel for every
> >> > instance where the default value falls short? Surely, there must be
> >> > more practical and efficient approaches we can explore together to
> >> > ensure optimal performance across all workloads.
> >> >
> >> > When making improvements or modifications, kindly ensure that they are
> >> > not solely confined to a test or lab environment. It's vital to also
> >> > consider the needs and requirements of our actual users, along with
> >> > the diverse workloads they encounter in their daily operations.
> >>
> >> Have you found that your different systems requires different
> >> CONFIG_PCP_BATCH_SCALE_MAX value already?
> >
> > For specific workloads that introduce latency, we set the value to 0.
> > For other workloads, we keep it unchanged until we determine that the
> > default value is also suboptimal. What is the issue with this
> > approach?
>
> Firstly, this is a system wide configuration, not workload specific.
> So, other workloads run on the same system will be impacted too. Will
> you run one workload only on one system?

It seems we're living on different planets. You're happily working in
your lab environment, while I'm struggling with real-world production
issues.

For servers:

Server 1 to 10,000: vm.pcp_batch_scale_max = 0
Server 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
Server 1,000,001 and beyond: Happy with all values

Is this hard to understand?

In other words:

For applications:

Application 1 to 10,000: vm.pcp_batch_scale_max = 0
Application 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
Application 1,000,001 and beyond: Happy with all values

>
> Secondly, we need some evidences to introduce a new system ABI. For
> example, we need to use different configuration on different systems
> otherwise some workloads will be hurt. Can you provide some evidences
> to support your change? IMHO, it's not good enough to say I don't know
> why I just don't want to change existing systems. If so, it may be
> better to wait until you have more evidences.

It seems the community encourages developers to experiment with their
improvements in lab environments using meticulously designed test
cases A, B, C, and as many others as they can imagine, ultimately
obtaining perfect data. However, it discourages developers from
directly addressing real-world workloads. Sigh.

-- 
Regards
Yafang