From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20240729023532.1555-1-laoar.shao@gmail.com> <20240729023532.1555-4-laoar.shao@gmail.com>
 <878qxkyjfr.fsf@yhuang6-desk2.ccr.corp.intel.com> <CALOAHbBm+Mdhu9qcJkoOcLhVUoLKF50dLfhpTd_w_uR8h3yy6g@mail.gmail.com>
 <874j88ye65.fsf@yhuang6-desk2.ccr.corp.intel.com> <CALOAHbCi3E61PWHKBJneWT-A-X_C9ezBdjZn0zrLM9gig5XJ+Q@mail.gmail.com>
 <87zfq0wxub.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87zfq0wxub.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Yafang Shao <laoar.shao@gmail.com>
Date: Mon, 29 Jul 2024 14:00:04 +0800
Message-ID: <CALOAHbDF3veOjfLuoo8ufznvn2w1qxZR18iz3MASOZSiG3jB_A@mail.gmail.com>
Subject: Re: [PATCH v2 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
To: "Huang, Ying" <ying.huang@intel.com>
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, 
	Matthew Wilcox <willy@infradead.org>, David Rientjes <rientjes@google.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Mon, Jul 29, 2024 at 1:54 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yafang Shao <laoar.shao@gmail.com> writes:
>
> > On Mon, Jul 29, 2024 at 1:16 PM Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> Yafang Shao <laoar.shao@gmail.com> writes:
> >>
> >> > On Mon, Jul 29, 2024 at 11:22 AM Huang, Ying <ying.huang@intel.com> wrote:
> >> >>
> >> >> Hi, Yafang,
> >> >>
> >> >> Yafang Shao <laoar.shao@gmail.com> writes:
> >> >>
> >> >> > During my recent work to resolve latency spikes caused by zone->lock
> >> >> > contention[0], I found that CONFIG_PCP_BATCH_SCALE_MAX is difficult to use
> >> >> > in practice.
> >> >>
> >> >> As we discussed before [1], I still find the description of
> >> >> zone->lock contention confusing.  How about changing the description to
> >> >> something like,
> >> >
> >> > Sure, I will change it.
> >> >
> >> >>
> >> >> A larger page allocation/freeing batch number may lengthen the run time
> >> >> of code holding zone->lock.  If zone->lock is heavily contended at the same
> >> >> time, latency spikes may occur even for casual page allocation/freeing.
> >> >> Although reducing the batch number cannot make zone->lock less
> >> >> contended, it can reduce the latency spikes effectively.
> >> >>
> >> >> [1] https://lore.kernel.org/linux-mm/87ttgv8hlz.fsf@yhuang6-desk2.ccr.corp.intel.com/
> >> >>
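
To put rough numbers on the "larger batch number" wording above: as far
as I understand the PCP code, the number of pages moved in one bulk
refill/drain is bounded by the base batch shifted left by the scale
factor. The sketch below only does that arithmetic. The base batch of 63
and the 4 KiB page size are assumptions for illustration (the real batch
is visible in /proc/zoneinfo), and the function name is hypothetical,
not a kernel symbol.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64


def max_pages_per_bulk_op(base_batch: int, scale_max: int) -> int:
    """Upper bound on pages handled in one bulk refill/drain.

    Assumes the effective batch is capped at base_batch << scale_max.
    """
    if base_batch <= 0 or scale_max < 0:
        raise ValueError("base_batch must be positive and scale_max non-negative")
    return base_batch << scale_max


if __name__ == "__main__":
    base_batch = 63  # assumed value, for illustration only
    for scale_max in (0, 5):
        pages = max_pages_per_bulk_op(base_batch, scale_max)
        kib = pages * PAGE_SIZE // 1024
        print(f"scale_max={scale_max}: up to {pages} pages ({kib} KiB) per bulk operation")
```

With those assumed inputs the bound grows from 63 pages at scale 0 to
2016 pages at scale 5, i.e. 32 times more work per lock hold in the
worst case.
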
> >> >> > To demonstrate this, I wrote a Python script:
> >> >> >
> >> >> >   import mmap
> >> >> >
> >> >> >   size = 6 * 1024**3
> >> >> >
> >> >> >   while True:
> >> >> >       mm = mmap.mmap(-1, size)
> >> >> >       mm[:] = b'\xff' * size
> >> >> >       mm.close()
> >> >> >
> >> >> > Run this script 10 times in parallel and measure the allocation latency by
> >> >> > measuring the duration of rmqueue_bulk() with the BCC tool
> >> >> > funclatency[1]:
> >> >> >
> >> >> >   funclatency -T -i 600 rmqueue_bulk
> >> >> >
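
For anyone who wants to reproduce the load without starting ten copies
by hand, here is a rough sketch of a similar workload driven from one
process. It is not the script used for the numbers in this thread: it
writes in 1 MiB chunks rather than building a 6 GiB bytes object, the
iteration count is finite, and the helper names are made up for
illustration.

```python
import mmap
import multiprocessing as mp


def touch_anonymous_memory(size, iterations, chunk_size=1 << 20):
    """Map `size` bytes of anonymous memory, write all of it, unmap; repeat.

    Returns the total number of bytes written.
    """
    chunk = b"\xff" * chunk_size
    written = 0
    for _ in range(iterations):
        mm = mmap.mmap(-1, size)
        try:
            for offset in range(0, size, chunk_size):
                n = min(chunk_size, size - offset)
                mm[offset:offset + n] = chunk[:n]  # faults in the pages
                written += n
        finally:
            mm.close()
    return written


def run_parallel(workers, size, iterations):
    """Run the same workload in `workers` processes; returns bytes written per worker."""
    with mp.Pool(processes=workers) as pool:
        return pool.starmap(touch_anonymous_memory, [(size, iterations)] * workers)
```

Calling run_parallel(10, 6 * 1024**3, n) from under an
`if __name__ == "__main__":` guard, with funclatency running in another
terminal, should approximate the setup described above for n iterations.
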
> >> >> > Here are the results for both AMD and Intel CPUs.
> >> >> >
> >> >> > AMD EPYC 7W83 64-Core Processor, single NUMA node, KVM virtual server
> >> >> > =====================================================================
> >> >> >
> >> >> > - Default value of 5
> >> >> >
> >> >> >      nsecs               : count     distribution
> >> >> >          0 -> 1          : 0        |                                        |
> >> >> >          2 -> 3          : 0        |                                        |
> >> >> >          4 -> 7          : 0        |                                        |
> >> >> >          8 -> 15         : 0        |                                        |
> >> >> >         16 -> 31         : 0        |                                        |
> >> >> >         32 -> 63         : 0        |                                        |
> >> >> >         64 -> 127        : 0        |                                        |
> >> >> >        128 -> 255        : 0        |                                        |
> >> >> >        256 -> 511        : 0        |                                        |
> >> >> >        512 -> 1023       : 12       |                                        |
> >> >> >       1024 -> 2047       : 9116     |                                        |
> >> >> >       2048 -> 4095       : 2004     |                                        |
> >> >> >       4096 -> 8191       : 2497     |                                        |
> >> >> >       8192 -> 16383      : 2127     |                                        |
> >> >> >      16384 -> 32767      : 2483     |                                        |
> >> >> >      32768 -> 65535      : 10102    |                                        |
> >> >> >      65536 -> 131071     : 212730   |*******************                     |
> >> >> >     131072 -> 262143     : 314692   |*****************************           |
> >> >> >     262144 -> 524287     : 430058   |****************************************|
> >> >> >     524288 -> 1048575    : 224032   |********************                    |
> >> >> >    1048576 -> 2097151    : 73567    |******                                  |
> >> >> >    2097152 -> 4194303    : 17079    |*                                       |
> >> >> >    4194304 -> 8388607    : 3900     |                                        |
> >> >> >    8388608 -> 16777215   : 750      |                                        |
> >> >> >   16777216 -> 33554431   : 88       |                                        |
> >> >> >   33554432 -> 67108863   : 2        |                                        |
> >> >> >
> >> >> > avg = 449775 nsecs, total: 587066511229 nsecs, count: 1305242
> >> >> >
> >> >> > The avg alloc latency can be 449us, and the max latency can be higher
> >> >> > than 30ms.
> >> >> >
> >> >> > - Value set to 0
> >> >> >
> >> >> >      nsecs               : count     distribution
> >> >> >          0 -> 1          : 0        |                                        |
> >> >> >          2 -> 3          : 0        |                                        |
> >> >> >          4 -> 7          : 0        |                                        |
> >> >> >          8 -> 15         : 0        |                                        |
> >> >> >         16 -> 31         : 0        |                                        |
> >> >> >         32 -> 63         : 0        |                                        |
> >> >> >         64 -> 127        : 0        |                                        |
> >> >> >        128 -> 255        : 0        |                                        |
> >> >> >        256 -> 511        : 0        |                                        |
> >> >> >        512 -> 1023       : 92       |                                        |
> >> >> >       1024 -> 2047       : 8594     |                                        |
> >> >> >       2048 -> 4095       : 2042818  |******                                  |
> >> >> >       4096 -> 8191       : 8737624  |**************************              |
> >> >> >       8192 -> 16383      : 13147872 |****************************************|
> >> >> >      16384 -> 32767      : 8799951  |**************************              |
> >> >> >      32768 -> 65535      : 2879715  |********                                |
> >> >> >      65536 -> 131071     : 659600   |**                                      |
> >> >> >     131072 -> 262143     : 204004   |                                        |
> >> >> >     262144 -> 524287     : 78246    |                                        |
> >> >> >     524288 -> 1048575    : 30800    |                                        |
> >> >> >    1048576 -> 2097151    : 12251    |                                        |
> >> >> >    2097152 -> 4194303    : 2950     |                                        |
> >> >> >    4194304 -> 8388607    : 78       |                                        |
> >> >> >
> >> >> > avg = 19359 nsecs, total: 708638369918 nsecs, count: 36604636
> >> >> >
> >> >> > The avg was reduced significantly to 19us, and the max latency is reduced
> >> >> > to less than 8ms.
> >> >> >
> >> >> > - Conclusion
> >> >> >
> >> >> > On this AMD CPU, reducing vm.pcp_batch_scale_max significantly helps reduce
> >> >> > latency. Latency-sensitive applications will benefit from this tuning.
> >> >> >
> >> >> > However, I don't have access to other types of AMD CPUs, so I was unable to
> >> >> > test it on different AMD models.
> >> >> >
> >> >> > Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, two NUMA nodes
> >> >> > ============================================================
> >> >> >
> >> >> > - Default value of 5
> >> >> >
> >> >> >      nsecs               : count     distribution
> >> >> >          0 -> 1          : 0        |                                        |
> >> >> >          2 -> 3          : 0        |                                        |
> >> >> >          4 -> 7          : 0        |                                        |
> >> >> >          8 -> 15         : 0        |                                        |
> >> >> >         16 -> 31         : 0        |                                        |
> >> >> >         32 -> 63         : 0        |                                        |
> >> >> >         64 -> 127        : 0        |                                        |
> >> >> >        128 -> 255        : 0        |                                        |
> >> >> >        256 -> 511        : 0        |                                        |
> >> >> >        512 -> 1023       : 2419     |                                        |
> >> >> >       1024 -> 2047       : 34499    |*                                       |
> >> >> >       2048 -> 4095       : 4272     |                                        |
> >> >> >       4096 -> 8191       : 9035     |                                        |
> >> >> >       8192 -> 16383      : 4374     |                                        |
> >> >> >      16384 -> 32767      : 2963     |                                        |
> >> >> >      32768 -> 65535      : 6407     |                                        |
> >> >> >      65536 -> 131071     : 884806   |****************************************|
> >> >> >     131072 -> 262143     : 145931   |******                                  |
> >> >> >     262144 -> 524287     : 13406    |                                        |
> >> >> >     524288 -> 1048575    : 1874     |                                        |
> >> >> >    1048576 -> 2097151    : 249      |                                        |
> >> >> >    2097152 -> 4194303    : 28       |                                        |
> >> >> >
> >> >> > avg = 96173 nsecs, total: 106778157925 nsecs, count: 1110263
> >> >> >
> >> >> > - Conclusion
> >> >> >
> >> >> > This Intel CPU works fine with the default setting.
> >> >> >
> >> >> > Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, single NUMA node
> >> >> > ==============================================================
> >> >> >
> >> >> > Using the cpuset cgroup, we can restrict the test script to run on NUMA
> >> >> > node 0 only.
> >> >> >
> >> >> > - Default value of 5
> >> >> >
> >> >> >      nsecs               : count     distribution
> >> >> >          0 -> 1          : 0        |                                        |
> >> >> >          2 -> 3          : 0        |                                        |
> >> >> >          4 -> 7          : 0        |                                        |
> >> >> >          8 -> 15         : 0        |                                        |
> >> >> >         16 -> 31         : 0        |                                        |
> >> >> >         32 -> 63         : 0        |                                        |
> >> >> >         64 -> 127        : 0        |                                        |
> >> >> >        128 -> 255        : 0        |                                        |
> >> >> >        256 -> 511        : 46       |                                        |
> >> >> >        512 -> 1023       : 695      |                                        |
> >> >> >       1024 -> 2047       : 19950    |*                                       |
> >> >> >       2048 -> 4095       : 1788     |                                        |
> >> >> >       4096 -> 8191       : 3392     |                                        |
> >> >> >       8192 -> 16383      : 2569     |                                        |
> >> >> >      16384 -> 32767      : 2619     |                                        |
> >> >> >      32768 -> 65535      : 3809     |                                        |
> >> >> >      65536 -> 131071     : 616182   |****************************************|
> >> >> >     131072 -> 262143     : 295587   |*******************                     |
> >> >> >     262144 -> 524287     : 75357    |****                                    |
> >> >> >     524288 -> 1048575    : 15471    |*                                       |
> >> >> >    1048576 -> 2097151    : 2939     |                                        |
> >> >> >    2097152 -> 4194303    : 243      |                                        |
> >> >> >    4194304 -> 8388607    : 3        |                                        |
> >> >> >
> >> >> > avg = 144410 nsecs, total: 150281196195 nsecs, count: 1040651
> >> >> >
> >> >> > The zone->lock contention becomes severe when there is only a single NUMA
> >> >> > node. The average latency is approximately 144us, with the maximum
> >> >> > latency exceeding 4ms.
> >> >> >
> >> >> > - Value set to 0
> >> >> >
> >> >> >      nsecs               : count     distribution
> >> >> >          0 -> 1          : 0        |                                        |
> >> >> >          2 -> 3          : 0        |                                        |
> >> >> >          4 -> 7          : 0        |                                        |
> >> >> >          8 -> 15         : 0        |                                        |
> >> >> >         16 -> 31         : 0        |                                        |
> >> >> >         32 -> 63         : 0        |                                        |
> >> >> >         64 -> 127        : 0        |                                        |
> >> >> >        128 -> 255        : 0        |                                        |
> >> >> >        256 -> 511        : 24       |                                        |
> >> >> >        512 -> 1023       : 2686     |                                        |
> >> >> >       1024 -> 2047       : 10246    |                                        |
> >> >> >       2048 -> 4095       : 4061529  |*********                               |
> >> >> >       4096 -> 8191       : 16894971 |****************************************|
> >> >> >       8192 -> 16383      : 6279310  |**************                          |
> >> >> >      16384 -> 32767      : 1658240  |***                                     |
> >> >> >      32768 -> 65535      : 445760   |*                                       |
> >> >> >      65536 -> 131071     : 110817   |                                        |
> >> >> >     131072 -> 262143     : 20279    |                                        |
> >> >> >     262144 -> 524287     : 4176     |                                        |
> >> >> >     524288 -> 1048575    : 436      |                                        |
> >> >> >    1048576 -> 2097151    : 8        |                                        |
> >> >> >    2097152 -> 4194303    : 2        |                                        |
> >> >> >
> >> >> > avg = 8401 nsecs, total: 247739809022 nsecs, count: 29488508
> >> >> >
> >> >> > After setting it to 0, the avg latency is reduced to around 8us, and the
> >> >> > max latency is less than 4ms.
> >> >> >
> >> >> > - Conclusion
> >> >> >
> >> >> > On this Intel CPU, this tuning doesn't help much. Latency-sensitive
> >> >> > applications work well with the default setting.
> >> >> >
> >> >> > It is worth noting that all of the above data were collected on the
> >> >> > upstream kernel.
> >> >> >
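
As a sanity check on the summaries above: each reported average is just
total / count rounded down, and the ratio between the scale 5 and scale 0
runs follows from those averages. The labels below are mine; the totals,
counts and averages are copied from the funclatency output quoted above.

```python
# (label, total_ns, count, reported_avg_ns), copied from the quoted summaries
RUNS = [
    ("AMD EPYC 7W83, scale 5", 587066511229, 1305242, 449775),
    ("AMD EPYC 7W83, scale 0", 708638369918, 36604636, 19359),
    ("Xeon 8260, two nodes, scale 5", 106778157925, 1110263, 96173),
    ("Xeon 8260, one node, scale 5", 150281196195, 1040651, 144410),
    ("Xeon 8260, one node, scale 0", 247739809022, 29488508, 8401),
]


def average_ns(total_ns, count):
    """Integer average, matching how the summaries above are rounded."""
    return total_ns // count


def reduction_factor(before_ns, after_ns):
    """How many times smaller the average latency became."""
    return before_ns / after_ns


if __name__ == "__main__":
    for label, total, count, reported in RUNS:
        avg = average_ns(total, count)
        status = "matches" if avg == reported else "differs from"
        print(f"{label}: computed avg {avg} ns {status} the reported value")
    print(f"AMD: {reduction_factor(449775, 19359):.1f}x lower average at scale 0")
    print(f"Xeon, one node: {reduction_factor(144410, 8401):.1f}x lower average at scale 0")
```

Note that the scale 0 runs also record far more rmqueue_bulk() calls in
the same 600 second window, so the per-call average and the call count
should be read together.
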
> >> >> > Why introduce a sysctl knob?
> >> >> > ============================
> >> >> >
> >> >> > From the above data, it's clear that different CPU types have varying
> >> >> > allocation latencies concerning zone->lock contention. Typically, people
> >> >> > don't release individual kernel packages for each type of x86_64 CPU.
> >> >> >
> >> >> > Furthermore, for latency-insensitive applications, we can keep the default
> >> >> > setting for better throughput. In our production environment, we set this
> >> >> > value to 0 for applications running on Kubernetes servers while keeping it
> >> >> > at the default value of 5 for other applications like big data. It's not
> >> >> > common to release individual kernel packages for each application.
> >> >>
> >> >> Thanks for the detailed performance data!
> >> >>
> >> >> Have you observed any downside to setting CONFIG_PCP_BATCH_SCALE_MAX to 0
> >> >> in your environment?  If not, I suggest using 0 as the default for
> >> >> CONFIG_PCP_BATCH_SCALE_MAX, because we have clear evidence that
> >> >> CONFIG_PCP_BATCH_SCALE_MAX hurts latency for some workloads.  After
> >> >> that, if someone finds that other workloads need a larger
> >> >> CONFIG_PCP_BATCH_SCALE_MAX, we can make it dynamically tunable.
> >> >>
> >> >
> >> > The decision doesn’t rest with us, the kernel team at our company.
> >> > It’s made by the system administrators who manage a large number of
> >> > servers. The latency spikes only occur on the Kubernetes (k8s)
> >> > servers, not in other environments like big data servers. We have
> >> > informed other system administrators, such as those managing the big
> >> > data servers, about the latency spike issues, but they are unwilling
> >> > to make the change.
> >> >
> >> > No one wants to make changes unless there is evidence showing that the
> >> > old settings will negatively impact them. However, as you know,
> >> > latency is not a critical concern for big data; throughput is more
> >> > important. If we keep the current settings, we will have to release
> >> > different kernel packages for different environments, which is a
> >> > significant burden for us.
> >>
> >> I totally understand your requirements, and I think this is better
> >> resolved in your downstream kernel.  If there is clear evidence
> >> that a small batch number hurts throughput for some workloads, we can
> >> make the change in the upstream kernel.
> >>
> >
> > Please don't make this more complicated. We are at an impasse.
> >
> > The key issue here is that the upstream kernel has a default value of
> > 5, not 0. If you can change it to 0, we can persuade our users to
> > follow the upstream changes. They currently set it to 5, not because
> > you, the author, chose this value, but because it is the default in
> > Linus's tree. Since it's in Linus's tree, kernel developers worldwide
> > support it. It's not just your decision as the author, but the entire
> > community supports this default.
> >
> > If, in the future, we find that the value of 0 is not suitable, you'll
> > tell us, "It is an issue in your downstream kernel, not in the
> > upstream kernel, so we won't accept it."  PANIC.
>
> I don't think so.  I suggest you change the default value to 0.  If
> someone reports that their workloads need some other value, then we have
> evidence that different workloads need different values.  At that point,
> we can suggest adding a user-tunable knob.
>

The problem is that others are unaware we've set it to 0, and I can't
constantly monitor the linux-mm mailing list. Additionally, it's
possible that you can't always keep an eye on it either.

I believe we should hear Andrew's suggestion. Andrew, what is your opinion?


-- 
Regards
Yafang