From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yafang Shao <laoar.shao@gmail.com>
Date: Wed, 3 Jul 2024 10:13:43 +0800
Subject: Re: [PATCH] mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the minimum pagelist
To: "Huang, Ying"
Cc: Andrew Morton, linux-mm@kvack.org, Matthew Wilcox, David Rientjes, Mel Gorman
In-Reply-To: <87wmm3kzmy.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240701142046.6050-1-laoar.shao@gmail.com> <20240701195143.7e8d597abc14b255f3bc4bcd@linux-foundation.org> <874j98noth.fsf@yhuang6-desk2.ccr.corp.intel.com> <87wmm3kzmy.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Wed, Jul 3, 2024 at 9:57 AM Huang, Ying wrote:
>
> Yafang Shao writes:
>
> > On Tue, Jul 2, 2024 at 5:10 PM Huang, Ying wrote:
> >>
> >> Yafang Shao writes:
> >>
> >> > On Tue, Jul 2, 2024 at 10:51 AM Andrew Morton wrote:
> >> >>
> >> >> On Mon, 1 Jul 2024 22:20:46 +0800 Yafang Shao wrote:
> >> >>
> >> >> > Currently, we're encountering latency spikes in our container environment
> >> >> > when a specific container with multiple Python-based tasks exits. These
> >> >> > tasks may hold the zone->lock for an extended period, significantly
> >> >> > impacting latency for other containers attempting to allocate memory.
> >> >>
> >> >> Is this locking issue well understood?  Is anyone working on it?  A
> >> >> reasonably detailed description of the issue and a description of any
> >> >> ongoing work would be helpful here.
> >> >
> >> > In our containerized environment, we have a specific type of container
> >> > that runs 18 processes, each consuming approximately 6GB of RSS. These
> >> > processes are organized as separate processes rather than threads due
> >> > to the Python Global Interpreter Lock (GIL) being a bottleneck in a
> >> > multi-threaded setup. Upon the exit of these containers, other
> >> > containers hosted on the same machine experience significant latency
> >> > spikes.
> >> >
> >> > Our investigation using perf tracing revealed that the root cause of
> >> > these spikes is the simultaneous execution of exit_mmap() by each of
> >> > the exiting processes. This concurrent access to the zone->lock
> >> > results in contention, which becomes a hotspot and negatively impacts
> >> > performance. The perf results clearly indicate this contention as a
> >> > primary contributor to the observed latency issues.
> >> >
> >> >   +   77.02%     0.00%  uwsgi  [kernel.kallsyms]  [k] mmput
> >> >   -   76.98%     0.01%  uwsgi  [kernel.kallsyms]  [k] exit_mmap
> >> >      - 76.97% exit_mmap
> >> >         - 58.58% unmap_vmas
> >> >            - 58.55% unmap_single_vma
> >> >               - unmap_page_range
> >> >                  - 58.32% zap_pte_range
> >> >                     - 42.88% tlb_flush_mmu
> >> >                        - 42.76% free_pages_and_swap_cache
> >> >                           - 41.22% release_pages
> >> >                              - 33.29% free_unref_page_list
> >> >                                 - 32.37% free_unref_page_commit
> >> >                                    - 31.64% free_pcppages_bulk
> >> >                                       + 28.65% _raw_spin_lock
> >> >                                         1.28% __list_del_entry_valid
> >> >                              + 3.25% folio_lruvec_lock_irqsave
> >> >                              + 0.75% __mem_cgroup_uncharge_list
> >> >                                0.60% __mod_lruvec_state
> >> >                             1.07% free_swap_cache
> >> >                     + 11.69% page_remove_rmap
> >> >                       0.64% __mod_lruvec_page_state
> >> >         - 17.34% remove_vma
> >> >            - 17.25% vm_area_free
> >> >               - 17.23% kmem_cache_free
> >> >                  - 17.15% __slab_free
> >> >                     - 14.56% discard_slab
> >> >                          free_slab
> >> >                          __free_slab
> >> >                          __free_pages
> >> >                        - free_unref_page
> >> >                           - 13.50% free_unref_page_commit
> >> >                              - free_pcppages_bulk
> >> >                                 + 13.44% _raw_spin_lock
> >> >
> >> > By enabling the mm_page_pcpu_drain tracepoint, we can capture the
> >> > detailed call stack:
> >> >
> >> >   <...>-1540432 [224] d..3. 618048.023883: mm_page_pcpu_drain:
> >> >       page=0000000035a1b0b7 pfn=0x11c19c72 order=0 migratetype=1
> >> >   <...>-1540432 [224] d..3. 618048.023887:
> >> >    => free_pcppages_bulk
> >> >    => free_unref_page_commit
> >> >    => free_unref_page_list
> >> >    => release_pages
> >> >    => free_pages_and_swap_cache
> >> >    => tlb_flush_mmu
> >> >    => zap_pte_range
> >> >    => unmap_page_range
> >> >    => unmap_single_vma
> >> >    => unmap_vmas
> >> >    => exit_mmap
> >> >    => mmput
> >> >    => do_exit
> >> >    => do_group_exit
> >> >    => get_signal
> >> >    => arch_do_signal_or_restart
> >> >    => exit_to_user_mode_prepare
> >> >    => syscall_exit_to_user_mode
> >> >    => do_syscall_64
> >> >    => entry_SYSCALL_64_after_hwframe
> >> >
> >> > The servers experiencing these issues have substantial hardware:
> >> > 256 CPUs and 1TB of memory, all within a single NUMA node. The
> >> > zoneinfo is as follows:
> >> >
> >> >   Node 0, zone   Normal
> >> >     pages free     144465775
> >> >           boost    0
> >> >           min      1309270
> >> >           low      1636587
> >> >           high     1963904
> >> >           spanned  564133888
> >> >           present  296747008
> >> >           managed  291974346
> >> >           cma      0
> >> >           protection: (0, 0, 0, 0)
> >> >     ...
> >> >     pagesets
> >> >       cpu: 0
> >> >         count: 2217
> >> >         high:  6392
> >> >         batch: 63
> >> >         vm stats threshold: 125
> >> >       cpu: 1
> >> >         count: 4510
> >> >         high:  6392
> >> >         batch: 63
> >> >         vm stats threshold: 125
> >> >       cpu: 2
> >> >         count: 3059
> >> >         high:  6392
> >> >         batch: 63
> >> >     ...
> >> >
> >> > The high is around 100 times the batch size (6392 / 63 = ~101).
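> >> >
> >> > For context, the kernel derives that per-cpu high value roughly as
> >> > follows; this is a simplified sketch of zone_highsize() in
> >> > mm/page_alloc.c (paraphrased, and the exact details vary across
> >> > kernel versions):
> >> >
> >> >   static int zone_highsize(struct zone *zone, int batch, int cpu_online)
> >> >   {
> >> >           unsigned long total_pages;
> >> >           int high, nr_split_cpus;
> >> >
> >> >           if (!percpu_pagelist_high_fraction)
> >> >                   /* default: base high on the zone's low watermark */
> >> >                   total_pages = low_wmark_pages(zone);
> >> >           else
> >> >                   /* sysctl set: base high on a fraction of managed pages */
> >> >                   total_pages = zone_managed_pages(zone) /
> >> >                                 percpu_pagelist_high_fraction;
> >> >
> >> >           /* split the budget across the CPUs local to the zone */
> >> >           nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) +
> >> >                           cpu_online;
> >> >           if (!nr_split_cpus)
> >> >                   nr_split_cpus = num_online_cpus();
> >> >           high = total_pages / nr_split_cpus;
> >> >
> >> >           /* never drop below four batches */
> >> >           return max(high, batch << 2);
> >> >   }
> >> >
> >> > With the low watermark of 1636587 pages split across 256 CPUs,
> >> > 1636587 / 256 = ~6392, which matches the per-cpu high above. The
> >> > batch << 2 floor is the minimum that this patch proposes to select
> >> > directly by writing -1 to the sysctl.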
> >> >
> >> > We also traced the latency of the free_pcppages_bulk() function
> >> > during the container exit process:
> >> >
> >> > 19:48:54
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 0        |                                        |
> >> >        256 -> 511        : 148      |*****************                       |
> >> >        512 -> 1023       : 334      |****************************************|
> >> >       1024 -> 2047       : 33       |***                                     |
> >> >       2048 -> 4095       : 5        |                                        |
> >> >       4096 -> 8191       : 7        |                                        |
> >> >       8192 -> 16383      : 12       |*                                       |
> >> >      16384 -> 32767      : 30       |***                                     |
> >> >      32768 -> 65535      : 21       |**                                      |
> >> >      65536 -> 131071     : 15       |*                                       |
> >> >     131072 -> 262143     : 27       |***                                     |
> >> >     262144 -> 524287     : 84       |**********                              |
> >> >     524288 -> 1048575    : 203      |************************                |
> >> >    1048576 -> 2097151    : 284      |**********************************      |
> >> >    2097152 -> 4194303    : 327      |*************************************** |
> >> >    4194304 -> 8388607    : 215      |*************************               |
> >> >    8388608 -> 16777215   : 116      |*************                           |
> >> >   16777216 -> 33554431   : 47       |*****                                   |
> >> >   33554432 -> 67108863   : 8        |                                        |
> >> >   67108864 -> 134217727  : 3        |                                        |
> >> >
> >> > avg = 3066311 nsecs, total: 5887317501 nsecs, count: 1920
> >> >
> >> > The latency can reach tens of milliseconds.
> >> >
> >> > By adjusting the vm.percpu_pagelist_high_fraction parameter to set the
> >> > minimum pagelist high at 4 times the batch size, we were able to
> >> > significantly reduce the latency of free_pcppages_bulk() during
> >> > container exits:
> >> >
> >> >      nsecs               : count     distribution
> >> >          0 -> 1          : 0        |                                        |
> >> >          2 -> 3          : 0        |                                        |
> >> >          4 -> 7          : 0        |                                        |
> >> >          8 -> 15         : 0        |                                        |
> >> >         16 -> 31         : 0        |                                        |
> >> >         32 -> 63         : 0        |                                        |
> >> >         64 -> 127        : 0        |                                        |
> >> >        128 -> 255        : 120      |                                        |
> >> >        256 -> 511        : 365      |*                                       |
> >> >        512 -> 1023       : 201      |                                        |
> >> >       1024 -> 2047       : 103      |                                        |
> >> >       2048 -> 4095       : 84       |                                        |
> >> >       4096 -> 8191       : 87       |                                        |
> >> >       8192 -> 16383      : 4777     |**************                          |
> >> >      16384 -> 32767      : 10572    |*******************************         |
> >> >      32768 -> 65535      : 13544    |****************************************|
> >> >      65536 -> 131071     : 12723    |*************************************   |
> >> >     131072 -> 262143     : 8604     |*************************               |
> >> >     262144 -> 524287     : 3659     |**********                              |
> >> >     524288 -> 1048575    : 921      |**                                      |
> >> >    1048576 -> 2097151    : 122      |                                        |
> >> >    2097152 -> 4194303    : 5        |                                        |
> >> >
> >> > avg = 103814 nsecs, total: 5805802787 nsecs, count: 55925
> >> >
> >> > After tuning the vm.percpu_pagelist_high_fraction sysctl knob to set
> >> > the minimum pagelist high at a level that effectively mitigated the
> >> > latency issues, we observed that other containers no longer raised
> >> > similar complaints. As a result, we implemented this tuning as a
> >> > permanent workaround and have deployed it across all clusters of
> >> > servers where these containers may be deployed.
> >>
> >> Thanks for your detailed data.
> >>
> >> IIUC, the latency of free_pcppages_bulk() during process exiting
> >> shouldn't be a problem?
> >
> > Right. The problem arises when the exiting process holds the lock for
> > too long, causing other processes attempting to allocate memory to
> > experience delays.
> >
> >> Because users care more about the total time of process exiting, that
> >> is, throughput. And I suspect that the zone->lock contention and page
> >> allocating/freeing throughput will be worse with your configuration?
> >
> > While reduced throughput is not a significant concern here given the
> > minimal difference, latency spikes, a crucial metric for assessing
> > system stability, matter far more to our users. Higher latency can
> > lead to request errors, impacting the user experience. Maintaining
> > stability, even at the cost of slightly lower throughput, is
> > preferable to higher but unstable throughput.
> >
> >> But the latency of free_pcppages_bulk() and page allocation in other
> >> processes is a problem. And your configuration can help it.
> >>
> >> Another choice is to change CONFIG_PCP_BATCH_SCALE_MAX. In that way,
> >> you have a normal PCP size (high) but a smaller PCP batch. I guess
> >> that may help both latency and throughput in your system. Could you
> >> give it a try?
> >
> > Currently, our kernel does not include the CONFIG_PCP_BATCH_SCALE_MAX
> > configuration option. However, I've observed your recent improvements
> > to the zone->lock mechanism, particularly commit 52166607ecc9 ("mm:
> > restrict the pcp batch scale factor to avoid too long latency"), which
> > prompted me to experiment with manually setting pcp->free_factor to
> > zero. While this adjustment provided some improvement, the results
> > were not as significant as I had hoped.
> >
> > BTW, perhaps we should consider implementing a sysctl knob as an
> > alternative to CONFIG_PCP_BATCH_SCALE_MAX? That would allow users to
> > adjust it more easily.
>
> If you cannot test upstream behavior, it's hard to make changes to
> upstream. Could you find a way to do that?

I'm afraid I can't run an upstream kernel in our production
environment :( Too many code changes would have to be made.

> IIUC, PCP high will not influence allocate/free latency, PCP batch
> will.

That seems incorrect. Look at the code in free_unref_page_commit():

	if (pcp->count >= high) {
		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
				   pcp, pindex);
	}

And in nr_pcp_free():

	min_nr_free = batch;
	max_nr_free = high - batch;
	batch = clamp_t(int, pcp->free_count, min_nr_free, max_nr_free);
	return batch;

The 'batch' is not a fixed value; it changes dynamically, doesn't it?
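
For illustration, plugging the values from the zoneinfo above
(high = 6392, batch = 63) into that code:

	min_nr_free = 63
	max_nr_free = 6392 - 63 = 6329
	batch       = clamp(pcp->free_count, 63, 6329)

So once pcp->free_count has ramped up during a bulk free, a single
free_pcppages_bulk() call can free up to 6329 pages while holding
zone->lock. With the sysctl forcing high down to 4 * batch = 252,
max_nr_free drops to 252 - 63 = 189, which bounds the lock hold time
accordingly.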
> Your configuration will influence PCP batch via configuring PCP high.
> So, it may be reasonable to find a way to adjust PCP batch directly.
> But, we need practical requirements and test methods first.

--
Regards
Yafang