From: "Huang, Ying" <ying.huang@intel.com>
To: mawupeng
Subject: Re: [PATCH] mm, proc: collect percpu free pages into the free pages
In-Reply-To: <2ee7cb17-9003-482c-9741-f1f51f61ab4b@huawei.com> (mawupeng's
 message of "Mon, 2 Sep 2024 09:11:58 +0800")
References: <20240830014453.3070909-1-mawupeng1@huawei.com>
 <87a5guh2fb.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <2ee7cb17-9003-482c-9741-f1f51f61ab4b@huawei.com>
Date: Mon, 02 Sep 2024 09:29:57 +0800
Message-ID: <871q22hmga.fsf@yhuang6-desk2.ccr.corp.intel.com>

mawupeng writes:

> On 2024/8/30 15:53, Huang, Ying wrote:
>> Hi, Wupeng,
>>
>> Wupeng Ma writes:
>>
>>> From: Ma Wupeng
>>>
>>> The introduction of the Per-CPU Pageset (PCP) per zone aims to enhance
>>> the performance of the page allocator by enabling page allocation
>>> without requiring the zone lock. This kind of memory is free memory,
>>> but it is not included in MemFree or MemAvailable.
>>>
>>> With the support of high-order PCP and PCP auto-tuning, the size of
>>> the pages on this list has become a matter of concern due to the
>>> following patches:
>>>
>>> 1. Introduction of order-1~3 and PMD-level PCP in commit 44042b449872
>>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu
>>> lists").
>>> 2. Introduction of PCP auto-tuning in commit 90b41691b988 ("mm: add
>>> framework for PCP high auto-tuning").
>>
>> With PCP auto-tuning, the idle pages in the PCP will be freed to the
>> buddy allocator after some time (which may be as long as tens of
>> seconds in some cases).
>
> Thank you for the detailed explanation regarding PCP auto-tuning. If the
> PCP pages are freed to the buddy after a certain period due to
> auto-tuning, it's possible that there is no direct association between
> PCP auto-tuning and the increase in the PCP count indicated below,
> especially if no actual tasks have run after booting. The primary
> reason for the increase might be more orders and a surplus of CPUs.
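(Aside: the per-CPU pageset counts under discussion are visible in
/proc/zoneinfo. A minimal sketch that totals them; the sample text below
is illustrative, not output from the machine discussed in this thread.)

```python
import re

# Total the per-CPU pageset "count" fields from /proc/zoneinfo-style
# output. SAMPLE is an illustrative excerpt; on a live Linux system,
# pass open("/proc/zoneinfo").read() instead.
SAMPLE = """\
Node 0, zone   Normal
  pages free     1000000
  pagesets
    cpu: 0
              count: 312
              high:  636
              batch: 63
    cpu: 1
              count: 128
              high:  636
              batch: 63
"""

def total_pcp_pages(zoneinfo_text):
    # In /proc/zoneinfo, "count:" (with a colon) appears only in the
    # per-CPU pageset sections, so summing those fields gives the number
    # of base pages currently parked on PCP lists.
    return sum(int(n) for n in re.findall(r"count:\s+(\d+)", zoneinfo_text))

pages = total_pcp_pages(SAMPLE)
print(pages, "pages =", pages * 4, "kB")  # 440 pages = 1760 kB
```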
>
>>
>>> This leads to the total amount of PCP memory being non-negligible
>>> just after booting, without any real tasks running, as the results
>>> below show:
>>>
>>>                     w/o patch      with patch        diff  diff/total
>>> MemTotal:      525424652 kB    525424652 kB        0 kB          0%
>>> MemFree:       517030396 kB    520134136 kB  3103740 kB        0.6%
>>> MemAvailable:  515837152 kB    518941080 kB  3103928 kB        0.6%
>
> We did the following experiment, which makes the PCP amount even
> bigger:
> 1. allocate 8G of memory on each of the 600+ CPUs
> 2. kill all of the above user tasks
> 3. wait for 36h
>
> The PCP amount reaches 6161097 pages (24644M), which is 4.6% of the
> total 512G of memory.
>
>>>
>>> On a machine with 16 zones and 600+ CPUs, prior to these commits, the
>>> PCP lists contained 274368 pages (1097M) immediately after booting.
>>> In the mainline, this number has increased to 3003M, marking a 173%
>>> increase.
>>>
>>> Since available memory is used by numerous services to determine
>>> memory pressure, a substantial PCP memory volume leads to an
>>> inaccurate estimation of the available memory size, significantly
>>> impacting the service logic.
>>>
>>> Also remove the useless CONFIG_HIGHMEM check in si_meminfo_node,
>>> since is_highmem_idx will always return false if the config is not
>>> enabled.
>>>
>>> Signed-off-by: Ma Wupeng
>>> Signed-off-by: Liu Shixin
>>
>> This has been discussed before in the thread for the previous version;
>> it would be better to refer to that discussion and summarize it.
>>
>> [1] https://lore.kernel.org/linux-mm/YwSGqtEICW5AlhWr@dhcp22.suse.cz/
>
> As Michal Hocko mentioned in the previous discussion:
> 1. Is it a real problem?
> 2. MemAvailable is documented as available without swapping; however,
> the PCP needs a drain to be reclaimed.
>
> 1. Since available memory is used by numerous services to determine
> memory pressure, a substantial PCP memory volume leads to an inaccurate
> estimation of the available memory size, significantly impacting the
> service logic.
> 2. MemAvailable here does seem weird.
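(The ratios quoted above are easy to sanity-check; a quick sketch, which
assumes 4 kB pages and takes its constants from the numbers in this
thread:)

```python
def pct(part_kb, total_kb):
    """Part as a percentage of total, rounded to one decimal place."""
    return round(100.0 * part_kb / total_kb, 1)

# Boot-time table: MemFree delta vs. MemTotal
print(pct(3103740, 525424652))              # 0.6

# 36h idle experiment: 6161097 PCP pages (4 kB each) out of 512 GiB
print(pct(6161097 * 4, 512 * 1024 * 1024))  # 4.6
```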
> There is no reason to drain the PCP to drop clean page cache. As
> Michal Hocko already pointed out in this post, dropping clean page
> cache is much cheaper than draining a remote PCP. Any idea on this?

Draining a remote PCP may not be that expensive now, after commit
4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock"). No
IPI is needed to drain a remote PCP.

> [1] https://lore.kernel.org/linux-mm/ZWRYZmulV0B-Jv3k@tiehlicka/

--
Best Regards,
Huang, Ying