From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <193da117-30b8-425a-b095-6fd8aca1c987@huawei.com>
Date: Tue, 3 Sep 2024 09:50:48 +0800
Subject: Re: [PATCH] mm, proc: collect percpu free pages into the free pages
References: <20240830014453.3070909-1-mawupeng1@huawei.com>
 <87a5guh2fb.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <2ee7cb17-9003-482c-9741-f1f51f61ab4b@huawei.com>
 <871q22hmga.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: mawupeng
In-Reply-To: <871q22hmga.fsf@yhuang6-desk2.ccr.corp.intel.com>
Content-Type: text/plain; charset="UTF-8"

On 2024/9/2 9:29, Huang, Ying wrote:
> mawupeng writes:
>
>> On 2024/8/30 15:53, Huang, Ying wrote:
>>> Hi, Wupeng,
>>>
>>> Wupeng Ma writes:
>>>
>>>> From: Ma Wupeng
>>>>
>>>> The introduction of Per-CPU-Pageset (PCP) per zone aims to enhance the
>>>> performance of the page allocator by enabling page allocation without
>>>> requiring the zone lock.
>>>> This kind of memory is free memory; however, it is
>>>> not included in MemFree or MemAvailable.
>>>>
>>>> With the support of high-order pcp and pcp auto-tuning, the size of the
>>>> pages in this list has become a matter of concern due to the following
>>>> patches:
>>>>
>>>> 1. Introduction of Order 1~3 and PMD level PCP in commit 44042b449872
>>>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu
>>>> lists").
>>>> 2. Introduction of PCP auto-tuning in commit 90b41691b988 ("mm: add
>>>> framework for PCP high auto-tuning").
>>>
>>> With PCP auto-tuning, the idle pages in PCP will be freed to buddy after
>>> some time (may be as long as tens of seconds in some cases).
>>
>> Thank you for the detailed explanation regarding PCP auto-tuning. If the
>> PCP pages are freed to the buddy after a certain period due to auto-tuning,
>> it's possible that there is no direct association between PCP auto-tuning
>> and the increase in the PCP count as indicated below, especially if no
>> actual tasks have commenced after booting. The primary reason for the
>> increase might be more orders and a surplus of CPUs.
>>
>>>
>>>> As a result, the total amount of pcp memory cannot be ignored just after
>>>> booting, without any real tasks running, as the results below show:
>>>>
>>>>                w/o patch      with patch     diff         diff/total
>>>> MemTotal:      525424652 kB   525424652 kB   0 kB         0%
>>>> MemFree:       517030396 kB   520134136 kB   3103740 kB   0.6%
>>>> MemAvailable:  515837152 kB   518941080 kB   3103928 kB   0.6%
>>
>> We did the following experiment, which makes the pcp amount even bigger:
>> 1. alloc 8G of memory on all of the 600+ cpus
>> 2. kill all of the above user tasks
>> 3. wait for 36h
>>
>> The pcp amount is 6161097 pages (24644M), which is 4.6% of the total 512G
>> memory.
>>
>>
>>>>
>>>> On a machine with 16 zones and 600+ CPUs, prior to these commits, the PCP
>>>> list contained 274368 pages (1097M) immediately after booting. In the
>>>> mainline, this number has increased to 3003M, marking a 173% increase.
>>>>
>>>> Since available memory is used by numerous services to determine memory
>>>> pressure, a substantial PCP memory volume leads to an inaccurate estimation
>>>> of available memory size, significantly impacting the service logic.
>>>>
>>>> Remove the useless CONFIG_HIGHMEM check in si_meminfo_node, since
>>>> is_highmem_idx always returns false if the config is not enabled.
>>>>
>>>> Signed-off-by: Ma Wupeng
>>>> Signed-off-by: Liu Shixin
>>>
>>> This has been discussed before in the thread of the previous version,
>>> better to refer to it and summarize it.
>>>
>>> [1] https://lore.kernel.org/linux-mm/YwSGqtEICW5AlhWr@dhcp22.suse.cz/
>>
>> As Michal Hocko mentioned in the previous discussion:
>> 1. Is it a real problem?
>> 2. MemAvailable is documented as available without swapping; however,
>>    pcp needs draining to be reclaimed.
>>
>> 1. Since available memory is used by numerous services to determine memory
>> pressure, a substantial PCP memory volume leads to an inaccurate estimation
>> of available memory size, significantly impacting the service logic.
>> 2. MemAvailable here does seem weird. There is no reason to drain pcp to
>> drop clean page cache. As Michal Hocko already pointed out in this post,
>> draining clean page cache is much cheaper than draining remote pcp. Any
>> idea on this?
>
> Draining remote PCP may not be that expensive now after commit 4b23a68f9536
> ("mm/page_alloc: protect PCP lists with a spinlock"). No IPI is needed
> to drain the remote PCP.

This looks really great. We can think of a way to drop pcp before going to
the slowpath, before swap.

>
>> [1] https://lore.kernel.org/linux-mm/ZWRYZmulV0B-Jv3k@tiehlicka/
>
> --
> Best Regards,
> Huang, Ying
>
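
As an illustration of how the PCP totals quoted in this thread might be
collected from userspace, here is a minimal Python sketch. It assumes the
usual /proc/zoneinfo layout, where each zone's "pagesets" section lists a
"cpu: N" line followed by a "count: N" line, and it assumes 4 kB base pages.
The names sum_pcp_pages, pages_to_kb and SAMPLE are hypothetical and are not
part of the patch.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 kB base pages


def sum_pcp_pages(zoneinfo_text):
    """Sum the per-CPU 'count:' values that directly follow a 'cpu: N' line."""
    total = 0
    after_cpu = False  # True only for the line right after 'cpu: N'
    for line in zoneinfo_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"cpu:\s*\d+", stripped):
            after_cpu = True
            continue
        match = re.fullmatch(r"count:\s*(\d+)", stripped)
        if match and after_cpu:
            total += int(match.group(1))
        after_cpu = False
    return total


def pages_to_kb(pages, page_size_kb=PAGE_SIZE_KB):
    """Convert a page count to kB under the assumed page size."""
    return pages * page_size_kb


# Made-up sample in the assumed /proc/zoneinfo layout, used as a fallback.
SAMPLE = """\
Node 0, zone   Normal
  pagesets
    cpu: 0
              count: 100
              high:  378
              batch: 63
    cpu: 1
              count: 28
              high:  378
              batch: 63
"""

if __name__ == "__main__":
    try:
        with open("/proc/zoneinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux, or /proc is unavailable
    pages = sum_pcp_pages(text)
    print(f"PCP pages: {pages} ({pages_to_kb(pages)} kB)")
```

Under the 4 kB assumption, pages_to_kb(274368) gives 1097472 kB and
pages_to_kb(6161097) gives 24644388 kB, which is consistent with the 1097M
and 24644M figures above if "M" is read as 1000 kB.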