Message-ID: <2ee7cb17-9003-482c-9741-f1f51f61ab4b@huawei.com>
Date: Mon, 2 Sep 2024 09:11:58 +0800
From: mawupeng
Subject: Re: [PATCH] mm, proc: collect percpu free pages into the free pages
In-Reply-To: <87a5guh2fb.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240830014453.3070909-1-mawupeng1@huawei.com> <87a5guh2fb.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 2024/8/30 15:53, Huang, Ying wrote:
> Hi, Wupeng,
>
> Wupeng Ma writes:
>
>> From: Ma Wupeng
>>
>> The introduction of Per-CPU-Pageset (PCP) per zone aims to enhance the
>> performance of the page allocator by enabling page allocation without
>> requiring the zone lock. This kind of memory is free memory however is
>> not included in Memfree or MemAvailable.
>>
>> With the support of higt-order pcp and pcp auto-tuning, the size of the
>> pages in this list has become a matter of concern due to the following
>> patches:
>>
>> 1. Introduction of Order 1~3 and PMD level PCP in commit 44042b449872
>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu
>> lists").
>> 2. Introduction of PCP auto-tuning in commit 90b41691b988 ("mm: add
>> framework for PCP high auto-tuning").
>
> With PCP auto-tuning, the idle pages in PCP will be freed to buddy after
> some time (may be as long as tens seconds in some cases).

Thank you for the detailed explanation of PCP auto-tuning. If idle PCP
pages are freed back to the buddy allocator after a certain period thanks
to auto-tuning, then auto-tuning may not be directly responsible for the
increase in the PCP count shown below, especially since no real tasks had
started after booting. The main cause of the increase is more likely the
additional orders and the large number of CPUs.

>
>> Which lead to the total amount of the pcp can not be ignored just after
>> booting without any real tasks for as the result show below:
>>
>>                   w/o patch     with patch        diff  diff/total
>> MemTotal:     525424652 kB   525424652 kB        0 kB          0%
>> MemFree:      517030396 kB   520134136 kB  3103740 kB        0.6%
>> MemAvailable: 515837152 kB   518941080 kB  3103928 kB        0.6%

We ran the following experiment, which makes the PCP amount even bigger:

1. Allocate 8G of memory on all of the 600+ CPUs.
2. Kill all of the above user tasks.
3. Wait for 36 hours.

Afterwards, the PCP amount is 6161097 pages (24644M), which is 4.6% of the
total 512G of memory.

>>
>> On a machine with 16 zones and 600+ CPUs, prior to these commits, the PCP
>> list contained 274368 pages (1097M) immediately after booting. In the
>> mainline, this number has increased to 3003M, marking a 173% increase.
>>
>> Since available memory is used by numerous services to determine memory
>> pressure. A substantial PCP memory volume leads to an inaccurate estimation
>> of available memory size, significantly impacting the service logic.
>>
>> Remove the useless CONFIG_HIGMEM in si_meminfo_node since it will always
>> false in is_highmem_idx if config is not enabled.
>>
>> Signed-off-by: Ma Wupeng
>> Signed-off-by: Liu Shixin
>
> This has been discussed before in the thread of the previous version,
> better to refer to it and summarize it.
>
> [1] https://lore.kernel.org/linux-mm/YwSGqtEICW5AlhWr@dhcp22.suse.cz/

In the previous discussion, Michal Hocko raised two points:

1. Is this a real problem?
2. MemAvailable is documented as memory available without swapping,
   whereas PCP pages need to be drained before they can be reclaimed.

Our answers:

1. Available memory is used by numerous services to determine memory
   pressure. A substantial PCP memory volume leads to an inaccurate
   estimate of available memory, which significantly impacts the service
   logic.
2. MemAvailable does seem weird here. There is no reason to drain the PCP
   in order to drop clean page cache. As Michal Hocko already pointed out
   in this post [1], dropping clean page cache is much cheaper than
   draining remote PCPs. Any ideas on this?

[1] https://lore.kernel.org/linux-mm/ZWRYZmulV0B-Jv3k@tiehlicka/

>
> --
> Best Regards,
> Huang, Ying
>
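
For anyone who wants to cross-check the PCP totals quoted in this thread from userspace, one rough approach is to sum the per-CPU "count:" fields listed under "pagesets" in /proc/zoneinfo. The sketch below is only an illustration and not part of the patch: the helper names are hypothetical, the embedded sample text is made up, and the 4 kB page size is an assumption that matches the page-to-kB figures in this mail.

```python
# Hypothetical helpers, not part of the patch under discussion.
# Assumes each per-CPU pageset in /proc/zoneinfo has a "count: N" line.

# Made-up sample in the usual /proc/zoneinfo layout, for illustration only.
SAMPLE_ZONEINFO = """\
Node 0, zone   Normal
  pages free     129258
  pagesets
    cpu: 0
              count: 186
              high:  432
              batch: 63
    cpu: 1
              count: 387
              high:  432
              batch: 63
"""


def sum_pcp_pages(zoneinfo_text):
    """Sum every per-CPU pageset 'count:' value found in the text."""
    total = 0
    for line in zoneinfo_text.splitlines():
        fields = line.split()
        # Only "count: N" lines match; "high:" and "batch:" are skipped.
        if len(fields) == 2 and fields[0] == "count:":
            total += int(fields[1])
    return total


def pages_to_kb(pages, page_kb=4):
    """Convert a page count to kB, assuming 4 kB pages by default."""
    return pages * page_kb


sample_pages = sum_pcp_pages(SAMPLE_ZONEINFO)
print(sample_pages, "pages =", pages_to_kb(sample_pages), "kB")
```

On a live system the same function can be fed the contents of /proc/zoneinfo instead of the sample. With 4 kB pages, the 274368 pages quoted above correspond to 1097472 kB and the 6161097 pages to 24644388 kB, which is where the 1097M and 24644M figures come from when dividing by 1000.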