From: Liu Shixin
Date: Tue, 23 Aug 2022 20:46:43 +0800
Subject: Re: [PATCH -next v2] mm, proc: collect percpu free pages into the free pages
To: Michal Hocko, Andrew Morton
CC: Greg Kroah-Hartman, huang ying, Aaron Lu, Dave Hansen, Jesper Dangaard Brouer, Vlastimil Babka, Kemi Wang, Kefeng Wang
Message-ID: <6b2977fc-1e4a-f3d4-db24-7c4699e0773f@huawei.com>
References: <20220822023311.909316-1-liushixin2@huawei.com> <20220822033354.952849-1-liushixin2@huawei.com> <20220822141207.24ff7252913a62f80ea55e90@linux-foundation.org>

On 2022/8/23 15:50, Michal Hocko wrote:
> On Mon 22-08-22 14:12:07, Andrew Morton wrote:
>> On Mon, 22 Aug 2022 11:33:54 +0800 Liu Shixin wrote:
>>
>>> The page on pcplist could be used, but not counted into memory free or
>>> available, and pcp_free is only showed by show_mem() for now. Since commit
>>> d8a759b57035 ("mm, page_alloc: double zone's batchsize"), there is a
>>> significant decrease in the display of free memory, with a large number
>>> of cpus and zones, the number of pages in the percpu list can be very
>>> large, so it is better to let user to know the pcp count.
>>>
>>> On a machine with 3 zones and 72 CPUs. Before commit d8a759b57035, the
>>> maximum amount of pages in the pcp lists was theoretically 162MB(3*72*768KB).
>>> After the patch, the lists can hold 324MB. It has been observed to be 114MB
>>> in the idle state after system startup in practice(increased 80 MB).
>>>
>> Seems reasonable.
> I have asked in the previous incarnation of the patch but haven't really
> received any answer[1]. Is this a _real_ problem? The absolute amount of
> memory could be perceived as a lot but is this really noticeable wrt
> overall memory on those systems?

This may not be obvious when memory is sufficient. However, products
that monitor memory for planning purposes are affected: the change has
caused warnings. We have also considered using /proc/zoneinfo to
calculate the total number of pages on the pcplists.
However, we think it is more appropriate to add the total number of
pcplist pages to the free and available pages. After all, these are
also free pages.

> Also the patch is accounting these pcp caches as free memory but that
> can be misleading as this memory is not readily available for use in
> general. E.g. MemAvailable is documented as:
> 	An estimate of how much memory is available for starting new
> 	applications, without swapping.
> but pcp caches are drained only after direct reclaim fails which can
> imply a lot of reclaim and runtime disruption.

Maybe it makes more sense to add it only to free? Or to handle it like
page cache?

>
> [1] http://lkml.kernel.org/r/YwMv1A1rVNZQuuOo@dhcp22.suse.cz
>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 033f1e26d15b..f89928d3ad4e 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5853,6 +5853,26 @@ static unsigned long nr_free_zone_pages(int offset)
>>>  	return sum;
>>>  }
>>>
>>> +static unsigned long nr_free_zone_pcplist_pages(struct zone *zone)
>>> +{
>>> +	unsigned long sum = 0;
>>> +	int cpu;
>>> +
>>> +	for_each_online_cpu(cpu)
>>> +		sum += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
>>> +	return sum;
>>> +}
>>> +
>>> +static unsigned long nr_free_pcplist_pages(void)
>>> +{
>>> +	unsigned long sum = 0;
>>> +	struct zone *zone;
>>> +
>>> +	for_each_zone(zone)
>>> +		sum += nr_free_zone_pcplist_pages(zone);
>>> +	return sum;
>>> +}
>> Prevention of races against zone/node hotplug?
> Memory hotplug doesn't remove nodes nor its zones.
>
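
For reference, the /proc/zoneinfo alternative mentioned above could be
done from userspace roughly as follows. This is only a sketch and not
part of the patch. It assumes that every per-CPU pageset entry in
/proc/zoneinfo carries a "count: N" line giving the pages currently on
that pcplist, and that no other line starts with "count:". The helper
names sum_pcp_count(), read_pcp_count() and pcp_max_kb() are made up
for illustration; pcp_max_kb() only reproduces the 3*72*768KB estimate
from the changelog.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum every "count: N" field in zoneinfo-style text.
 * Assumption: each per-CPU pageset entry has exactly one such line.
 */
unsigned long sum_pcp_count(const char *text)
{
	unsigned long sum = 0;
	const char *line = text;

	assert(text != NULL);

	while (*line) {
		const char *p = line;

		/* Skip the indentation in front of the field name. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, "count:", 6) == 0)
			sum += strtoul(p + 6, NULL, 10);

		/* Move to the start of the next line, if there is one. */
		p = strchr(line, '\n');
		if (!p)
			break;
		line = p + 1;
	}
	return sum;
}

/* Read a zoneinfo-style file line by line; returns 0 if it cannot be opened. */
unsigned long read_pcp_count(const char *path)
{
	char buf[256];
	unsigned long sum = 0;
	FILE *fp = fopen(path, "r");

	if (!fp)
		return 0;
	while (fgets(buf, sizeof(buf), fp))
		sum += sum_pcp_count(buf);
	fclose(fp);
	return sum;
}

/* Theoretical upper bound in KB: zones * cpus * per-list limit in KB. */
unsigned long pcp_max_kb(unsigned int zones, unsigned int cpus,
			 unsigned long per_list_kb)
{
	return (unsigned long)zones * cpus * per_list_kb;
}
```

read_pcp_count("/proc/zoneinfo") returns a page count, so it has to be
multiplied by the page size to compare it with the MB figures above.
pcp_max_kb(3, 72, 768) gives 165888 KB, i.e. the 162MB quoted in the
changelog.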