Subject: Re: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled
To: Michal Hocko, Yang Shi
CC: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, Matthew Wilcox, "Wangkefeng (OS Kernel Lab)", "zhangxiaoxu (A)", Lu Jialin
References: <8a2f2644-71d0-05d7-49d8-878aafa99652@huawei.com>
From: Yongqiang Liu
Date: Tue, 29 Nov 2022 21:19:02 +0800

On 2022/11/29 16:10, Michal Hocko wrote:
> On Mon 28-11-22 12:01:37, Yang Shi wrote:
>> On Sat, Nov 26, 2022 at 5:10 AM Yongqiang Liu wrote:
>>> Hi,
>>>
>>> We use mm_counter to count how much physical memory a process uses.
>>> Meanwhile, the page_counter of a memcg counts how much physical
>>> memory a cgroup uses.
>>> If a cgroup contains only one process, the two look almost the same.
>>> But with THP enabled, memory.usage_in_bytes in the memcg is sometimes
>>> twice the rss in /proc/[pid]/smaps_rollup or more, as follows:
> [...]
>>> node_page_stat, which shows up in meminfo, also decreased.
>>> __split_huge_pmd seems to free no physical memory unless the whole
>>> THP is freed. I am confused about which one is the true physical
>>> memory usage of a process.
>> This should be caused by the deferred split of THP. When MADV_DONTNEED
>> is called on part of the mapping, the huge PMD is split, but the
>> THP itself will not be split until memory pressure is hit (global
>> or memcg limit). So the unmapped subpages are actually not freed
>> until that point. So the mm counter is decreased due to the zapping,
>> but the physical pages are not actually freed and then uncharged from
>> the memcg.
> Yes, and this is not really bound to THP. Consider the page cache. It can
> be accessed via syscalls, in which case it doesn't correspond to rss at
> all while it is still charged to a memcg.
> Or it can be mapped and then later
> unmapped, so it disappears from rss while it is still charged until it
> gets reclaimed under memory pressure. Or it can be an in-memory object
> that is not bound to any process lifetime (e.g. tmpfs). Or it can be
> kernel memory charged to a memcg, which is not covered by rss because it
> is either not mapped or unknown to the rss counters.

Thanks! That is very helpful to me.
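
For anyone who wants to observe the gap discussed above, here is a minimal Python sketch. It is not the reproducer elided from the original report. It assumes a 2 MiB PMD-sized THP, Python 3.8+ on Linux, and that THP is enabled for madvise'd regions. The helper names (rss_kb, aligned_offset, snapshot, demo) and the usage_path argument are made up for illustration. The sketch faults in one THP-sized region, calls MADV_DONTNEED on half of it, and prints the process rss next to the memcg usage before and after.

```python
import ctypes
import mmap
import re

HPAGE = 2 * 1024 * 1024  # assumption: PMD-sized THP is 2 MiB (x86-64 default)


def rss_kb(rollup_text):
    """Extract the Rss value (kB) from /proc/<pid>/smaps_rollup text, or None."""
    match = re.search(r"^Rss:\s+(\d+)\s+kB", rollup_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def aligned_offset(addr, align=HPAGE):
    """Bytes to skip from addr to reach the next align-aligned address."""
    return (-addr) % align


def read_text(path):
    with open(path) as f:
        return f.read()


def snapshot(label, usage_path):
    """Print process rss and, if a path is given, the memcg usage (both in kB)."""
    rss = rss_kb(read_text("/proc/self/smaps_rollup"))
    usage = int(read_text(usage_path)) // 1024 if usage_path else None
    print(f"{label}: rss={rss} kB, memcg usage={usage} kB")
    return rss, usage


def demo(usage_path=None):
    """Fault in one THP-sized region, then MADV_DONTNEED half of it.

    usage_path is hypothetical: the caller passes the usage file of the
    cgroup this process runs in (memory.usage_in_bytes on cgroup v1).
    """
    # Over-allocate so that a 2 MiB-aligned window fits inside the mapping.
    buf = mmap.mmap(-1, 2 * HPAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    off = aligned_offset(addr)
    try:
        buf.madvise(mmap.MADV_HUGEPAGE, off, HPAGE)  # only a hint to the kernel
    except OSError:
        pass  # THP not available; the gap is then not expected to show up
    buf[off:off + HPAGE] = b"\x01" * HPAGE  # touch every page in the window
    before = snapshot("after touch   ", usage_path)
    buf.madvise(mmap.MADV_DONTNEED, off, HPAGE // 2)  # zap half of the range
    after = snapshot("after DONTNEED", usage_path)
    buf.close()
    return before, after
```

Run demo() from a process inside the cgroup of interest, passing the path of that cgroup's usage file. If the window was backed by a THP, the explanation above suggests rss should drop by about half of the region while the memcg usage stays roughly flat until reclaim splits the page. Other charges in the same cgroup (page cache, kernel memory, the interpreter itself) will add noise to the memcg number.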