From: Yongqiang Liu
Subject: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled
Date: Sat, 26 Nov 2022 21:09:51 +0800
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, Matthew Wilcox, "Wangkefeng (OS Kernel Lab)", "zhangxiaoxu (A)", Yongqiang Liu, Lu Jialin
Message-ID: <8a2f2644-71d0-05d7-49d8-878aafa99652@huawei.com>

Hi,

We use mm_counter to count how much physical memory a process uses, while the page_counter of a memcg counts how much physical memory a cgroup uses. If a cgroup contains only one process, the two should look almost the same. But with THP enabled, memory.usage_in_bytes of the memcg is sometimes twice the Rss in /proc/[pid]/smaps_rollup, or more, as follows:

[root@localhost sda]# cat /sys/fs/cgroup/memory/test/memory.usage_in_bytes
1080930304
[root@localhost sda]# cat /sys/fs/cgroup/memory/test/cgroup.procs
1290
[root@localhost sda]# cat /proc/1290/smaps_rollup
55ba80600000-ffffffffff601000 ---p 00000000 00:00 0                      [rollup]
Rss:              500648 kB
Pss:              498337 kB
Shared_Clean:       2732 kB
Shared_Dirty:          0 kB
Private_Clean:       364 kB
Private_Dirty:    497552 kB
Referenced:       500648 kB
Anonymous:        492016 kB
LazyFree:              0 kB
AnonHugePages:    129024 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0

I have found that the difference arises because, after __split_huge_pmd, the mm_counter is decreased, but the page_counter of the memcg is not decreased as long as the refcount of the head page is not zero.
Here is the call path:

do_madvise
  madvise_dontneed_free
    zap_page_range
      unmap_single_vma
        zap_pud_range
          zap_pmd_range
            __split_huge_pmd
              __split_huge_pmd_locked
                __mod_lruvec_page_state
            zap_pte_range
              add_mm_rss_vec
                add_mm_counter            -> decrease the mm_counter
      tlb_finish_mmu
        arch_tlb_finish_mmu
          tlb_flush_mmu_free
            free_pages_and_swap_cache
              release_pages
                folio_put_testzero(page)  -> not zero, skip
                  continue;
                __folio_put_large
                  free_transhuge_page
                    free_compound_page
                      mem_cgroup_uncharge
                        page_counter_uncharge  -> decrease the page_counter

The node_page_stat counters shown in /proc/meminfo were also decreased. However, __split_huge_pmd does not seem to free any physical memory until the whole THP is freed. I am confused about which one reflects the true physical memory used by a process.

Kind regards,
Yongqiang Liu