Message-ID: <7e55d6eb-3384-4cc0-80ea-880ef2175121@arm.com>
Date: Fri, 9 Aug 2024 10:00:21 +0100
Subject: Re: [PATCH RFC 1/2] mm: collect the number of anon large folios
From: Ryan Roberts
To: David Hildenbrand, Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: chrisl@kernel.org, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, ioworker0@gmail.com, baolin.wang@linux.alibaba.com, ziy@nvidia.com, hanchuanhua@oppo.com, Barry Song
References: <20240808010457.228753-1-21cnbao@gmail.com> <20240808010457.228753-2-21cnbao@gmail.com> <537d7a30-2ad8-4c31-9ad3-ad86f1a7b519@redhat.com>
In-Reply-To: <537d7a30-2ad8-4c31-9ad3-ad86f1a7b519@redhat.com>
On 09/08/2024 09:39, David Hildenbrand wrote:
> On 09.08.24 10:13, Ryan Roberts wrote:
>> On 08/08/2024 02:04, Barry Song wrote:
>>> From: Barry Song
>>>
>>> When a new anonymous mTHP is added to the rmap, we increase the count.
>>> We reduce the count whenever an mTHP is completely unmapped.
>>>
>>> Signed-off-by: Barry Song
>>> ---
>>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
>>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
>>>  mm/huge_memory.c                           |  2 ++
>>>  mm/rmap.c                                  |  3 +++
>>>  4 files changed, 23 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 058485daf186..715f181543f6 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -527,6 +527,11 @@ split_deferred
>>>         it would free up some memory. Pages on split queue are going to
>>>         be split under memory pressure, if splitting is possible.
>>>
>>> +anon_num
>>> +       the number of anon huge pages we have in the whole system.
>>> +       These huge pages could be still entirely mapped and have partially
>>> +       unmapped and unused subpages.
>>
>> nit: "entirely mapped and have partially unmapped and unused subpages" ->
>> "entirely mapped or have partially unmapped/unused subpages"
>>
>>> +
>>>  As the system ages, allocating huge pages may be expensive as the
>>>  system uses memory compaction to copy data around memory to free a
>>>  huge page for use.
>>> There are some counters in ``/proc/vmstat`` to help
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index e25d9ebfdf89..294c348fe3cc 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -281,6 +281,7 @@ enum mthp_stat_item {
>>>      MTHP_STAT_SPLIT,
>>>      MTHP_STAT_SPLIT_FAILED,
>>>      MTHP_STAT_SPLIT_DEFERRED,
>>> +    MTHP_STAT_NR_ANON,
>>>      __MTHP_STAT_COUNT
>>>  };
>>>
>>> @@ -291,14 +292,24 @@ struct mthp_stat {
>>>  #ifdef CONFIG_SYSFS
>>>  DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
>>>
>>> -static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
>>>  {
>>>      if (order <= 0 || order > PMD_ORDER)
>>>          return;
>>>
>>> -    this_cpu_inc(mthp_stats.stats[order][item]);
>>> +    this_cpu_add(mthp_stats.stats[order][item], delta);
>>> +}
>>> +
>>> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>>> +{
>>> +    mod_mthp_stat(order, item, 1);
>>>  }
>>> +
>>>  #else
>>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
>>> +{
>>> +}
>>> +
>>>  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>>>  {
>>>  }
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 697fcf89f975..b6bc2a3791e3 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -578,6 +578,7 @@ DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
>>>  DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
>>>  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
>>>  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
>>> +DEFINE_MTHP_STAT_ATTR(anon_num, MTHP_STAT_NR_ANON);
>>>
>>>  static struct attribute *stats_attrs[] = {
>>>      &anon_fault_alloc_attr.attr,
>>> @@ -591,6 +592,7 @@ static struct attribute *stats_attrs[] = {
>>>      &split_attr.attr,
>>>      &split_failed_attr.attr,
>>>      &split_deferred_attr.attr,
>>> +    &anon_num_attr.attr,
>>>      NULL,
>>>  };
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 901950200957..2b722f26224c 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1467,6 +1467,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>      }
>>>
>>>      __folio_mod_stat(folio, nr, nr_pmdmapped);
>>> +    mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
>>>  }
>>>
>>>  static __always_inline void __folio_add_file_rmap(struct folio *folio,
>>> @@ -1582,6 +1583,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>          list_empty(&folio->_deferred_list))
>>>          deferred_split_folio(folio);
>>>      __folio_mod_stat(folio, -nr, -nr_pmdmapped);
>>> +    if (folio_test_anon(folio) && !atomic_read(mapped))
>>
>> Agree that atomic_read() is dodgy here.
>>
>> Not sure I fully understand why David prefers to do the unaccounting at
>> free-time though? It feels unbalanced to me to increment when first mapped
>> but decrement when freed. Surely it's safer to either use alloc/free or use
>> first map/last map?
>
> Doing it when we set/clear folio->mapping is straightforward.
>
> Anon folios currently come to life when we first map them, and they stay
> that way until we free them.
>
> In the future, we'll have to move that anon handling further out: once we
> have to allocate an anon-specific memdesc ahead of time, it will be glued
> to that lifetime.
>
>>
>> If using alloc/free, isn't there a THP constructor/destructor that
>> prepares the deferred list? (My memory may be failing me.) Could we use
>> that?
>
> Likely the destructor could work as well. Not sure if that is any better
> than the freeing path, where folio->mapping currently gets cleared.
>
> The generic constructor certainly won't work right now. That's not where
> the "anon" part comes to life.
>
> Let's take a look at how NR_FILE_THPS is handled:
>
> __filemap_add_folio() increments it -- when we set folio->mapping.
> __filemap_remove_folio() (-> filemap_unaccount_folio()) decrements it --
> after which we usually call page_cache_delete() to set folio->mapping = NULL.
>

OK got it, thanks!