From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1222cd76-e732-4238-9413-61843249c1e8@arm.com>
Date: Fri, 9 Aug 2024 09:27:22 +0100
Subject: Re: [PATCH RFC 1/2] mm: collect the number of anon large folios
From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: chrisl@kernel.org, david@redhat.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, ioworker0@gmail.com, baolin.wang@linux.alibaba.com, ziy@nvidia.com, hanchuanhua@oppo.com, Barry Song
References: <20240808010457.228753-1-21cnbao@gmail.com> <20240808010457.228753-2-21cnbao@gmail.com>
In-Reply-To:
On 09/08/2024 09:13, Ryan Roberts wrote:
> On 08/08/2024 02:04, Barry Song wrote:
>> From: Barry Song
>>
>> When a new anonymous mTHP is added to the rmap, we increase the count.
>> We reduce the count whenever an mTHP is completely unmapped.
>>
>> Signed-off-by: Barry Song
>> ---
>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
>>  mm/huge_memory.c                           |  2 ++
>>  mm/rmap.c                                  |  3 +++
>>  4 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index 058485daf186..715f181543f6 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -527,6 +527,11 @@ split_deferred
>>         it would free up some memory. Pages on split queue are going to
>>         be split under memory pressure, if splitting is possible.
>>
>> +anon_num
>> +       the number of anon huge pages we have in the whole system.
>> +       These huge pages could be still entirely mapped and have partially
>> +       unmapped and unused subpages.
>
> nit: "entirely mapped and have partially unmapped and unused subpages" ->
> "entirely mapped or have partially unmapped/unused subpages"
>
>> +
>>  As the system ages, allocating huge pages may be expensive as the
>>  system uses memory compaction to copy data around memory to free a
>>  huge page for use. There are some counters in ``/proc/vmstat`` to help
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index e25d9ebfdf89..294c348fe3cc 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -281,6 +281,7 @@ enum mthp_stat_item {
>>  	MTHP_STAT_SPLIT,
>>  	MTHP_STAT_SPLIT_FAILED,
>>  	MTHP_STAT_SPLIT_DEFERRED,
>> +	MTHP_STAT_NR_ANON,
>>  	__MTHP_STAT_COUNT
>>  };
>>
>> @@ -291,14 +292,24 @@ struct mthp_stat {
>>  #ifdef CONFIG_SYSFS
>>  DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
>>
>> -static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
>>  {
>>  	if (order <= 0 || order > PMD_ORDER)
>>  		return;
>>
>> -	this_cpu_inc(mthp_stats.stats[order][item]);
>> +	this_cpu_add(mthp_stats.stats[order][item], delta);
>> +}
>> +
>> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>> +{
>> +	mod_mthp_stat(order, item, 1);
>>  }
>> +
>>  #else
>> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
>> +{
>> +}
>> +
>>  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
>>  {
>>  }
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 697fcf89f975..b6bc2a3791e3 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -578,6 +578,7 @@ DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
>>  DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
>>  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
>>  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
>> +DEFINE_MTHP_STAT_ATTR(anon_num, MTHP_STAT_NR_ANON);

Why are the user-facing and internal names different? Perhaps it would be
clearer to call this nr_anon in sysfs?
>>
>>  static struct attribute *stats_attrs[] = {
>>  	&anon_fault_alloc_attr.attr,
>> @@ -591,6 +592,7 @@ static struct attribute *stats_attrs[] = {
>>  	&split_attr.attr,
>>  	&split_failed_attr.attr,
>>  	&split_deferred_attr.attr,
>> +	&anon_num_attr.attr,
>>  	NULL,
>>  };
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 901950200957..2b722f26224c 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1467,6 +1467,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>  	}
>>
>>  	__folio_mod_stat(folio, nr, nr_pmdmapped);
>> +	mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
>>  }
>>
>>  static __always_inline void __folio_add_file_rmap(struct folio *folio,
>> @@ -1582,6 +1583,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>  		    list_empty(&folio->_deferred_list))
>>  			deferred_split_folio(folio);
>>  	__folio_mod_stat(folio, -nr, -nr_pmdmapped);
>> +	if (folio_test_anon(folio) && !atomic_read(mapped))
>
> Agree that atomic_read() is dodgy here.
>
> Not sure I fully understand why David prefers to do the unaccounting at
> free-time though? It feels unbalanced to me to increment when first mapped
> but decrement when freed. Surely it's safer to either use alloc/free or use
> first map/last map?
>
> If using alloc/free, isn't there a THP constructor/destructor that prepares
> the deferred list? (My memory may be failing me). Could we use that?

Additionally, if we wanted to extend (eventually) to track the number of shmem
and file mTHPs in additional counters, could we also account using similar
folio free-time hooks? If not, it might be an argument to account in rmap
unmap to be consistent for all?

>
>> +		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, -1);
>>
>>  	/*
>>  	 * It would be tidy to reset folio_test_anon mapping when fully
>