From: Barry Song <21cnbao@gmail.com>
Date: Fri, 9 Aug 2024 16:40:47 +0800
Subject: Re: [PATCH RFC 1/2] mm: collect the number of anon large folios
To: Ryan Roberts
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, chrisl@kernel.org, david@redhat.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, ioworker0@gmail.com, baolin.wang@linux.alibaba.com, ziy@nvidia.com, hanchuanhua@oppo.com, Barry Song
In-Reply-To: <1222cd76-e732-4238-9413-61843249c1e8@arm.com>
References: <20240808010457.228753-1-21cnbao@gmail.com> <20240808010457.228753-2-21cnbao@gmail.com> <1222cd76-e732-4238-9413-61843249c1e8@arm.com>

On Fri, Aug 9, 2024 at 4:27 PM Ryan Roberts wrote:
>
> On 09/08/2024 09:13, Ryan Roberts wrote:
> > On 08/08/2024 02:04, Barry Song wrote:
> >> From: Barry Song
> >>
> >> When a new anonymous mTHP is added to the rmap, we increase the count.
> >> We reduce the count whenever an mTHP is completely unmapped.
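The commit message above counts an mTHP once when it is first added to the rmap and uncounts it once it is completely unmapped. Below is a minimal userspace sketch of that rule, not kernel code. It assumes a hypothetical struct toy_folio with a single page-granular "mapped" count; the real folio mapcount bookkeeping is more involved. The sketch shows one general way to avoid testing with a separate atomic_read() after the decrement, which is the pattern Ryan calls dodgy further down in this thread: let the return value of the atomic subtraction decide who performed the last unmap.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_MAX_ORDER 9 /* hypothetical bound standing in for PMD_ORDER */

/* Hypothetical stand-in for a large anon folio. */
struct toy_folio {
	int order;
	atomic_int mapped; /* pages of this folio currently mapped */
};

/* Per-order count of live anon mTHPs (one global array in this sketch). */
static atomic_long toy_nr_anon[TOY_MAX_ORDER + 1];

/* First map: the whole folio becomes mapped and is counted once. */
static void toy_add_new_anon_rmap(struct toy_folio *f)
{
	atomic_store(&f->mapped, 1 << f->order);
	atomic_fetch_add(&toy_nr_anon[f->order], 1);
}

/*
 * Unmap nr pages. atomic_fetch_sub() returns the value before the
 * subtraction, so only the caller that takes the count from nr to 0
 * uncounts the folio. Subtracting first and then testing with a separate
 * atomic_load() is racy: if two threads each remove their half, both can
 * load 0 afterwards and the folio is uncounted twice.
 */
static void toy_remove_rmap(struct toy_folio *f, int nr)
{
	assert(nr > 0);
	if (atomic_fetch_sub(&f->mapped, nr) == nr)
		atomic_fetch_sub(&toy_nr_anon[f->order], 1);
}
```

The usage is single-threaded and only checks the bookkeeping: an order-4 folio unmapped in two halves is uncounted exactly once. The sketch does not model a fully unmapped folio being mapped again.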
> >>
> >> Signed-off-by: Barry Song
> >> ---
> >>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>  include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>  mm/huge_memory.c                           |  2 ++
> >>  mm/rmap.c                                  |  3 +++
> >>  4 files changed, 23 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >> index 058485daf186..715f181543f6 100644
> >> --- a/Documentation/admin-guide/mm/transhuge.rst
> >> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >> @@ -527,6 +527,11 @@ split_deferred
> >>         it would free up some memory. Pages on split queue are going to
> >>         be split under memory pressure, if splitting is possible.
> >>
> >> +anon_num
> >> +       the number of anon huge pages we have in the whole system.
> >> +       These huge pages could be still entirely mapped and have partially
> >> +       unmapped and unused subpages.
> >
> > nit: "entirely mapped and have partially unmapped and unused subpages" ->
> > "entirely mapped or have partially unmapped/unused subpages"
> >
> >> +
> >>  As the system ages, allocating huge pages may be expensive as the
> >>  system uses memory compaction to copy data around memory to free a
> >>  huge page for use. There are some counters in ``/proc/vmstat`` to help
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index e25d9ebfdf89..294c348fe3cc 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -281,6 +281,7 @@ enum mthp_stat_item {
> >>         MTHP_STAT_SPLIT,
> >>         MTHP_STAT_SPLIT_FAILED,
> >>         MTHP_STAT_SPLIT_DEFERRED,
> >> +       MTHP_STAT_NR_ANON,
> >>         __MTHP_STAT_COUNT
> >>  };
> >>
> >> @@ -291,14 +292,24 @@ struct mthp_stat {
> >>  #ifdef CONFIG_SYSFS
> >>  DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
> >>
> >> -static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
> >>  {
> >>         if (order <= 0 || order > PMD_ORDER)
> >>                 return;
> >>
> >> -       this_cpu_inc(mthp_stats.stats[order][item]);
> >> +       this_cpu_add(mthp_stats.stats[order][item], delta);
> >> +}
> >> +
> >> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >> +{
> >> +       mod_mthp_stat(order, item, 1);
> >>  }
> >> +
> >>  #else
> >> +static inline void mod_mthp_stat(int order, enum mthp_stat_item item, int delta)
> >> +{
> >> +}
> >> +
> >>  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> >>  {
> >>  }
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 697fcf89f975..b6bc2a3791e3 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -578,6 +578,7 @@ DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
> >>  DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
> >>  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> >>  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> >> +DEFINE_MTHP_STAT_ATTR(anon_num, MTHP_STAT_NR_ANON);
>
> Why are the user-facing and internal names different? Perhaps it would be
> clearer to call this nr_anon in sysfs?
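The hunk above generalizes count_mthp_stat() into mod_mthp_stat(), which adds a signed delta to a per-CPU slot instead of only incrementing it. Below is a minimal userspace sketch of why a signed delta on per-CPU slots works. All toy_* names, the fixed CPU count, and the explicit cpu argument (used in place of this_cpu_add()) are hypothetical. The point: the increment and the matching decrement may run on different CPUs, so a single CPU's slot can go negative, and only the sum over all CPUs is meaningful.

```c
#include <assert.h>

/* Hypothetical sizes; the kernel uses real per-CPU data and PMD_ORDER. */
#define TOY_NR_CPUS   4
#define TOY_PMD_ORDER 9

/* Two items, to mirror the [order][item] layout of the patch. */
enum toy_stat_item { TOY_STAT_ANON_ALLOC, TOY_STAT_NR_ANON, TOY_STAT_COUNT };

/* One private slot per CPU, so updates never touch another CPU's counter. */
static long toy_stats[TOY_NR_CPUS][TOY_PMD_ORDER + 1][TOY_STAT_COUNT];

/* Same shape as mod_mthp_stat(): bounds-check the order, then add delta. */
static void toy_mod_stat(int cpu, int order, enum toy_stat_item item, int delta)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (order <= 0 || order > TOY_PMD_ORDER)
		return; /* order 0 is not tracked */
	toy_stats[cpu][order][item] += delta;
}

/* Same shape as count_mthp_stat(): the event-counter case, delta = 1. */
static void toy_count_stat(int cpu, int order, enum toy_stat_item item)
{
	toy_mod_stat(cpu, order, item, 1);
}

/* Readers sum every CPU's slot; an individual slot may be negative. */
static long toy_sum_stat(int order, enum toy_stat_item item)
{
	long sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += toy_stats[cpu][order][item];
	return sum;
}
```

In this single-threaded sketch, a folio counted on CPU 0 and uncounted on CPU 1 leaves CPU 1's slot at -1 and the total at 0. A reader that sums while updates are in flight can still see a transiently skewed total, which the sketch does not model.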
>
> >>
> >>  static struct attribute *stats_attrs[] = {
> >>         &anon_fault_alloc_attr.attr,
> >> @@ -591,6 +592,7 @@ static struct attribute *stats_attrs[] = {
> >>         &split_attr.attr,
> >>         &split_failed_attr.attr,
> >>         &split_deferred_attr.attr,
> >> +       &anon_num_attr.attr,
> >>         NULL,
> >>  };
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 901950200957..2b722f26224c 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1467,6 +1467,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> >>         }
> >>
> >>         __folio_mod_stat(folio, nr, nr_pmdmapped);
> >> +       mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
> >>  }
> >>
> >>  static __always_inline void __folio_add_file_rmap(struct folio *folio,
> >> @@ -1582,6 +1583,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>                 list_empty(&folio->_deferred_list))
> >>                 deferred_split_folio(folio);
> >>         __folio_mod_stat(folio, -nr, -nr_pmdmapped);
> >> +       if (folio_test_anon(folio) && !atomic_read(mapped))
> >
> > Agree that atomic_read() is dodgy here.
> >
> > Not sure I fully understand why David prefers to do the unaccounting at
> > free-time though? It feels unbalanced to me to increment when first mapped but
> > decrement when freed. Surely its safer to either use alloc/free or use first
> > map/last map?

As long as we can account for the mTHP at the point where the folio's Anon
flag is cleared, we should be safe. It's hard to add 1 when a large folio is
allocated, because at allocation time we don't know its intended use: it
could become a file, anon, or shmem folio.

> >
> > If using alloc/free isn't there a THP constructor/destructor that prepares the
> > deferred list? (My memory may be failing me). Could we use that?
>
> Additionally, if we wanted to extend (eventually) to track the number of shmem
> and file mthps in additional counters, could we also account using similar folio
> free-time hooks? If not, it might be an argument to account in rmap_unmap to be
> consistent for all?

I've been struggling quite a bit with rmap. Despite trying various
approaches, I'm still occasionally seeing a negative mTHP counter after
running it for some hours on phones. rmap is really tricky to handle here;
I admit that I have failed on rmap :-)

On the other hand, anon folios can be split from order M to order N. So we
add 1 when a new anon folio is added to the rmap, and subtract 1 when we
either split it or free it. This approach seems clearer to me. When we split
from order M to order N, we can add 1 << (M - N) to the order-N counter.

> >
> >
> >> +               mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, -1);
> >>
> >>         /*
> >>          * It would be tidy to reset folio_test_anon mapping when fully

Thanks
Barry
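Below is a minimal sketch of the accounting proposed above. It uses hypothetical toy_* helpers and a plain global array instead of the per-CPU mTHP stats. A new anon folio adds 1 at its order, and a free subtracts 1. A split from order M to order N subtracts 1 at order M and adds 1 << (M - N) at order N. Like mod_mthp_stat() in the patch, it ignores order 0, so splitting all the way down to base pages only removes the large folio from the count.

```c
#include <assert.h>

#define TOY_PMD_ORDER 9 /* hypothetical stand-in for PMD_ORDER */

/* Live anon large folios per order; slot 0 stays unused. */
static long toy_nr_anon[TOY_PMD_ORDER + 1];

/* Like mod_mthp_stat(): order 0 and out-of-range orders are ignored. */
static void toy_mod(int order, long delta)
{
	if (order <= 0 || order > TOY_PMD_ORDER)
		return;
	toy_nr_anon[order] += delta;
}

/* A new anon folio is added to the rmap. */
static void toy_anon_added(int order) { toy_mod(order, 1); }

/* An anon folio is freed. */
static void toy_anon_freed(int order) { toy_mod(order, -1); }

/*
 * Splitting one order-M folio yields 1 << (M - N) folios of order N:
 * the order-M folio goes away and each order-N piece is counted.
 */
static void toy_anon_split(int old_order, int new_order)
{
	assert(0 <= new_order && new_order < old_order);
	toy_mod(old_order, -1);
	toy_mod(new_order, 1L << (old_order - new_order));
}
```

For example, splitting one order-4 folio into order 2 lowers the order-4 count by one and raises the order-2 count by four. Freeing one of those pieces then leaves three.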