From: Ryan Roberts
Date: Fri, 9 Aug 2024 08:50:00 +0100
Subject: Re: [PATCH 1/2] mm: add per-order mTHP split counters
To: Barry Song
Cc: Lance Yang, akpm@linux-foundation.org, david@redhat.com, baolin.wang@linux.alibaba.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Message-ID: <41658cce-6cd8-47bf-b9bd-657e0a362d11@arm.com>
References: <20240424135148.30422-1-ioworker0@gmail.com> <20240424135148.30422-2-ioworker0@gmail.com> <71fdab06-0442-4c55-811b-b38d3b024c85@arm.com>

On 08/08/2024 22:27, Barry Song wrote:
> On Mon, Jul 1, 2024 at 8:16 PM Ryan Roberts wrote:
>>
>> On 30/06/2024 12:34, Lance Yang wrote:
>>> Hi Barry,
>>>
>>> Thanks for following up!
>>>
>>> On Sun, Jun 30, 2024 at 5:48 PM Barry Song wrote:
>>>>
>>>> On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts wrote:
>>>>>
>>>>> + Barry
>>>>>
>>>>> On 24/04/2024 14:51, Lance Yang wrote:
>>>>>> At present, the split counters in THP statistics no longer include
>>>>>> PTE-mapped mTHP.
>>>>>> Therefore, this commit introduces per-order mTHP split
>>>>>> counters to monitor the frequency of mTHP splits. This will assist
>>>>>> developers in better analyzing and optimizing system performance.
>>>>>>
>>>>>> /sys/kernel/mm/transparent_hugepage/hugepages-/stats
>>>>>>         split_page
>>>>>>         split_page_failed
>>>>>>         deferred_split_page
>>>>>>
>>>>>> Signed-off-by: Lance Yang
>>>>>> ---
>>>>>>  include/linux/huge_mm.h |  3 +++
>>>>>>  mm/huge_memory.c        | 14 ++++++++++++--
>>>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>> index 56c7ea73090b..7b9c6590e1f7 100644
>>>>>> --- a/include/linux/huge_mm.h
>>>>>> +++ b/include/linux/huge_mm.h
>>>>>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>>>>>>          MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>>>>>>          MTHP_STAT_ANON_SWPOUT,
>>>>>>          MTHP_STAT_ANON_SWPOUT_FALLBACK,
>>>>>> +        MTHP_STAT_SPLIT_PAGE,
>>>>>> +        MTHP_STAT_SPLIT_PAGE_FAILED,
>>>>>> +        MTHP_STAT_DEFERRED_SPLIT_PAGE,
>>>>>>          __MTHP_STAT_COUNT
>>>>>>  };
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 055df5aac7c3..52db888e47a6 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>>>>>>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>>>>>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>>>>>>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>>>>>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
>>>>>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
>>>>>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>>>>>
>>>>>>  static struct attribute *stats_attrs[] = {
>>>>>>          &anon_fault_alloc_attr.attr,
>>>>>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>>>>>>          &anon_fault_fallback_charge_attr.attr,
>>>>>>          &anon_swpout_attr.attr,
>>>>>>          &anon_swpout_fallback_attr.attr,
>>>>>> +        &split_page_attr.attr,
>>>>>> +        &split_page_failed_attr.attr,
>>>>>> +        &deferred_split_page_attr.attr,
>>>>>>          NULL,
>>>>>>  };
>>>>>>
>>>>>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>>>>          XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>>>>>>          struct anon_vma *anon_vma = NULL;
>>>>>>          struct address_space *mapping = NULL;
>>>>>> -        bool is_thp = folio_test_pmd_mappable(folio);
>>>>>> +        int order = folio_order(folio);
>>>>>>          int extra_pins, ret;
>>>>>>          pgoff_t end;
>>>>>>          bool is_hzp;
>>>>>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>>>>                  i_mmap_unlock_read(mapping);
>>>>>>  out:
>>>>>>          xas_destroy(&xas);
>>>>>> -        if (is_thp)
>>>>>> +        if (order >= HPAGE_PMD_ORDER)
>>>>>>                  count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
>>>>>> +        count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
>>>>>> +                                      MTHP_STAT_SPLIT_PAGE_FAILED);
>>>>>>          return ret;
>>>>>>  }
>>>>>>
>>>>>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>>>>>>          if (list_empty(&folio->_deferred_list)) {
>>>>>>                  if (folio_test_pmd_mappable(folio))
>>>>>>                          count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>>>>>> +                count_mthp_stat(folio_order(folio),
>>>>>> +                                MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>>>>
>>>>> There is a very long conversation with Barry about adding a 'global "mTHP became
>>>>> partially mapped 1 or more processes" counter (inc only)', which terminates at
>>>>> [1]. There is a lot of discussion about the required semantics around the need
>>>>> for partial map to cover alignment and contiguity as well as whether all pages
>>>>> are mapped, and to trigger once it becomes partial in at least 1 process.
>>>>>
>>>>> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less
>>>>> information as a result. Barry, what's your view here?
>>>>> I'm guessing this doesn't
>>>>> quite solve what you are looking for?
>>>>
>>>> This doesn't quite solve what I am looking for but I still think the
>>>> patch has its value.
>>>>
>>>> I'm looking for a solution that can:
>>>>
>>>> * Count the amount of memory in the system for each mTHP size.
>>>> * Determine how much memory for each mTHP size is partially unmapped.
>>>>
>>>> For example, in a system with 16GB of memory, we might find that we have 3GB
>>>> of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting
>>>> memory at this moment. I'm uncertain whether Lance is interested in
>>>> this job :-)
>>>
>>> Nice, that's an interesting/valuable job for me ;)
>>>
>>> Let's do it separately, as 'split' and friends probably can’t be the
>>> solution you mentioned above, IMHO.
>>>
>>> Hmm... I don't have a good idea about the solution for now, but will
>>> think it over and come back to discuss it here.
>>
>> I have a grad starting in a couple of weeks and I had been planning to initially
>> ask him to look at this to help him get up to speed on mTHP/mm stuff. But I have
>> plenty of other things for him to do if Lance wants to take this :)
>
> Hi Ryan, Lance,
>
> My performance profiling is pending on the mTHP size and partially
> unmapped mTHP size issues (understanding the distribution of folio
> sizes within the system), so I'm not waiting for either Ryan's grad
> or Lance. I've sent an RFC for this, and both of you are CC'd:
>
> https://lore.kernel.org/all/20240808010457.228753-1-21cnbao@gmail.com/

Yes I saw that, I'll try to give it some review today.

>
> Apologies for not waiting. You are still warmly welcomed to participate
> in the discussion and review.

No problem; after this last discussion, I assumed Lance was going to work on it
so there has been no duplicated effort from my side. Glad to see it implemented.

>
>>
>>>
>>>>
>>>> Counting deferred_split remains valuable as it can signal whether the system is
>>>> experiencing significant partial unmapping.
>>>
>>> Have a nice weekend!
>>> Lance
>>>
>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/6cc7d781-884f-4d8f-a175-8609732b87eb@arm.com/
>>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>>                  list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>>>>>>                  ds_queue->split_queue_len++;
>>>>>>  #ifdef CONFIG_MEMCG
>>>>>
>>>>
>
> Thanks
> Barry
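
For anyone reading the thread later, here is a rough user-space sketch of how the counters from the quoted patch could be consumed. It is only an illustration, not part of the patch. The file names split_page, split_page_failed and deferred_split_page are taken from this posting and may differ in a given kernel. The helper names read_mthp_counter() and mthp_split_failure_ratio() are hypothetical, and the per-size stats directory is passed in by the caller because its exact name is elided in the commit message above.

```c
#include <stdio.h>

/*
 * Read a single unsigned counter from "<stats_dir>/<name>".
 * Returns 0 on success and -1 if the file is missing or unparsable.
 * (Hypothetical helper, not a kernel or libc API.)
 */
int read_mthp_counter(const char *stats_dir, const char *name,
                      unsigned long long *out)
{
    char path[512];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", stats_dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Fraction of split attempts that failed for one mTHP size.
 * In the quoted patch, split_page counts successful splits and
 * split_page_failed counts failed ones, so the total number of
 * attempts is their sum. Returns -1.0 if either counter cannot be
 * read or if no splits have been attempted yet.
 */
double mthp_split_failure_ratio(const char *stats_dir)
{
    unsigned long long ok_splits, failed_splits;

    if (read_mthp_counter(stats_dir, "split_page", &ok_splits) != 0)
        return -1.0;
    if (read_mthp_counter(stats_dir, "split_page_failed", &failed_splits) != 0)
        return -1.0;
    if (ok_splits + failed_splits == 0)
        return -1.0;

    return (double)failed_splits / (double)(ok_splits + failed_splits);
}
```

A caller would pass the stats directory of one mTHP size under /sys/kernel/mm/transparent_hugepage/ and compare the ratio across sizes. Since the counters only ever increase, sampling them twice and taking the difference gives a rate for a given workload rather than a since-boot total.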