From: Lance Yang
Date: Mon, 1 Jul 2024 19:06:17 +0800
Subject: Re: [PATCH 1/2] mm: add per-order mTHP split counters
To: David Hildenbrand
Cc: Barry Song, Ryan Roberts, akpm@linux-foundation.org,
 baolin.wang@linux.alibaba.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
References: <20240424135148.30422-1-ioworker0@gmail.com>
 <20240424135148.30422-2-ioworker0@gmail.com>
 <23d9f708-b1fd-4b10-b755-b7ef6aa683e8@redhat.com>
In-Reply-To: <23d9f708-b1fd-4b10-b755-b7ef6aa683e8@redhat.com>

Hi David,

On Mon, Jul 1, 2024 at 4:56 PM David Hildenbrand wrote:
>
> On 30.06.24 11:48, Barry Song wrote:
> > On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts wrote:
> >>
> >> + Barry
> >>
> >> On 24/04/2024 14:51, Lance Yang wrote:
> >>> At present, the split counters in THP statistics no longer include
> >>> PTE-mapped mTHP.
> >>> Therefore, this commit introduces per-order mTHP split
> >>> counters to monitor the frequency of mTHP splits. This will assist
> >>> developers in better analyzing and optimizing system performance.
> >>>
> >>> /sys/kernel/mm/transparent_hugepage/hugepages-/stats
> >>>         split_page
> >>>         split_page_failed
> >>>         deferred_split_page
> >>>
> >>> Signed-off-by: Lance Yang
> >>> ---
> >>>  include/linux/huge_mm.h |  3 +++
> >>>  mm/huge_memory.c        | 14 ++++++++++++--
> >>>  2 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>> index 56c7ea73090b..7b9c6590e1f7 100644
> >>> --- a/include/linux/huge_mm.h
> >>> +++ b/include/linux/huge_mm.h
> >>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >>>       MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>>       MTHP_STAT_ANON_SWPOUT,
> >>>       MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >>> +     MTHP_STAT_SPLIT_PAGE,
> >>> +     MTHP_STAT_SPLIT_PAGE_FAILED,
> >>> +     MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >>>       __MTHP_STAT_COUNT
> >>>  };
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 055df5aac7c3..52db888e47a6 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> >>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>>
> >>>  static struct attribute *stats_attrs[] = {
> >>>       &anon_fault_alloc_attr.attr,
> >>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >>>       &anon_fault_fallback_charge_attr.attr,
> >>>       &anon_swpout_attr.attr,
> >>>       &anon_swpout_fallback_attr.attr,
> >>> +     &split_page_attr.attr,
> >>> +     &split_page_failed_attr.attr,
> >>> +     &deferred_split_page_attr.attr,
> >>>       NULL,
> >>>  };
> >>>
> >>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>       XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >>>       struct anon_vma *anon_vma = NULL;
> >>>       struct address_space *mapping = NULL;
> >>> -     bool is_thp = folio_test_pmd_mappable(folio);
> >>> +     int order = folio_order(folio);
> >>>       int extra_pins, ret;
> >>>       pgoff_t end;
> >>>       bool is_hzp;
> >>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>               i_mmap_unlock_read(mapping);
> >>>  out:
> >>>       xas_destroy(&xas);
> >>> -     if (is_thp)
> >>> +     if (order >= HPAGE_PMD_ORDER)
> >>>               count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> >>> +     count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> >>> +                                   MTHP_STAT_SPLIT_PAGE_FAILED);
> >>>       return ret;
> >>>  }
> >>>
> >>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >>>       if (list_empty(&folio->_deferred_list)) {
> >>>               if (folio_test_pmd_mappable(folio))
> >>>                       count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >>> +             count_mthp_stat(folio_order(folio),
> >>> +                             MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>
> >> There is a very long conversation with Barry about adding a 'global "mTHP became
> >> partially mapped 1 or more processes" counter (inc only)', which terminates at
> >> [1]. There is a lot of discussion about the required semantics around the need
> >> for partial map to cover alignment and contiguity as well as whether all pages
> >> are mapped, and to trigger once it becomes partial in at least 1 process.
> >>
> >> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less
> >> information as a result. Barry, what's your view here? I'm guessing this doesn't
> >> quite solve what you are looking for?
> >
> > This doesn't quite solve what I am looking for but I still think the
> > patch has its value.
> >
> > I'm looking for a solution that can:
> >
> >   * Count the amount of memory in the system for each mTHP size.
> >   * Determine how much memory for each mTHP size is partially unmapped.
> >
> > For example, in a system with 16GB of memory, we might find that we have 3GB
> > of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting
> > memory at this moment. I'm uncertain whether Lance is interested in
> > this job :-)
> >
> > Counting deferred_split remains valuable as it can signal whether the system is
> > experiencing significant partial unmapping.
>
> I'll note that, especially without subpage mapcounts, in the future we
> won't have that information (how much is currently mapped) readily
> available in all cases. To obtain that information on demand, we'd have
> to scan page tables or walk the rmap.

Thanks for pointing that out!

>
> Something to keep in mind: we don't want to introduce counters that will
> be expensive to maintain longterm.

I'll keep that in mind as we move forward with any new implementations.

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
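
For anyone following the thread, the accounting that the quoted patch adds
can be sketched as a small userspace model. This is an illustration only,
not the kernel implementation: every model_* / MODEL_* name is hypothetical,
the PMD order of 9 assumes 4 KiB base pages with 2 MiB THPs, and the bounds
check in model_count_mthp_stat() is a choice made for the sketch. What it
tries to show is the point of the patch: the legacy THP events are bumped
only for PMD-sized folios, while the per-order counters are bumped for
whatever order the folio has.

```c
/*
 * Userspace model of the counting logic in the quoted patch.
 * Not kernel code: all names and constants here are assumptions
 * made for illustration.
 */
#include <assert.h>
#include <stddef.h>

#define MODEL_HPAGE_PMD_ORDER 9	/* assumed: 2 MiB THP, 4 KiB base pages */
#define MODEL_NR_ORDERS       (MODEL_HPAGE_PMD_ORDER + 1)

enum model_mthp_stat_item {
	MODEL_STAT_SPLIT_PAGE,
	MODEL_STAT_SPLIT_PAGE_FAILED,
	MODEL_STAT_DEFERRED_SPLIT_PAGE,
	MODEL_STAT_COUNT
};

struct model_stats {
	/* Legacy THP-wide events: only PMD-sized folios are counted. */
	unsigned long thp_split_page;
	unsigned long thp_split_page_failed;
	unsigned long thp_deferred_split_page;
	/* Per-order counters: one row per folio order. */
	unsigned long mthp[MODEL_NR_ORDERS][MODEL_STAT_COUNT];
};

/* Bump one per-order counter; orders outside 1..PMD order are ignored. */
static void model_count_mthp_stat(struct model_stats *s, int order,
				  enum model_mthp_stat_item item)
{
	assert(s != NULL);
	if (order <= 0 || order > MODEL_HPAGE_PMD_ORDER)
		return;
	s->mthp[order][item]++;
}

/* Models the "out:" path of the split: ret == 0 means the split succeeded. */
static void model_account_split(struct model_stats *s, int order, int ret)
{
	assert(s != NULL);
	if (order >= MODEL_HPAGE_PMD_ORDER) {
		if (!ret)
			s->thp_split_page++;
		else
			s->thp_split_page_failed++;
	}
	model_count_mthp_stat(s, order, !ret ? MODEL_STAT_SPLIT_PAGE :
					       MODEL_STAT_SPLIT_PAGE_FAILED);
}

/* Models queueing a folio for deferred split for the first time. */
static void model_account_deferred_split(struct model_stats *s, int order)
{
	assert(s != NULL);
	if (order >= MODEL_HPAGE_PMD_ORDER)
		s->thp_deferred_split_page++;
	model_count_mthp_stat(s, order, MODEL_STAT_DEFERRED_SPLIT_PAGE);
}
```

With these assumptions, an order-4 folio (64KB, the size in Barry's example)
that is split or queued for deferred split shows up only in the per-order
row, whereas an order-9 folio updates both the legacy event and its
per-order row.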