From: Barry Song <21cnbao@gmail.com>
Date: Mon, 1 Jul 2024 23:43:18 +1200
Subject: Re: [PATCH 1/2] mm: add per-order mTHP split counters
To: David Hildenbrand
Cc: Ryan Roberts, Lance Yang, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <23d9f708-b1fd-4b10-b755-b7ef6aa683e8@redhat.com>
References: <20240424135148.30422-1-ioworker0@gmail.com> <20240424135148.30422-2-ioworker0@gmail.com> <23d9f708-b1fd-4b10-b755-b7ef6aa683e8@redhat.com>

On Mon, Jul 1, 2024 at 8:56 PM David Hildenbrand wrote:
>
> On 30.06.24 11:48, Barry Song wrote:
> > On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts wrote:
> >>
> >> + Barry
> >>
> >> On 24/04/2024 14:51, Lance Yang wrote:
> >>> At present, the split counters in THP statistics no longer include
> >>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP
> >>> split counters to monitor the frequency of mTHP splits. This will
> >>> assist developers in better analyzing and optimizing system
> >>> performance.
> >>>
> >>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>>         split_page
> >>>         split_page_failed
> >>>         deferred_split_page
> >>>
> >>> Signed-off-by: Lance Yang
> >>> ---
> >>>  include/linux/huge_mm.h |  3 +++
> >>>  mm/huge_memory.c        | 14 ++++++++++++--
> >>>  2 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>> index 56c7ea73090b..7b9c6590e1f7 100644
> >>> --- a/include/linux/huge_mm.h
> >>> +++ b/include/linux/huge_mm.h
> >>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >>>  	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>>  	MTHP_STAT_ANON_SWPOUT,
> >>>  	MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >>> +	MTHP_STAT_SPLIT_PAGE,
> >>> +	MTHP_STAT_SPLIT_PAGE_FAILED,
> >>> +	MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >>>  	__MTHP_STAT_COUNT
> >>>  };
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 055df5aac7c3..52db888e47a6 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >>>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> >>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>>
> >>>  static struct attribute *stats_attrs[] = {
> >>>  	&anon_fault_alloc_attr.attr,
> >>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >>>  	&anon_fault_fallback_charge_attr.attr,
> >>>  	&anon_swpout_attr.attr,
> >>>  	&anon_swpout_fallback_attr.attr,
> >>> +	&split_page_attr.attr,
> >>> +	&split_page_failed_attr.attr,
> >>> +	&deferred_split_page_attr.attr,
> >>>  	NULL,
> >>>  };
> >>>
> >>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>  	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >>>  	struct anon_vma *anon_vma = NULL;
> >>>  	struct address_space *mapping = NULL;
> >>> -	bool is_thp = folio_test_pmd_mappable(folio);
> >>> +	int order = folio_order(folio);
> >>>  	int extra_pins, ret;
> >>>  	pgoff_t end;
> >>>  	bool is_hzp;
> >>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>  	i_mmap_unlock_read(mapping);
> >>>  out:
> >>>  	xas_destroy(&xas);
> >>> -	if (is_thp)
> >>> +	if (order >= HPAGE_PMD_ORDER)
> >>>  		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> >>> +	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> >>> +			       MTHP_STAT_SPLIT_PAGE_FAILED);
> >>>  	return ret;
> >>>  }
> >>>
> >>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >>>  	if (list_empty(&folio->_deferred_list)) {
> >>>  		if (folio_test_pmd_mappable(folio))
> >>>  			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >>> +		count_mthp_stat(folio_order(folio),
> >>> +				MTHP_STAT_DEFERRED_SPLIT_PAGE);
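For anyone following along: once the patch is applied, these counters
are plain sysfs files and can be read directly from userspace. A minimal
sketch, assuming 64kB mTHP is enabled so the per-order directory is
hugepages-64kB (the directory name here is an example, not part of the
patch; adjust it for other sizes):

/* Minimal sketch: dump the three per-order split counters this patch
 * adds. Assumes a kernel with the patch applied and 64kB mTHP enabled. */
#include <stdio.h>

int main(void)
{
	static const char *base =
		"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats";
	static const char *names[] =
		{ "split_page", "split_page_failed", "deferred_split_page" };
	char path[256];
	unsigned long val;

	for (int i = 0; i < 3; i++) {
		snprintf(path, sizeof(path), "%s/%s", base, names[i]);
		FILE *f = fopen(path, "r");

		if (!f || fscanf(f, "%lu", &val) != 1) {
			fprintf(stderr, "cannot read %s\n", path);
			if (f)
				fclose(f);
			continue;
		}
		printf("%s: %lu\n", names[i], val);
		fclose(f);
	}
	return 0;
}

Running this before and after a workload that triggers splits shows the
per-order numbers move.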
> >> There is a very long conversation with Barry about adding a global
> >> "mTHP became partially mapped in 1 or more processes" counter (inc
> >> only), which terminates at [1]. There is a lot of discussion about
> >> the required semantics: partial map needs to cover alignment and
> >> contiguity as well as whether all pages are mapped, and the counter
> >> should trigger once the folio becomes partial in at least 1 process.
> >>
> >> MTHP_STAT_DEFERRED_SPLIT_PAGE gives much simpler semantics, but less
> >> information as a result. Barry, what's your view here? I'm guessing
> >> this doesn't quite solve what you are looking for?
> >
> > This doesn't quite solve what I am looking for, but I still think the
> > patch has its value.
> >
> > I'm looking for a solution that can:
> >
> >     * Count the amount of memory in the system for each mTHP size.
> >     * Determine how much memory for each mTHP size is partially
> >       unmapped.
> >
> > For example, in a system with 16GB of memory, we might find that we
> > have 3GB of 64KB mTHP, and within that, 512MB is partially unmapped,
> > potentially wasting memory at this moment. I'm uncertain whether
> > Lance is interested in this job :-)
> >
> > Counting deferred_split remains valuable as it can signal whether the
> > system is experiencing significant partial unmapping.
>
> I'll note that, especially without subpage mapcounts, in the future we
> won't have that information (how much is currently mapped) readily
> available in all cases. To obtain that information on demand, we'd have
> to scan page tables or walk the rmap.

I'd like to keep things simple. We can ignore the details of how a folio
is partially unmapped: whether 15 out of 16 subpages are unmapped or
just 1 is unmapped doesn't matter.

When we add a folio to the deferred_list, we increase the count by 1.
When we remove a folio from the deferred_list (for any reason, such as a
real split), we decrease the count by 1.

If we find that (partially unmapped mTHP * 100) / (all mTHP) for a given
size is a large number, it is a strong indication that userspace needs
some tuning (a rough sketch of this bookkeeping is at the end of this
mail).

> Something to keep in mind: we don't want to introduce counters that
> will be expensive to maintain longterm.
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry
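To make the inc/dec bookkeeping above concrete, here is a rough sketch
in plain C: one counter per order, +1 when a folio joins the deferred
list, -1 when it leaves. This is illustration only, not kernel code;
nr_partially_unmapped and nr_alloc are made-up names, and a real
implementation would use per-cpu counters rather than atomics to stay
cheap on the hot paths David mentions.

/* Illustration only -- not kernel code. nr_partially_unmapped[] is a
 * hypothetical per-order counter of folios currently on the deferred
 * split list; nr_alloc[] is a hypothetical count of live mTHPs. */
#include <stdatomic.h>

#define NR_ORDERS 12	/* orders 0..11; covers a PMD order of 9 */

static atomic_long nr_partially_unmapped[NR_ORDERS];
static atomic_long nr_alloc[NR_ORDERS];

/* A folio of this order is added to the deferred_list: +1. */
static void deferred_list_add(int order)
{
	atomic_fetch_add(&nr_partially_unmapped[order], 1);
}

/* The folio leaves the deferred_list for any reason (split, freed): -1. */
static void deferred_list_del(int order)
{
	atomic_fetch_sub(&nr_partially_unmapped[order], 1);
}

/* The ratio suggested above: a large value for a given order means many
 * mTHPs of that size sit partially unmapped, hinting at tuning needs. */
static long partial_unmap_percent(int order)
{
	long all = atomic_load(&nr_alloc[order]);

	return all ? atomic_load(&nr_partially_unmapped[order]) * 100 / all : 0;
}

The decrement on every removal path is what makes this track folios
currently deferred rather than a monotonic event count, and that is what
makes the ratio meaningful.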