linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	 baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	hanchuanhua@oppo.com,  ioworker0@gmail.com,
	kaleshsingh@google.com, kasong@tencent.com,
	 linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
	v-songbaohua@oppo.com,  ziy@nvidia.com, yuanshuai@oppo.com
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios
Date: Thu, 22 Aug 2024 21:21:06 +1200	[thread overview]
Message-ID: <CAGsJ_4wyj7U2z_XnbgEsavEkpkNO8=kVfMesxhtNbcQ=H3dzXw@mail.gmail.com> (raw)
In-Reply-To: <954f80c8-5bc3-44f5-a361-32073cbbd764@redhat.com>

On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.24 10:44, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 12.08.24 00:49, Barry Song wrote:
> >>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>
> >>>> Anon large folios come from three places:
> >>>> 1. newly allocated large folios in the page fault path, which call
> >>>> folio_add_new_anon_rmap() for rmap;
> >>>> 2. a large folio is split into multiple lower-order large folios;
> >>>> 3. a large folio is migrated to a new large folio.
> >>>>
> >>>> In all three cases above, we increase nr_anon by 1.
> >>>>
> >>>> Anon large folios go away when they are split or freed; in these
> >>>> cases, we decrease the count by 1.
> >>>>
> >>>> Folios that have been added to the swap cache but have not yet received
> >>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>> statistics in /proc/meminfo.
> >>>
> >>> Thinking out loud, I wonder if we want to have something like that for
> >>> any anon folios (including small ones).
> >>>
> >>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>> be quite interesting to see that these are actually anon pages still
> >>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>> freed for some reason.
> >>>
> >>> The whole "AnonPages" counter thingy is just confusing, it only counts
> >>> what's currently mapped ... so we'd want something different.
> >>>
> >>> But it's okay to start with large folios only, there we have a new
> >>> interface without that legacy stuff :)
> >>
> >> We have two options to do this:
> >> 1. add a separate nr_anon_unmapped interface which counts
> >> unmapped anon memory only;
> >> 2. let nr_anon count both mapped and unmapped anon folios.
> >>
> >> I would assume 1 is clearer, since AnonPages has been there for
> >> years; and if we count mapped and unmapped together, we still lack
> >> a way to spot the anon memory leak problem you mentioned.
> >>
> >> Right now we compare nr_anon (which includes mapped folios only)
> >> with AnonPages to get the distribution of different folio sizes
> >> during performance profiling.
> >>
> >> nr_anon_unmapped should normally be quite small; otherwise,
> >> something must be wrong.
> >>
> >>>
> >>>>
> >>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>>> ---
> >>>>    Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>    include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>    mm/huge_memory.c                           | 13 ++++++++++---
> >>>>    mm/migrate.c                               |  4 ++++
> >>>>    mm/page_alloc.c                            |  5 ++++-
> >>>>    mm/rmap.c                                  |  1 +
> >>>>    6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>> index 058485daf186..9fdfb46e4560 100644
> >>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>            it would free up some memory. Pages on split queue are going to
> >>>>            be split under memory pressure, if splitting is possible.
> >>>>
> >>>> +nr_anon
> >>>> +       the number of anon huge pages we have in the whole system.
> >>>
> >>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>> "huge pages" ... :)
> >>>
> >>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>
> >>>
> >>
> >> If we add an unmapped anon counter later, maybe rename this one to
> >> nr_anon_mapped? The new interface we add in the future could then
> >> be nr_anon_unmapped?
>
> We really shouldn't be using the mapped/unmapped terminology here ... we
> allocated pages and turned them into anonymous folios. At some point we
> free them. That's the lifecycle.
>
> >
> > On second thought, this might be incorrect as well. Concepts like 'anon',
> > 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> > unmapped but is still pinned and not yet freed, it isn't technically an
> > 'anon' anymore?
>
> It's just not mapped, and cannot get mapped, anymore. In the memdesc
> world, we'd be freeing the "struct anon" or "struct folio" once the last
> refcount drops to 0, not once it is unmapped (e.g., temporarily during
> a failed migration).
>
> The important part to me would be: this is memory that was allocated for
> anonymous memory, and it's still around for some reason and not getting
> freed. Usually, we would expect anon memory to get freed fairly quickly
> once unmapped. Except when there are long-term pinnings or other types
> of memory leaks.
>
> You could happily continue using these anon pages via vmsplice() or
> similar, even though the original page table mapping was torn down.
>
> >
> > On the other hand, implementing nr_anon_unmapped could be extremely
> > tricky. I have no idea how to implement it as we are losing those mapping
> > flags.
>
> folio_mapcount() can tell you efficiently whether a folio is mapped or
> not -- and that information will stay for all eternity as long as we
> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> are mapped, but at least "is any page of this large folio mapped".

Exactly. AnonPages decreases by 1 when a page is removed from the
rmap, whereas nr_anon decreases by 1 only when an anon folio is
freed. So nr_anon includes pinned and unmapped anon folios, while
AnonPages doesn't.

If there's a significant amount of 'leaked' anon, we should consider
having a separate counter for it. For instance, if nr_anon is
100,000 and pinned/unmapped folios account for 50% of that, then
nr_anon alone doesn't effectively reflect the system's state.
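
For reference, here is a minimal userspace sketch of the vmsplice() case
you mention (purely illustrative, not part of this patch): the pipe keeps
a reference to the anon page, so the memory stays allocated after munmap()
even though the rmap - and with it the AnonPages accounting - is gone,
while a free-based counter would still see it. The sketch uses a single
4KiB page, but the same lifecycle applies to large folios.

/* illustrative only: keep an anon page alive after its mapping is gone */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
        size_t len = 4096;
        int pipefd[2];
        char buf[16];

        /* anonymous mapping; the first write makes it an anon page */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED || pipe(pipefd))
                return 1;
        memcpy(p, "still here", 11);

        /* the pipe now holds a reference to the underlying page */
        struct iovec iov = { .iov_base = p, .iov_len = len };
        if (vmsplice(pipefd[1], &iov, 1, 0) < 0)
                return 1;

        /* tear down the mapping: AnonPages drops here, but the page is
         * not freed, so a free-based counter would still include it */
        munmap(p, len);

        /* the data is still reachable through the pipe */
        if (read(pipefd[0], buf, sizeof(buf)) > 0)
                printf("%.10s\n", buf);
        return 0;
}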

To implement that, it seems we'd need to catch both the moment the
mapcount drops to 0 and the moment the anon folio is actually freed,
roughly:

when the last rmap goes away (folio_mapcount() hits 0)
      unmapped_pinned_anon++;

when the anon folio is finally freed
      unmapped_pinned_anon--;

Anyway, that looks like a separate job.
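
Back to the counter added in this patch: for the profiling use case above,
I'd expect it to be consumed roughly like the sketch below, which sums the
per-size counters into the total memory sitting in anon large folios. This
is only a sketch; it assumes nr_anon lands next to split_deferred under the
existing per-size stats directory, i.e.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/.

/* sketch: sum nr_anon across all mTHP sizes; sysfs layout assumed */
#include <glob.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        glob_t g;
        unsigned long long total_kb = 0;

        if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/nr_anon",
                 0, NULL, &g))
                return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
                unsigned long long nr = 0, size_kb = 0;
                FILE *f = fopen(g.gl_pathv[i], "r");

                if (!f)
                        continue;
                /* the folio size in kB is encoded in the directory name */
                if (fscanf(f, "%llu", &nr) == 1 &&
                    sscanf(strstr(g.gl_pathv[i], "hugepages-"),
                           "hugepages-%llukB", &size_kb) == 1)
                        total_kb += nr * size_kb;
                fclose(f);
        }
        globfree(&g);

        printf("anon large folios: %llu kB\n", total_kb);
        return 0;
}

Comparing that total with AnonPages in /proc/meminfo then gives the kind
of folio-size distribution mentioned above.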

>
> >
> > However, a page that is read ahead but not yet mapped can still become
> > anon, which seems slightly less tricky to count, though still
> > difficult - besides anon pages, shmem can also be swap-backed?
>
> Yes. I'm sure there would be ways to achieve it, but I am not sure if
> it's worth the churn. These pages can be reclaimed easily (I would
> assume? They are not even mapped and were never accessible via GUP), and
> can certainly not have any longterm pinnings or similar. They are more
> like "cached things that could become an anon folio".

Exactly. If no one maps the pages for an extended period, I assume
the LRU will reclaim them as well.

>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry


Thread overview: 19+ messages
2024-08-11 22:49 [PATCH v2 0/2] mm: collect the number of anon mTHP Barry Song
2024-08-11 22:49 ` [PATCH v2 1/2] mm: collect the number of anon large folios Barry Song
2024-08-21 21:34   ` David Hildenbrand
2024-08-22  0:52     ` Barry Song
2024-08-22  8:44       ` Barry Song
2024-08-22  8:59         ` David Hildenbrand
2024-08-22  9:21           ` Barry Song [this message]
2024-08-22 10:01             ` David Hildenbrand
2024-08-22 10:12               ` Barry Song
2024-08-11 22:49 ` [PATCH v2 2/2] mm: collect the number of anon large folios on split_deferred list Barry Song
2024-08-21 21:39   ` David Hildenbrand
2024-08-21 22:01     ` Barry Song
2024-08-21 22:10       ` David Hildenbrand
2024-08-18  7:58 ` [PATCH v2 0/2] mm: collect the number of anon mTHP Barry Song
2024-08-19  2:44   ` Usama Arif
2024-08-19  8:28   ` David Hildenbrand
2024-08-19  8:33     ` Barry Song
2024-08-19  8:52       ` Barry Song
2024-08-19 14:22         ` Usama Arif
