From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
baolin.wang@linux.alibaba.com, chrisl@kernel.org,
hanchuanhua@oppo.com, ioworker0@gmail.com,
kaleshsingh@google.com, kasong@tencent.com,
linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
v-songbaohua@oppo.com, ziy@nvidia.com, yuanshuai@oppo.com
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios
Date: Thu, 22 Aug 2024 22:12:35 +1200 [thread overview]
Message-ID: <CAGsJ_4xjvid+JTbDwJJ9P4PkC2t6SWxyOkLq30-gbicNM6BSLw@mail.gmail.com> (raw)
In-Reply-To: <a7d537df-7899-42fc-b9ef-66733105abbe@redhat.com>
On Thu, Aug 22, 2024 at 10:01 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.24 11:21, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 22.08.24 10:44, Barry Song wrote:
> >>> On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>
> >>>> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>
> >>>>> On 12.08.24 00:49, Barry Song wrote:
> >>>>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>>>
> >>>>>> Anon large folios come from three places:
> >>>>>> 1. new allocated large folios in PF, they will call folio_add_new_anon_rmap()
> >>>>>> for rmap;
> >>>>>> 2. a large folio is split into multiple lower-order large folios;
> >>>>>> 3. a large folio is migrated to a new large folio.
> >>>>>>
> >>>>>> In all above three counts, we increase nr_anon by 1;
> >>>>>>
> >>>>>> Anon large folios might go either because of be split or be put
> >>>>>> to free, in these cases, we reduce the count by 1.
> >>>>>>
> >>>>>> Folios that have been added to the swap cache but have not yet received
> >>>>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>>>> statistics in /proc/meminfo.
> >>>>>
> >>>>> Thinking out loud, I wonder if we want to have something like that for
> >>>>> any anon folios (including small ones).
> >>>>>
> >>>>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>>>> be quite interesting to see that these are actually anon pages still
> >>>>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>>>> freed for some reason.
> >>>>>
> >>>>> The whole "AnonPages" counter thingy is just confusing, it only counts
> >>>>> what's currently mapped ... so we'd want something different.
> >>>>>
> >>>>> But it's okay to start with large folios only, there we have a new
> >>>>> interface without that legacy stuff :)
> >>>>
> >>>> We have two options to do this:
> >>>> 1. add a new separate nr_anon_unmapped interface which
> >>>> counts unmapped anon memory only
> >>>> 2. let the nr_anon count both mapped and unmapped anon
> >>>> folios.
> >>>>
> >>>> I would assume 1 is clearer as right now AnonPages have been
> >>>> there for years. and counting all mapped and unmapped together,
> >>>> we are still lacking an approach to find out anon memory leak
> >>>> problem you mentioned.
> >>>>
> >>>> We are right now comparing nr_anon(including mapped folios only)
> >>>> with AnonPages to get the distribution of different folio sizes in
> >>>> performance profiling.
> >>>>
> >>>> unmapped_nr_anon should be normally always quite small. otherwise,
> >>>> something must be wrong.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>>>>> ---
> >>>>>> Documentation/admin-guide/mm/transhuge.rst | 5 +++++
> >>>>>> include/linux/huge_mm.h | 15 +++++++++++++--
> >>>>>> mm/huge_memory.c | 13 ++++++++++---
> >>>>>> mm/migrate.c | 4 ++++
> >>>>>> mm/page_alloc.c | 5 ++++-
> >>>>>> mm/rmap.c | 1 +
> >>>>>> 6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> index 058485daf186..9fdfb46e4560 100644
> >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>>> it would free up some memory. Pages on split queue are going to
> >>>>>> be split under memory pressure, if splitting is possible.
> >>>>>>
> >>>>>> +nr_anon
> >>>>>> + the number of anon huge pages we have in the whole system.
> >>>>>
> >>>>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>>>> "huge pages" ... :)
> >>>>>
> >>>>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>>>
> >>>>>
> >>>>
> >>>> if we might have unmapped_anon counter later, maybe rename it to
> >>>> nr_anon_mapped? and the new interface we will have in the future
> >>>> might be nr_anon_unmapped?
> >>
> >> We really shouldn't be using the mapped/unmapped terminology here ... we
> >> allocated pages and turned them into anonymous folios. At some point we
> >> free them. That's the lifecycle.
> >>
> >>>
> >>> On second thought, this might be incorrect as well. Concepts like 'anon',
> >>> 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> >>> unmapped but is still pinned and not yet freed, it isn't technically an
> >>> 'anon' anymore?
> >>
> >> It's just not mapped, and cannot get mapped, anymore. In the memdesc
> >> world, we'd be freeing the "struct anon" or "struct folio" once the last
> >> refcount goes to 0, not once (e.g., temporarily during a failed
> >> migration?) unmapped.
> >>
> >> The important part to me would be: this is memory that was allocated for
> >> anonymous memory, and it's still around for some reason and not getting
> >> freed. Usually, we would expect anon memory to get freed fairly quickly
> >> once unmapped. Except when there are long-term pinnings or other types
> >> of memory leaks.
> >>
> >> You could happily continue using these anon pages via vmsplice() or
> >> similar, even thought he original page table mapping was torn down.
> >>
> >>>
> >>> On the other hand, implementing nr_anon_unmapped could be extremely
> >>> tricky. I have no idea how to implement it as we are losing those mapping
> >>> flags.
> >>
> >> folio_mapcount() can tell you efficiently whether a folio is mapped or
> >> not -- and that information will stay for all eternity as long as we
> >> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> >> are mapped, but at least "is any page of this large folio mapped".
> >
> > Exactly. AnonPages decreases by -1 when removed from the rmap,
> > whereas nr_anon decreases by -1 when an anon folio is freed. So,
> > I would assume nr_anon includes those pinned and unmapped anon
> > folios but AnonPages doesn't.
>
> Right, note how internally it is called "NR_ANON_MAPPED", but we ended
> up calling it "AnonPages". But that's rather a legacy interface we
> cannot change (fix) that easily. At least not without a config option.
>
> At some point it might indeed be interesting to have "nr_anon_mapped",
> here, but that would correspond to "is any part of this large folio
> mapped". For debugging purposes in the future, that might be indeed
> interesting.
>
> "nr_anon": anon allocations (until freed -> +1)
> "nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
> "nr_anon_partially_mapped": anon allocations that was detected to be
> partially mapped at some point -> +1
>
> If a folio is in the swapcache, I would still want to see that it is an
> anon allocation lurking around in the system. Like we do with pagecache
> pages. *There* we do have the difference between "allocated" and
> "mapped" already.
>
> So likely, calling it "nr_anon" here, and tracking it on an allocation
> level, is good enough for now and future proof.
Right. I plan to send v3 tomorrow to at least unblock Usama's series,
in case he wants to rebase on top of it.
>
> >
> > If there's a significant amount of 'leaked' anon, we should consider
> > having a separate counter for them. For instance, if nr_anon is
> > 100,000 and pinned/unmapped pages account for 50%, then nr_anon
> > alone doesn’t effectively reflect the system's state.
>
> Right, but if you stare at the system you could tell that a significant
> amount of memory is still getting consumed through existing/previous
> anon mappings. Depends on how valuable that distinction really is.
>
> --
> Cheers,
>
> David / dhildenb
>
Thanks
Barry
next prev parent reply other threads:[~2024-08-22 10:12 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-11 22:49 [PATCH v2 0/2] mm: collect the number of anon mTHP Barry Song
2024-08-11 22:49 ` [PATCH v2 1/2] mm: collect the number of anon large folios Barry Song
2024-08-21 21:34 ` David Hildenbrand
2024-08-22 0:52 ` Barry Song
2024-08-22 8:44 ` Barry Song
2024-08-22 8:59 ` David Hildenbrand
2024-08-22 9:21 ` Barry Song
2024-08-22 10:01 ` David Hildenbrand
2024-08-22 10:12 ` Barry Song [this message]
2024-08-11 22:49 ` [PATCH v2 2/2] mm: collect the number of anon large folios on split_deferred list Barry Song
2024-08-21 21:39 ` David Hildenbrand
2024-08-21 22:01 ` Barry Song
2024-08-21 22:10 ` David Hildenbrand
2024-08-18 7:58 ` [PATCH v2 0/2] mm: collect the number of anon mTHP Barry Song
2024-08-19 2:44 ` Usama Arif
2024-08-19 8:28 ` David Hildenbrand
2024-08-19 8:33 ` Barry Song
2024-08-19 8:52 ` Barry Song
2024-08-19 14:22 ` Usama Arif
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGsJ_4xjvid+JTbDwJJ9P4PkC2t6SWxyOkLq30-gbicNM6BSLw@mail.gmail.com \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hanchuanhua@oppo.com \
--cc=ioworker0@gmail.com \
--cc=kaleshsingh@google.com \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryan.roberts@arm.com \
--cc=v-songbaohua@oppo.com \
--cc=yuanshuai@oppo.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox