From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20240811224940.39876-1-21cnbao@gmail.com> <20240811224940.39876-2-21cnbao@gmail.com>
 <3572ae2e-2141-4a70-99da-850b2e7ade41@redhat.com> <CAGsJ_4w9gg=z6KgAZ4Tur+t-ZhpXdvmq4A5tOQiUZLeuPFSupg@mail.gmail.com>
 <CAGsJ_4yqf4KNvsg1P47cAz+bniZFVcUWPkdjYTqji91CgnrrfQ@mail.gmail.com>
 <954f80c8-5bc3-44f5-a361-32073cbbd764@redhat.com> <CAGsJ_4wyj7U2z_XnbgEsavEkpkNO8=kVfMesxhtNbcQ=H3dzXw@mail.gmail.com>
 <a7d537df-7899-42fc-b9ef-66733105abbe@redhat.com>
In-Reply-To: <a7d537df-7899-42fc-b9ef-66733105abbe@redhat.com>
From: Barry Song <21cnbao@gmail.com>
Date: Thu, 22 Aug 2024 22:12:35 +1200
Message-ID: <CAGsJ_4xjvid+JTbDwJJ9P4PkC2t6SWxyOkLq30-gbicNM6BSLw@mail.gmail.com>
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, 
	baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com, 
	ioworker0@gmail.com, kaleshsingh@google.com, kasong@tencent.com, 
	linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com, 
	ziy@nvidia.com, yuanshuai@oppo.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Thu, Aug 22, 2024 at 10:01 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.24 11:21, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 22.08.24 10:44, Barry Song wrote:
> >>> On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>
> >>>> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>
> >>>>> On 12.08.24 00:49, Barry Song wrote:
> >>>>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>>>
> >>>>>> Anon large folios come from three places:
> >>>>>> 1. newly allocated large folios in PF, which call
> >>>>>> folio_add_new_anon_rmap() for rmap;
> >>>>>> 2. a large folio is split into multiple lower-order large folios;
> >>>>>> 3. a large folio is migrated to a new large folio.
> >>>>>>
> >>>>>> In all three cases above, we increase nr_anon by 1.
> >>>>>>
> >>>>>> Anon large folios go away either because they are split or because
> >>>>>> they are freed; in these cases, we reduce the count by 1.
> >>>>>>
> >>>>>> Folios that have been added to the swap cache but have not yet received
> >>>>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>>>> statistics in /proc/meminfo.
> >>>>>
> >>>>> Thinking out loud, I wonder if we want to have something like that for
> >>>>> any anon folios (including small ones).
> >>>>>
> >>>>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>>>> be quite interesting to see that these are actually anon pages still
> >>>>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>>>> freed for some reason.
> >>>>>
> >>>>> The whole "AnonPages" counter thingy is just confusing, it only counts
> >>>>> what's currently mapped ... so we'd want something different.
> >>>>>
> >>>>> But it's okay to start with large folios only, there we have a new
> >>>>> interface without that legacy stuff :)
> >>>>
> >>>> We have two options to do this:
> >>>> 1. add a new separate nr_anon_unmapped interface which
> >>>> counts unmapped anon memory only
> >>>> 2. let the nr_anon count both mapped and unmapped anon
> >>>> folios.
> >>>>
> >>>> I would assume option 1 is clearer, as AnonPages has been
> >>>> there for years. And if we count mapped and unmapped folios
> >>>> together, we still lack a way to find the anon memory leak
> >>>> problem you mentioned.
> >>>>
> >>>> Right now we compare nr_anon (which includes mapped folios only)
> >>>> with AnonPages to get the distribution of different folio sizes in
> >>>> performance profiling.
> >>>>
> >>>> unmapped_nr_anon should normally be quite small; otherwise,
> >>>> something must be wrong.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>>>>> ---
> >>>>>>     Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>>>     include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>>>     mm/huge_memory.c                           | 13 ++++++++++---
> >>>>>>     mm/migrate.c                               |  4 ++++
> >>>>>>     mm/page_alloc.c                            |  5 ++++-
> >>>>>>     mm/rmap.c                                  |  1 +
> >>>>>>     6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> index 058485daf186..9fdfb46e4560 100644
> >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>>>             it would free up some memory. Pages on split queue are going to
> >>>>>>             be split under memory pressure, if splitting is possible.
> >>>>>>
> >>>>>> +nr_anon
> >>>>>> +       the number of anon huge pages we have in the whole system.
> >>>>>
> >>>>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>>>> "huge pages" ... :)
> >>>>>
> >>>>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>>>
> >>>>>
> >>>>
> >>>> if we might have unmapped_anon counter later, maybe rename it to
> >>>> nr_anon_mapped? and the new interface we will have in the future
> >>>> might be nr_anon_unmapped?
> >>
> >> We really shouldn't be using the mapped/unmapped terminology here ... we
> >> allocated pages and turned them into anonymous folios. At some point we
> >> free them. That's the lifecycle.
> >>
> >>>
> >>> On second thought, this might be incorrect as well. Concepts like 'anon',
> >>> 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> >>> unmapped but is still pinned and not yet freed, it isn't technically an
> >>> 'anon' anymore?
> >>
> >> It's just not mapped, and cannot get mapped, anymore. In the memdesc
> >> world, we'd be freeing the "struct anon" or "struct folio" once the last
> >> refcount goes to 0, not once (e.g., temporarily during a failed
> >> migration?) unmapped.
> >>
> >> The important part to me would be: this is memory that was allocated for
> >> anonymous memory, and it's still around for some reason and not getting
> >> freed. Usually, we would expect anon memory to get freed fairly quickly
> >> once unmapped. Except when there are long-term pinnings or other types
> >> of memory leaks.
> >>
> >> You could happily continue using these anon pages via vmsplice() or
> >> similar, even though the original page table mapping was torn down.
> >>
> >>>
> >>> On the other hand, implementing nr_anon_unmapped could be extremely
> >>> tricky. I have no idea how to implement it as we are losing those mapping
> >>> flags.
> >>
> >> folio_mapcount() can tell you efficiently whether a folio is mapped or
> >> not -- and that information will stay for all eternity as long as we
> >> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> >> are mapped, but at least "is any page of this large folio mapped".
> >
> > Exactly. AnonPages decreases by 1 when a page is removed from the rmap,
> > whereas nr_anon decreases by 1 when an anon folio is freed. So,
> > I would assume nr_anon includes those pinned and unmapped anon
> > folios but AnonPages doesn't.
>
> Right, note how internally it is called "NR_ANON_MAPPED", but we ended
> up calling it "AnonPages". But that's rather a legacy interface we
> cannot change (fix) that easily. At least not without a config option.
>
> At some point it might indeed be interesting to have "nr_anon_mapped",
> here, but that would correspond to "is any part of this large folio
> mapped". For debugging purposes in the future, that might be indeed
> interesting.
>
> "nr_anon": anon allocations (until freed -> +1)
> "nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
> "nr_anon_partially_mapped": anon allocations that were detected to be
> partially mapped at some point -> +1
>
> If a folio is in the swapcache, I would still want to see that it is an
> anon allocation lurking around in the system. Like we do with pagecache
> pages. *There* we do have the difference between "allocated" and
> "mapped" already.
>
> So likely, calling it "nr_anon" here, and tracking it on an allocation
> level, is good enough for now and future proof.

Right. I plan to send v3 tomorrow to at least unblock Usama's series,
in case he wants to rebase on top of it.
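
For anyone following along, here is a rough userspace sketch of how I read
the three counters listed above. It is only a toy model, not kernel code:
toy_folio, toy_counters and all toy_* helpers are hypothetical names, and
the decrement of nr_anon_partially_mapped on free is my own assumption,
since the thread only specifies when it goes up.

```c
/* Toy model of the counter semantics discussed above; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

struct toy_counters {
	long nr_anon;                  /* anon allocations not yet freed */
	long nr_anon_mapped;           /* ... with at least one page mapped */
	long nr_anon_partially_mapped; /* ... seen partially mapped at some point */
};

struct toy_folio {
	int nr_pages;          /* pages in this (large) folio */
	int mapped;            /* pages currently mapped */
	int pins;              /* extra references, e.g. long-term pins */
	bool live;             /* allocated and not yet freed */
	bool partially_mapped; /* sticky once detected */
};

/* Free the folio once nothing maps or pins it; nr_anon drops only here. */
void toy_maybe_free(struct toy_counters *c, struct toy_folio *f)
{
	if (!f->live || f->mapped > 0 || f->pins > 0)
		return;
	f->live = false;
	c->nr_anon--;
	if (f->partially_mapped)
		c->nr_anon_partially_mapped--; /* assumption, see lead-in */
}

/* Allocation in the fault path: the folio starts out fully mapped. */
void toy_alloc_and_map(struct toy_counters *c, struct toy_folio *f, int nr_pages)
{
	assert(nr_pages > 0);
	*f = (struct toy_folio){ .nr_pages = nr_pages, .mapped = nr_pages,
				 .live = true };
	c->nr_anon++;
	c->nr_anon_mapped++;
}

/* Unmap n pages; "mapped" means "any part mapped", as in the thread. */
void toy_unmap(struct toy_counters *c, struct toy_folio *f, int n)
{
	assert(f->live && n > 0 && n <= f->mapped);
	f->mapped -= n;
	if (f->mapped == 0) {
		c->nr_anon_mapped--;
	} else if (!f->partially_mapped) {
		f->partially_mapped = true;
		c->nr_anon_partially_mapped++;
	}
	toy_maybe_free(c, f);
}

void toy_pin(struct toy_folio *f)
{
	assert(f->live);
	f->pins++;
}

void toy_unpin(struct toy_counters *c, struct toy_folio *f)
{
	assert(f->live && f->pins > 0);
	f->pins--;
	toy_maybe_free(c, f);
}
```

In this model a pinned folio that has been fully unmapped still shows up in
nr_anon but no longer in nr_anon_mapped, which is the pinned/leaked case
discussed in this thread; the difference between the two counters would be
the "unmapped but still around" population.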

>
> >
> > If there's a significant amount of 'leaked' anon, we should consider
> > having a separate counter for them. For instance, if nr_anon is
> > 100,000 and pinned/unmapped pages account for 50%, then nr_anon
> > alone doesn’t effectively reflect the system's state.
>
> Right, but if you stare at the system you could tell that a significant
> amount of memory is still getting consumed through existing/previous
> anon mappings. Depends on how valuable that distinction really is.
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry