From: Barry Song <21cnbao@gmail.com>
Date: Thu, 22 Aug 2024 22:12:35 +1200
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios
To: David Hildenbrand
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com,
 ioworker0@gmail.com, kaleshsingh@google.com, kasong@tencent.com,
 linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 ziy@nvidia.com, yuanshuai@oppo.com

On Thu, Aug 22, 2024 at 10:01 PM David Hildenbrand wrote:
>
> On 22.08.24 11:21, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand wrote:
> >>
> >> On 22.08.24 10:44, Barry Song wrote:
> >>> On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>
> >>>> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand wrote:
> >>>>>
> >>>>> On 12.08.24 00:49, Barry Song wrote:
> >>>>>> From: Barry Song
> >>>>>>
> >>>>>> Anon large folios come from three places:
> >>>>>> 1. newly allocated large folios in PF; they call folio_add_new_anon_rmap()
> >>>>>>    for rmap;
> >>>>>> 2. a large folio is split into multiple lower-order large folios;
> >>>>>> 3. a large folio is migrated to a new large folio.
> >>>>>>
> >>>>>> In all three cases above, we increase nr_anon by 1;
> >>>>>>
> >>>>>> Anon large folios might go away either because they are split or
> >>>>>> because they are freed; in these cases, we reduce the count by 1.
> >>>>>>
> >>>>>> Folios that have been added to the swap cache but have not yet received
> >>>>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>>>> statistics in /proc/meminfo.
> >>>>>
> >>>>> Thinking out loud, I wonder if we want to have something like that for
> >>>>> any anon folios (including small ones).
> >>>>>
> >>>>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>>>> be quite interesting to see that these are actually anon pages still
> >>>>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>>>> freed for some reason.
> >>>>>
> >>>>> The whole "AnonPages" counter thingy is just confusing, it only counts
> >>>>> what's currently mapped ... so we'd want something different.
> >>>>>
> >>>>> But it's okay to start with large folios only, there we have a new
> >>>>> interface without that legacy stuff :)
> >>>>
> >>>> We have two options to do this:
> >>>> 1. add a new, separate nr_anon_unmapped interface which
> >>>>    counts unmapped anon memory only
> >>>> 2. let nr_anon count both mapped and unmapped anon
> >>>>    folios.
> >>>>
> >>>> I would assume 1 is clearer, as AnonPages has been there for
> >>>> years. And if we count mapped and unmapped together, we still
> >>>> lack a way to find the anon memory leak problem you
> >>>> mentioned.
> >>>>
> >>>> We are right now comparing nr_anon (which includes mapped folios only)
> >>>> with AnonPages to get the distribution of different folio sizes in
> >>>> performance profiling.
> >>>>
> >>>> unmapped_nr_anon should normally be quite small; otherwise,
> >>>> something must be wrong.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Barry Song
> >>>>>> ---
> >>>>>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>>>  mm/huge_memory.c                           | 13 ++++++++++---
> >>>>>>  mm/migrate.c                               |  4 ++++
> >>>>>>  mm/page_alloc.c                            |  5 ++++-
> >>>>>>  mm/rmap.c                                  |  1 +
> >>>>>>  6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> index 058485daf186..9fdfb46e4560 100644
> >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>>>          it would free up some memory. Pages on split queue are going to
> >>>>>>          be split under memory pressure, if splitting is possible.
> >>>>>>
> >>>>>> +nr_anon
> >>>>>> +       the number of anon huge pages we have in the whole system.
> >>>>>
> >>>>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>>>> "huge pages" ... :)
> >>>>>
> >>>>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>>>
> >>>>>
> >>>>
> >>>> If we might have an unmapped_anon counter later, maybe rename it to
> >>>> nr_anon_mapped? And the new interface we will have in the future
> >>>> might be nr_anon_unmapped?
> >>
> >> We really shouldn't be using the mapped/unmapped terminology here ... we
> >> allocated pages and turned them into anonymous folios. At some point we
> >> free them. That's the lifecycle.
> >>
> >>>
> >>> On second thought, this might be incorrect as well. Concepts like 'anon',
> >>> 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> >>> unmapped but is still pinned and not yet freed, it isn't technically an
> >>> 'anon' anymore?
> >>
> >> It's just not mapped, and cannot get mapped, anymore. In the memdesc
> >> world, we'd be freeing the "struct anon" or "struct folio" once the last
> >> refcount goes to 0, not once (e.g., temporarily during a failed
> >> migration?) unmapped.
> >>
> >> The important part to me would be: this is memory that was allocated for
> >> anonymous memory, and it's still around for some reason and not getting
> >> freed. Usually, we would expect anon memory to get freed fairly quickly
> >> once unmapped. Except when there are long-term pinnings or other types
> >> of memory leaks.
> >>
> >> You could happily continue using these anon pages via vmsplice() or
> >> similar, even though the original page table mapping was torn down.
> >>
> >>>
> >>> On the other hand, implementing nr_anon_unmapped could be extremely
> >>> tricky. I have no idea how to implement it as we are losing those mapping
> >>> flags.
> >>
> >> folio_mapcount() can tell you efficiently whether a folio is mapped or
> >> not -- and that information will stay for all eternity as long as we
> >> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> >> are mapped, but at least "is any page of this large folio mapped".
> >
> > Exactly. AnonPages decreases by 1 when a folio is removed from the rmap,
> > whereas nr_anon decreases by 1 when an anon folio is freed. So,
> > I would assume nr_anon includes those pinned and unmapped anon
> > folios but AnonPages doesn't.
>
> Right, note how internally it is called "NR_ANON_MAPPED", but we ended
> up calling it "AnonPages". But that's rather a legacy interface we
> cannot change (fix) that easily. At least not without a config option.
>
> At some point it might indeed be interesting to have "nr_anon_mapped",
> here, but that would correspond to "is any part of this large folio
> mapped". For debugging purposes in the future, that might indeed be
> interesting.
>
> "nr_anon": anon allocations (until freed -> +1)
> "nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
> "nr_anon_partially_mapped": anon allocations that were detected to be
> partially mapped at some point -> +1
>
> If a folio is in the swapcache, I would still want to see that it is an
> anon allocation lurking around in the system. Like we do with pagecache
> pages. *There* we do have the difference between "allocated" and
> "mapped" already.
>
> So likely, calling it "nr_anon" here, and tracking it on an allocation
> level, is good enough for now and future proof.

Right. I plan to send v3 tomorrow to at least unblock Usama's series,
in case he wants to rebase on top of it.

>
> >
> > If there's a significant amount of 'leaked' anon, we should consider
> > having a separate counter for them. For instance, if nr_anon is
> > 100,000 and pinned/unmapped pages account for 50%, then nr_anon
> > alone doesn't effectively reflect the system's state.
>
> Right, but if you stare at the system you could tell that a significant
> amount of memory is still getting consumed through existing/previous
> anon mappings. Depends on how valuable that distinction really is.
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
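
The counter semantics discussed in this thread can be illustrated with a
small user-space toy model. This is only a sketch under stated assumptions:
the toy_* names, the struct layout, and the rule that each mapping holds
exactly one reference are hypothetical and are not kernel code. It shows
"nr_anon" following the allocation lifetime (counted until freed) while
"nr_anon_mapped" follows "any part mapped", so a folio that is pinned but
no longer mapped stays visible in nr_anon.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct toy_stats {
	long nr_anon;        /* anon allocations that have not been freed yet */
	long nr_anon_mapped; /* anon allocations with at least one mapping */
};

struct toy_folio {
	int refcount; /* all references, including long-term pins */
	int mapcount; /* number of mappings of any part of the folio */
};

/* Allocate an anon folio; the caller holds the initial reference. */
static void toy_alloc(struct toy_stats *s, struct toy_folio *f)
{
	f->refcount = 1;
	f->mapcount = 0;
	s->nr_anon++;
}

/* Take an extra reference, e.g. a long-term pin. */
static void toy_get(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount++;
}

/* Drop a reference; the folio counts as freed when the last one is gone. */
static void toy_put(struct toy_stats *s, struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		s->nr_anon--;
}

/* Map the folio; each mapping holds one reference (model assumption). */
static void toy_map(struct toy_stats *s, struct toy_folio *f)
{
	toy_get(f);
	if (f->mapcount++ == 0)
		s->nr_anon_mapped++;
}

/* Unmap the folio and drop the reference that the mapping held. */
static void toy_unmap(struct toy_stats *s, struct toy_folio *f)
{
	assert(f->mapcount > 0);
	if (--f->mapcount == 0)
		s->nr_anon_mapped--;
	toy_put(s, f);
}
```

In this model, allocating and mapping a folio, pinning it with toy_get(),
and then calling toy_unmap() leaves nr_anon at 1 and nr_anon_mapped at 0.
Only the final toy_put() of the pin brings nr_anon back to 0, which is the
"still around and not getting freed" case described above.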