From: Barry Song <21cnbao@gmail.com>
Date: Thu, 22 Aug 2024 21:21:06 +1200
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios
To: David Hildenbrand
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com, chrisl@kernel.org, hanchuanhua@oppo.com, ioworker0@gmail.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com, ziy@nvidia.com, yuanshuai@oppo.com
In-Reply-To: <954f80c8-5bc3-44f5-a361-32073cbbd764@redhat.com>
References: <20240811224940.39876-1-21cnbao@gmail.com> <20240811224940.39876-2-21cnbao@gmail.com> <3572ae2e-2141-4a70-99da-850b2e7ade41@redhat.com> <954f80c8-5bc3-44f5-a361-32073cbbd764@redhat.com>

On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand wrote:
>
> On 22.08.24 10:44, Barry Song wrote:
> > On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand wrote:
> >>>
> >>> On 12.08.24 00:49, Barry Song wrote:
> >>>> From: Barry Song
> >>>>
> >>>> Anon large folios come from three places:
> >>>> 1. newly allocated large folios in the page fault path, which call
> >>>>    folio_add_new_anon_rmap() to set up the rmap;
> >>>> 2. a large folio that is split into multiple lower-order large folios;
> >>>> 3. a large folio that is migrated to a new large folio.
> >>>>
> >>>> In all three cases, we increase nr_anon by 1.
> >>>>
> >>>> Anon large folios go away either when they are split or when they
> >>>> are freed; in these cases, we decrease the count by 1.
> >>>>
> >>>> Folios that have been added to the swap cache but have not yet received
> >>>> an anon mapping won't be counted. This is consistent with the AnonPages
> >>>> statistics in /proc/meminfo.
> >>>
> >>> Thinking out loud, I wonder if we want to have something like that for
> >>> any anon folios (including small ones).
> >>>
> >>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
> >>> be quite interesting to see that these are actually anon pages still
> >>> consuming memory. Same with memory leaks, when an anon folio doesn't get
> >>> freed for some reason.
> >>>
> >>> The whole "AnonPages" counter thingy is just confusing; it only counts
> >>> what's currently mapped ... so we'd want something different.
> >>>
> >>> But it's okay to start with large folios only; there we have a new
> >>> interface without that legacy stuff :)
> >>
> >> We have two options to do this:
> >> 1. add a new, separate nr_anon_unmapped interface which
> >>    counts unmapped anon memory only;
> >> 2. let nr_anon count both mapped and unmapped anon
> >>    folios.
> >>
> >> I would assume 1 is clearer, as AnonPages has been there
> >> for years. And if we count mapped and unmapped folios together,
> >> we still lack a way to find the anon memory leak
> >> problem you mentioned.
> >>
> >> Right now we are comparing nr_anon (which includes mapped folios only)
> >> with AnonPages to get the distribution of different folio sizes in
> >> performance profiling.
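The accounting rules in the quoted commit message can be modelled with a small user-space sketch. It is an illustration under stated assumptions, not the patch's code: it assumes one counter per folio order (bounded by a hypothetical MODEL_MAX_ORDER), 4 KiB base pages, and a migration source folio that is freed right after the copy. All anon_model_* names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the nr_anon accounting described in the
 * commit message; not kernel code. Assumes one counter per folio order and
 * 4 KiB base pages.
 */
#define MODEL_MAX_ORDER 9
#define MODEL_PAGE_KB 4UL

static unsigned long nr_anon[MODEL_MAX_ORDER + 1];

/* Only large folios (order >= 1) are counted. */
static void count_add(unsigned int order, unsigned long n)
{
	if (order > 0)
		nr_anon[order] += n;
}

static void count_sub(unsigned int order)
{
	if (order == 0)
		return;
	assert(nr_anon[order] > 0);	/* catch unbalanced accounting */
	nr_anon[order]--;
}

/* Case 1: a newly allocated large folio gets its first anon mapping
 * (the folio_add_new_anon_rmap() path in the page fault handler). */
static void anon_model_new(unsigned int order)
{
	count_add(order, 1);
}

/* Case 2: a split uncounts the original folio and counts every
 * resulting lower-order large folio. */
static void anon_model_split(unsigned int order, unsigned int new_order)
{
	assert(new_order < order && order <= MODEL_MAX_ORDER);
	count_sub(order);
	count_add(new_order, 1UL << (order - new_order));
}

/* Freeing a counted folio decrements its order's counter. */
static void anon_model_free(unsigned int order)
{
	count_sub(order);
}

/* Case 3: migration counts the destination folio; the source folio is
 * assumed to be freed right afterwards, so the total is unchanged. */
static void anon_model_migrate(unsigned int order)
{
	count_add(order, 1);
	anon_model_free(order);
}

/* KiB held by counted folios of one order, for comparison with AnonPages. */
static unsigned long anon_model_kb(unsigned int order)
{
	return nr_anon[order] * (MODEL_PAGE_KB << order);
}
```

For example, creating three order-4 folios and splitting one of them to order 2 leaves two order-4 and four order-2 folios counted; splitting an order-2 folio down to order 0 removes it from the count, since only large folios are tracked. Dividing anon_model_kb() for each order by AnonPages (in kB) gives the rough size distribution mentioned above, keeping in mind that nr_anon also includes folios that are unmapped but not yet freed, which AnonPages does not.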
> >>
> >> unmapped_nr_anon should normally always be quite small; otherwise,
> >> something must be wrong.
> >>
> >>>
> >>>>
> >>>> Signed-off-by: Barry Song
> >>>> ---
> >>>>  Documentation/admin-guide/mm/transhuge.rst |  5 +++++
> >>>>  include/linux/huge_mm.h                    | 15 +++++++++++++--
> >>>>  mm/huge_memory.c                           | 13 ++++++++++---
> >>>>  mm/migrate.c                               |  4 ++++
> >>>>  mm/page_alloc.c                            |  5 ++++-
> >>>>  mm/rmap.c                                  |  1 +
> >>>>  6 files changed, 37 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>> index 058485daf186..9fdfb46e4560 100644
> >>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>> @@ -527,6 +527,11 @@ split_deferred
> >>>>         it would free up some memory. Pages on split queue are going to
> >>>>         be split under memory pressure, if splitting is possible.
> >>>>
> >>>> +nr_anon
> >>>> +       the number of anon huge pages we have in the whole system.
> >>>
> >>> "transparent ..." otherwise people might confuse it with anon hugetlb
> >>> "huge pages" ... :)
> >>>
> >>> I briefly tried coming up with a better name than "nr_anon" but failed.
> >>>
> >>>
> >>
> >> If we might have an unmapped_anon counter later, maybe rename it to
> >> nr_anon_mapped? And the new interface we add in the future
> >> might be nr_anon_unmapped?
>
> We really shouldn't be using the mapped/unmapped terminology here ... we
> allocated pages and turned them into anonymous folios. At some point we
> free them. That's the lifecycle.
>
> >
> > On second thought, this might be incorrect as well. Concepts like 'anon',
> > 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
> > unmapped but is still pinned and not yet freed, it isn't technically an
> > 'anon' anymore?
>
> It's just not mapped, and cannot get mapped, anymore.
> In the memdesc world, we'd be freeing the "struct anon" or "struct folio"
> once the last refcount goes to 0, not once it is unmapped (e.g.,
> temporarily during a failed migration?).
>
> The important part to me would be: this is memory that was allocated for
> anonymous memory, and it's still around for some reason and not getting
> freed. Usually, we would expect anon memory to get freed fairly quickly
> once unmapped, except when there are long-term pinnings or other types
> of memory leaks.
>
> You could happily continue using these anon pages via vmsplice() or
> similar, even though the original page table mapping was torn down.
>
> >
> > On the other hand, implementing nr_anon_unmapped could be extremely
> > tricky. I have no idea how to implement it, as we are losing those mapping
> > flags.
>
> folio_mapcount() can tell you efficiently whether a folio is mapped or
> not -- and that information will stay for all eternity as long as we
> have any mapcounts :) . It cannot tell "how many" pages of a large folio
> are mapped, but at least "is any page of this large folio mapped".

Exactly. AnonPages is decremented when a folio is removed from the rmap,
whereas nr_anon is decremented by 1 only when an anon folio is freed. So
I would assume nr_anon includes those pinned and unmapped anon folios,
but AnonPages doesn't.

If there's a significant amount of "leaked" anon, we should consider
having a separate counter for it. For instance, if nr_anon is 100,000
and pinned/unmapped folios account for 50% of them, then nr_anon alone
doesn't effectively reflect the system's state.

To implement that, it seems we would need to detect both the moment
mapcount drops to 0 and the moment the anon folio is freed:

  when mapcount == 0 in rmap:  unmapped_pinned_anon++;
  when the folio is freed:     unmapped_pinned_anon--;

Anyway, it seems this is a separate job.
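The unmapped_pinned_anon pseudocode can be fleshed out into a small user-space model. This is a sketch under assumptions, not a proposed kernel patch: struct model_folio, the fm_* helpers, and the rule that each page-table mapping holds one reference are hypothetical simplifications. Despite its name, the counter covers any anon folio that is unmapped but not yet freed, pinned or not. Because it is decremented when the folio is freed or mapped again, a folio that is only temporarily unmapped (such as during the failed migration mentioned in the quoted reply) drops out once its mapping is restored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the unmapped_pinned_anon idea: count
 * anon folios whose mapcount dropped to 0 but which are not freed yet.
 * All names are illustrative; this is not a kernel API.
 */
struct model_folio {
	int mapcount;	/* page-table mappings */
	int refcount;	/* mappings + pins + other references */
	bool counted;	/* currently included in unmapped_pinned_anon */
	bool freed;
};

static long unmapped_pinned_anon;

/* "when free: unmapped_pinned_anon--", only if the folio was counted */
static void fm_free(struct model_folio *f)
{
	assert(!f->freed);
	if (f->counted) {
		unmapped_pinned_anon--;
		f->counted = false;
	}
	f->freed = true;
}

static void fm_put(struct model_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		fm_free(f);
}

/* Extra reference, e.g. a long-term pin or one held during migration. */
static void fm_get(struct model_folio *f)
{
	assert(!f->freed);
	f->refcount++;
}

static void fm_map(struct model_folio *f)
{
	assert(!f->freed);
	/* Mapped again (e.g. a failed migration restored the PTEs):
	 * no longer an unmapped leftover. */
	if (f->counted) {
		unmapped_pinned_anon--;
		f->counted = false;
	}
	f->mapcount++;
	f->refcount++;
}

/* "when mapcount == 0 in rmap: unmapped_pinned_anon++" */
static void fm_unmap(struct model_folio *f)
{
	assert(f->mapcount > 0);
	if (--f->mapcount == 0) {
		unmapped_pinned_anon++;
		f->counted = true;
	}
	fm_put(f);	/* drop the mapping's reference */
}
```

An ordinary unmap followed by the final put nets out to zero, while a folio that is pinned with fm_get() before being unmapped stays counted until the pin is released with fm_put().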
>
> >
> > However, a page that is read-ahead but not yet mapped can still become
> > an anon, which seems slightly less tricky to count, though still
> > difficult. Besides anon pages, shmem can also be swap-backed?
>
> Yes. I'm sure there would be ways to achieve it, but I am not sure if
> it's worth the churn. These pages can be reclaimed easily (I would
> assume? They are not even mapped and were never accessible via GUP), and
> can certainly not have any longterm pinnings or similar. They are more
> like "cached things that could become an anon folio".

Exactly. If no one maps these pages for an extended period, I assume the
LRU will reclaim them as well.

>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry