linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Chris Mason <clm@meta.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>, Zi Yan <ziy@nvidia.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Qi Zheng <qi.zheng@linux.dev>,
	hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
	muchun.song@linux.dev, david@kernel.org,
	lorenzo.stoakes@oracle.com, harry.yoo@oracle.com,
	imran.f.khan@oracle.com, kamalesh.babulal@oracle.com,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	chenridong@huaweicloud.com, mkoutny@suse.com,
	akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
	apais@linux.microsoft.com, lance.yang@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, Qi Zheng <zhengqi.arch@bytedance.com>,
	Chris Mason <clm@fb.com>
Subject: Re: [PATCH v2 00/28] Eliminate Dying Memory Cgroup
Date: Tue, 30 Dec 2025 20:51:19 +0000	[thread overview]
Message-ID: <aVQ7RwxRaXC5kAG2@casper.infradead.org> (raw)
In-Reply-To: <59098b4f-c3bf-4b6c-80fb-604e6e1c755e@meta.com>

On Tue, Dec 30, 2025 at 02:18:51PM -0500, Chris Mason wrote:
> >>>> I just think you should do a preliminary review of the AI review results
> >>>> instead of sending them out directly. Otherwise, if everyone does this,
> >>>> the community will be full of bots.
> 
> I do think it's awkward to dump the whole review output for the patch
> series in a single message.  It looks like there's a sudden jump to XML?
> It's better to reply to the individual patches with the comments
> inline, which I think is where Roman is trying to go long term.

I don't know what Roman's trying to do long-term, but his email
that started this thread was so badly written that it was offensive.
Had it been sent to me, I would have responded in the style of Arkell
v Pressdram.

> With BPF, it looks more like this:
> https://lore.kernel.org/bpf/?q=AI+reviewed+your+patch

That's actually useful.

> >>>> 2. Looking at the mm prompt: https://github.com/masoncl/review-prompts/blob/main/mm.md , are you sure the patterns are all right?
> >> 	a. Page/Folio States, Large folios require per-page state tracking for
> >> 		Reference counts. I thought we want to get rid of per page refcount.
> 
> Early in prompt development I hand picked a few hundred patches from
> 6.16 fixing bugs, and I iterated on these adding subsystem knowledge to
> catch the known bugs.  That's where that rule came from, but as you say
> there's a risk this information gets old.  Do we want to get rid of per
> page refcounts or have we done it?  (more on that at the bottom of the
> email).

There is no such thing as a per-page reference count.  Any attempt to
access the page reference count redirects to the folio refcount.  This
has been the case since 2016 (four years before folios existed).  See
commit ddc58f27f9ee.
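
The redirect described above can be sketched as a tiny userspace model (this
is NOT kernel code; the struct and function names here are made up for
illustration): every page resolves to its compound head / folio, and the
refcount operations only ever touch the single count stored there.

```c
#include <assert.h>

/*
 * Hypothetical model of the post-ddc58f27f9ee behaviour: there is one
 * refcount per folio, and page-level get/put redirect to it.
 */
struct folio_model {
	int refcount;			/* the only refcount that exists */
};

struct page_model {
	struct folio_model *folio;	/* stand-in for compound_head() */
};

/* Analogue of get_page(): bumps the folio's refcount, never a per-page one. */
static void model_get_page(struct page_model *page)
{
	page->folio->refcount++;
}

/* Analogue of put_page(): same redirect on the way down. */
static void model_put_page(struct page_model *page)
{
	page->folio->refcount--;
}
```

Getting a tail page and putting the head page in this model act on the same
counter, which is the point: there is no per-page count to get out of sync.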

We do want to get rid of calls to get_page() and put_page() for a
variety of reasons that would be long and painful to write out.

> As an example of how I'd fix the prompt if the per page state tracking
> were causing problems (and if we didn't want to just remove it), I asked
> claude to analyze how it is still used.  The output is below, I'd double
> check things as best I could, shorten into prompt form and send to the
> list for review.
> 
> Per-Page Tracking in Large Folios - Analysis
> =============================================
> 
> Based on analysis of mm/*.c files and commit history, MM-004's claim is
> still partially true - large folios do need per-page tracking for some
> bits, though recent work has significantly reduced this.
> 
> 
> Bits That Still Require Per-Page Tracking
> ------------------------------------------
> 
> 1. PG_hwpoison (include/linux/page-flags.h:118)
> 
>    Defined as PAGEFLAG(HWPoison, hwpoison, PF_ANY), this flag is set on
>    individual pages within a large folio when hardware memory corruption
>    is detected.
> 
>    The folio_test_has_hwpoisoned() flag on the second page indicates at
>    least one subpage is poisoned, but does not identify which one.
> 
>    When splitting a large folio, page_range_has_hwpoisoned() in
>    mm/huge_memory.c:3467 iterates through pages checking PageHWPoison()
>    for each:
> 
>        static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
>        {
>            for (; nr_pages; page++, nr_pages--)
>                if (PageHWPoison(page))
>                    return true;
>            return false;
>        }
> 
>    Used in rmap code (mm/rmap.c:1990, 2070, 2473) to check individual
>    subpages when unmapping or migrating.
> 
> 2. PG_anon_exclusive (include/linux/page-flags.h:146)
> 
>    Per the comment at include/linux/page-flags.h:139-145:
> 
>        "Depending on the way an anonymous folio can be mapped into a page
>        table (e.g., single PMD/PUD/CONT of the head page vs. PTE-mapped
>        THP), PG_anon_exclusive may be set only for the head page or for
>        tail pages of an anonymous folio. For now, we only expect it to be
>        set on tail pages for PTE-mapped THP."
> 
>    Used at mm/rmap.c:1408-1416: when RMAP_EXCLUSIVE flag is set for
>    PTE-level mappings, it iterates through each page:
> 
>        for (i = 0; i < nr_pages; i++)
>            SetPageAnonExclusive(page + i);
> 
>    HugeTLB stores this on head page only (see PageAnonExclusive() at
>    include/linux/page-flags.h:1153-1162), but PTE-mapped THP needs
>    per-page tracking.
> 
> 
> Recent Changes - Per-Page Mapcount Removed
> ------------------------------------------
> 
> Commit 749492229e3bd ("mm: stop maintaining the per-page mapcount of
> large folios") by David Hildenbrand (March 2025) introduced
> CONFIG_NO_PAGE_MAPCOUNT which:
> 
>   - Stops maintaining per-page mapcounts in tail pages of large folios
>   - Tail page mapcount is now always logically 0 (-1 value)
>   - Removed _nr_pages_mapped tracking
> 
> This was a significant simplification, but it does not affect the
> per-page flag tracking described above.
> 
> 
> Flags Stored in Second Page Only (Not Per-Page)
> -----------------------------------------------
> 
> These are stored in the first tail page (FOLIO_SECOND_PAGE) and apply to
> the entire folio, not individual pages:
> 
>   - PG_has_hwpoisoned  - indicates some page in folio is poisoned
>   - PG_large_rmappable - folio is rmappable
>   - PG_partially_mapped - folio is partially mapped
> 
> See PAGE_FLAGS_SECOND definition at include/linux/page-flags.h:1218-1220.
> 
> 
> Conclusion
> ----------
> 
> While per-page mapcount has been eliminated, PG_hwpoison and
> PG_anon_exclusive (for PTE-mapped THP) still require per-page tracking
> in large folios. MM-004's claim remains valid for these specific bits.
> 
> Key source files:
>   - include/linux/page-flags.h (flag definitions and accessors)
>   - mm/huge_memory.c (folio split handling)
>   - mm/rmap.c (reverse mapping with per-page exclusive tracking)

This is pretty good, and yet dangerously wrong in some of the nuances it
misses.  Which probably summarises the state of the art nicely ;-)

To start with, all flags marked as PF_ANY are set on individual pages
rather than only the folio.  So that's currently:

PAGEFLAG(Private, private, PF_ANY)
PAGEFLAG(HWPoison, hwpoison, PF_ANY)
PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
__SETPAGEFLAG(Head, head, PF_ANY)
        return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
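
The distinction PF_ANY draws can be modelled in a few lines of userspace C
(again a sketch with invented names, not the real page-flags machinery): a
PF_ANY-style accessor operates on exactly the page it is given, while the
default head-page policy first redirects to the compound head, so per-page
state is only visible through the PF_ANY path.

```c
#include <assert.h>

/* Hypothetical model of the two flag-accessor policies. */
struct page_model {
	unsigned long flags;
	struct page_model *head;	/* compound head; self for a head page */
};

/* Head-policy accessor: redirect to the head page before testing. */
static int model_test_flag_head(struct page_model *page, unsigned long bit)
{
	return (page->head->flags >> bit) & 1;
}

/* PF_ANY-style accessor: operate on exactly the page given. */
static int model_test_flag_any(struct page_model *page, unsigned long bit)
{
	return (page->flags >> bit) & 1;
}

/* PF_ANY-style setter, e.g. for a hwpoison-like per-page bit. */
static void model_set_flag_any(struct page_model *page, unsigned long bit)
{
	page->flags |= 1UL << bit;
}
```

Setting a bit on a tail page through the PF_ANY path leaves the head page's
flags untouched, which is exactly why the hwpoison split code has to walk
every page rather than test the folio once.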

Now, PG_private is a flag we're trying to get rid of -- it should be
identical to (folio->private != NULL), so I haven't made any effort
to convert that from being PF_ANY.  I'm not too unhappy that your chatbot
doesn't talk about PG_private, but a more full answer would include
mention of this.

PG_hwpoison and PG_anon_exclusive will remain per-page state in a
memdesc world, and there's a plan to handle those, so there's no need to
eliminate them.

PG_vmemmap_self_hosted is a very, very internal flag.  It's OK to not
know about it.

PG_head has to remain per-page state for now for obvious reasons ;-)
In a memdesc world, there will be no way to ask if a page is the first
page of an allocation, so this flag will not be needed.

I believe there are some subtleties around PG_hwpoison and hugetlb that
are not fully captured above, but I'm not convinced of my ability to
state definitively what they currently are, so I'll leave that for
somebody else to do.

---

Looking through your prompts, there are definitely some conditions that
could be profitably added.  For example, pages which are mapped into
page tables must be PG_uptodate (we have various assertions in the MM
code that this is true and they occasionally trigger).
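
That invariant is the kind of rule a review prompt can state mechanically; as
a userspace sketch (invented names, standing in for the VM_BUG_ON-style
assertions in the real map paths):

```c
#include <assert.h>

/*
 * Hypothetical model of the invariant: a folio must be uptodate before
 * it is mapped into a page table.
 */
struct folio_model {
	int uptodate;
	int mapped;
};

/* Analogue of the map path: assert the invariant, then map. */
static void model_map_folio(struct folio_model *folio)
{
	assert(folio->uptodate);	/* the assertion that occasionally trips */
	folio->mapped = 1;
}
```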

