From: Barry Song <21cnbao@gmail.com>
To: Leno Hou <lenohou@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jialing Wang <wjl.linux@gmail.com>,
Yafang Shao <laoar.shao@gmail.com>, Yu Zhao <yuzhao@google.com>,
Kairui Song <ryncsn@gmail.com>, Bingfang Guo <bfguo@icloud.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm/mglru: fix cgroup OOM during MGLRU state switching
Date: Thu, 19 Mar 2026 05:29:18 +0800 [thread overview]
Message-ID: <CAGsJ_4yZEOS+Ht38hqeUGnzLktuVUfvCgusqS4JQCb34eYLXYQ@mail.gmail.com> (raw)
In-Reply-To: <4807e460-054c-49ed-9792-f5000d7b3820@gmail.com>
On Wed, Mar 18, 2026 at 8:56 PM Leno Hou <lenohou@gmail.com> wrote:
>
> On 3/18/26 4:30 PM, Barry Song wrote:
> > On Wed, Mar 18, 2026 at 4:17 PM Leno Hou <lenohou@gmail.com> wrote:
> [...]
> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>> index 33287ba4a500..88b9db06e331 100644
> >>>> --- a/mm/vmscan.c
> >>>> +++ b/mm/vmscan.c
> >>>> @@ -886,7 +886,7 @@ static enum folio_references folio_check_references(struct folio *folio,
> >>>> if (referenced_ptes == -1)
> >>>> return FOLIOREF_KEEP;
> >>>>
> >>>> - if (lru_gen_enabled()) {
> >>
> >> documentation as following:
> >>
> >> /*
> >> * During the MGLRU state transition (lru_gen_switching), we force
> >> * folios to follow the traditional active/inactive reference checking.
> >> *
> >> * While MGLRU is switching,the generational state of folios is in flux.
> >> * Falling back to the traditional logic (which relies on PG_referenced/
> >> * PG_active flags that are consistent across both mechanisms) provides
> >> * a stable, safe behavior for the folio until it is fully migrated back
> >> * to the traditional LRU lists. This avoids relying on potentially
> >> * inconsistent MGLRU generational metadata during the transition.
> >> */
> >>
> >>>> + if (lru_gen_enabled() && !lru_gen_draining()) {
> >>>
> >>> I’m curious what prompted you to do this.
> >>>
> >>> This feels a bit odd. I assume this effectively makes
> >>> folios on MGLRU, as well as those on active/inactive
> >>> lists, always follow the active/inactive logic.
> >>>
> >>> It might be fine, but it needs thorough documentation here.
> >>>
> >>> another approach would be:
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index 33287ba4a500..91b60664b652 100644
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -122,6 +122,9 @@ struct scan_control {
> >>> /* Proactive reclaim invoked by userspace */
> >>> unsigned int proactive:1;
> >>>
> >>> + /* Are we reclaiming from MGLRU */
> >>> + unsigned int lru_gen:1;
> >>> +
> >>> /*
> >>> * Cgroup memory below memory.low is protected as long as we
> >>> * don't threaten to OOM. If any cgroup is reclaimed at
> >>> @@ -886,7 +889,7 @@ static enum folio_references
> >>> folio_check_references(struct folio *folio,
> >>> if (referenced_ptes == -1)
> >>> return FOLIOREF_KEEP;
> >>>
> >>> - if (lru_gen_enabled()) {
> >>> + if (sc->lru_gen) {
> >>> if (!referenced_ptes)
> >>> return FOLIOREF_RECLAIM;
> >>>
> >>> This makes the logic perfectly correct (you know exactly
> >>> where your folios come from), but I’m not sure it’s worth it.
> >>>
> >>> Anyway, I’d like to understand why you always need to
> >>> use the active/inactive logic even for folios from MGLRU.
> >>> To me, it seems to work only by coincidence, which isn’t good.
> >>>
> >>> Thanks
> >>> Barry
> >>
> >> Hi Barry,
> >>
> >> I agree that using !lru_gen_draining() feels a bit like a fallback path.
> >> However, after considering your suggestion for sc->lru_gen, I’m
> >> concerned about the broad impact of modifying struct scan_control.Since
> >> lru_drain_core is a very transient state, I prefer a localized fix that
> >> doesn't propagate architectural changes throughout the entire reclaim stack.
> >>
> >> You mentioned that using the active/inactive logic feels like it works
> >> by 'coincidence'. To clarify, this is an intentional fallback: because
> >> the generational metadata in MGLRU becomes unreliable during draining,
> >> we intentionally downgrade these folios to the traditional logic. Since
> >> the PG_referenced and PG_active bits are maintained by the core VM and
> >> are consistent regardless of whether MGLRU is active, this fallback is
> >> technically sound and robust.
> >>
> >> I have added detailed documentation to the code to explain this design
> >> choice, clarifying that it's a deliberate transition strategy rather
> >> than a coincidence."
> >
> > Nope. You still haven’t explained why the active/inactive LRU
> > logic makes it work. MGLRU and active/inactive use different
> > methods to determine whether a folio is hot or cold. You’re
> > forcing active/inactive logic to decide hot/cold for an MGLRU
> > folio. It’s not that simple—PG_referenced isn’t maintained
> > by the core; it’s specific to active/inactive. See folio_mark_accessed().
> >
> > Best Regards
> > Barry
>
> Hi Barry,
>
> Thank you for your patience and for pointing out the version-specific
> nuances. You are absolutely correct—my previous assumption that the
> traditional reference-checking logic would serve as a robust fallback
> was fundamentally flawed.
>
> After re-examining the code in v7.0 and comparing it with older versions
> (e.g., v6.1), I see the core issue you highlighted:
>
> 1. Evolution of PG_referenced: In older kernels, lru_gen_inc_refs()
> often interacted with the PG_referenced bit, which inadvertently
> provided a 'coincidental' hint for the legacy reclaim path. However, in
> v7.0+, lru_gen_inc_refs() has evolved to use set_mask_bits() on the
> LRU_REFS_MASK bitfield, and it no longer relies on or updates the legacy
> PG_referenced bit for MGLRU folios.
>
> 2. The Logic Flaw: When switching from MGLRU to the traditional LRU,
> these folios arrive at the legacy reclaim path with PG_referenced unset
> or stale. If I force them through the legacy folio_check_references()
> path, folio_test_clear_referenced(folio) predictably returns 0. The
> legacy path interprets this as a 'cold' folio, leading to premature
> reclamation. You are correct that forcing this active/inactive logic
> onto MGLRU folios is logically inconsistent.
>
>
> 3. My Revised Approach: Instead of attempting to patch
> folio_check_references() with a fallback logic, I have decided to keep
> the folio_check_references() logic unchanged.
>
> The system handles this transition safely through the kernel's existing
> reclaim loop and retry mechanisms:
>
> a) While MGLRU is draining, folios are moved back to the traditional
> LRU lists. Once migrated, these folios will naturally begin
> participating in the legacy reclaim path.
>
> b) Although some folios might be initially underestimated as 'cold'
> in the very first reclaim pass immediately after the switch, the
> kernel's reclaim loop will naturally re-evaluate them. As they are
> accessed, the standard legacy mechanism will correctly maintain the
> PG_referenced bit, and the system will converge to the correct state
> without needing an explicit fallback path or state-checking in
> folio_check_references().
>
>
> This approach avoids the logical corruption caused by forcing
> incompatible evaluation methods and relies on the natural convergence of
> the existing reclaim loop.
>
>
> Does this alignment with the existing reclaim mechanism address your
> concerns about logical consistency?
My gut feeling is that we probably don’t need to worry
too much about the accuracy of hot/cold evaluation during
switching, since the system is already in a volatile state
at that point. So as long as we avoid introducing unusual
logic—such as forcing active/inactive decisions onto MGLRU
folios—I’m fine with it.
Ideally, we would add an sc->lru_gen boolean so we know
exactly where the folios come from, rather than relying on
folio_lru_gen(folio) != -1, which can be misleading.
However, if this doesn’t bring much improvement, it may
not be worth increasing the complexity.
Thanks
Barry
next prev parent reply other threads:[~2026-03-18 21:29 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 17:43 Leno Hou via B4 Relay
2026-03-18 7:16 ` Barry Song
2026-03-18 8:16 ` Leno Hou
2026-03-18 8:30 ` Barry Song
2026-03-18 12:56 ` Leno Hou
2026-03-18 21:29 ` Barry Song [this message]
2026-03-19 3:14 ` Leno Hou
2026-03-18 12:59 ` Leno Hou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGsJ_4yZEOS+Ht38hqeUGnzLktuVUfvCgusqS4JQCb34eYLXYQ@mail.gmail.com \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=bfguo@icloud.com \
--cc=laoar.shao@gmail.com \
--cc=lenohou@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryncsn@gmail.com \
--cc=weixugc@google.com \
--cc=wjl.linux@gmail.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox