From: Leno Hou <lenohou@gmail.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jialing Wang <wjl.linux@gmail.com>,
Yafang Shao <laoar.shao@gmail.com>, Yu Zhao <yuzhao@google.com>,
Kairui Song <ryncsn@gmail.com>, Bingfang Guo <bfguo@icloud.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm/mglru: fix cgroup OOM during MGLRU state switching
Date: Thu, 19 Mar 2026 11:14:10 +0800 [thread overview]
Message-ID: <f27eee3d-e8d7-420f-8525-b0f94706c0c3@gmail.com> (raw)
In-Reply-To: <CAGsJ_4yZEOS+Ht38hqeUGnzLktuVUfvCgusqS4JQCb34eYLXYQ@mail.gmail.com>
On 3/19/26 5:29 AM, Barry Song wrote:
> On Wed, Mar 18, 2026 at 8:56 PM Leno Hou <lenohou@gmail.com> wrote:
>>
>> On 3/18/26 4:30 PM, Barry Song wrote:
>>> On Wed, Mar 18, 2026 at 4:17 PM Leno Hou <lenohou@gmail.com> wrote:
>> [...]
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index 33287ba4a500..88b9db06e331 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -886,7 +886,7 @@ static enum folio_references folio_check_references(struct folio *folio,
>>>>>> if (referenced_ptes == -1)
>>>>>> return FOLIOREF_KEEP;
>>>>>>
>>>>>> - if (lru_gen_enabled()) {
>>>>
>>>> documentation as following:
>>>>
>>>> /*
>>>> * During the MGLRU state transition (lru_gen_switching), we force
>>>> * folios to follow the traditional active/inactive reference checking.
>>>> *
>>>> * While MGLRU is switching,the generational state of folios is in flux.
>>>> * Falling back to the traditional logic (which relies on PG_referenced/
>>>> * PG_active flags that are consistent across both mechanisms) provides
>>>> * a stable, safe behavior for the folio until it is fully migrated back
>>>> * to the traditional LRU lists. This avoids relying on potentially
>>>> * inconsistent MGLRU generational metadata during the transition.
>>>> */
>>>>
>>>>>> + if (lru_gen_enabled() && !lru_gen_draining()) {
>>>>>
>>>>> I’m curious what prompted you to do this.
>>>>>
>>>>> This feels a bit odd. I assume this effectively makes
>>>>> folios on MGLRU, as well as those on active/inactive
>>>>> lists, always follow the active/inactive logic.
>>>>>
>>>>> It might be fine, but it needs thorough documentation here.
>>>>>
>>>>> another approach would be:
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 33287ba4a500..91b60664b652 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -122,6 +122,9 @@ struct scan_control {
>>>>> /* Proactive reclaim invoked by userspace */
>>>>> unsigned int proactive:1;
>>>>>
>>>>> + /* Are we reclaiming from MGLRU */
>>>>> + unsigned int lru_gen:1;
>>>>> +
>>>>> /*
>>>>> * Cgroup memory below memory.low is protected as long as we
>>>>> * don't threaten to OOM. If any cgroup is reclaimed at
>>>>> @@ -886,7 +889,7 @@ static enum folio_references
>>>>> folio_check_references(struct folio *folio,
>>>>> if (referenced_ptes == -1)
>>>>> return FOLIOREF_KEEP;
>>>>>
>>>>> - if (lru_gen_enabled()) {
>>>>> + if (sc->lru_gen) {
>>>>> if (!referenced_ptes)
>>>>> return FOLIOREF_RECLAIM;
>>>>>
>>>>> This makes the logic perfectly correct (you know exactly
>>>>> where your folios come from), but I’m not sure it’s worth it.
>>>>>
>>>>> Anyway, I’d like to understand why you always need to
>>>>> use the active/inactive logic even for folios from MGLRU.
>>>>> To me, it seems to work only by coincidence, which isn’t good.
>>>>>
>>>>> Thanks
>>>>> Barry
>>>>
>>>> Hi Barry,
>>>>
>>>> I agree that using !lru_gen_draining() feels a bit like a fallback path.
>>>> However, after considering your suggestion for sc->lru_gen, I’m
>>>> concerned about the broad impact of modifying struct scan_control.Since
>>>> lru_drain_core is a very transient state, I prefer a localized fix that
>>>> doesn't propagate architectural changes throughout the entire reclaim stack.
>>>>
>>>> You mentioned that using the active/inactive logic feels like it works
>>>> by 'coincidence'. To clarify, this is an intentional fallback: because
>>>> the generational metadata in MGLRU becomes unreliable during draining,
>>>> we intentionally downgrade these folios to the traditional logic. Since
>>>> the PG_referenced and PG_active bits are maintained by the core VM and
>>>> are consistent regardless of whether MGLRU is active, this fallback is
>>>> technically sound and robust.
>>>>
>>>> I have added detailed documentation to the code to explain this design
>>>> choice, clarifying that it's a deliberate transition strategy rather
>>>> than a coincidence."
>>>
>>> Nope. You still haven’t explained why the active/inactive LRU
>>> logic makes it work. MGLRU and active/inactive use different
>>> methods to determine whether a folio is hot or cold. You’re
>>> forcing active/inactive logic to decide hot/cold for an MGLRU
>>> folio. It’s not that simple—PG_referenced isn’t maintained
>>> by the core; it’s specific to active/inactive. See folio_mark_accessed().
>>>
>>> Best Regards
>>> Barry
>>
>> Hi Barry,
>>
>> Thank you for your patience and for pointing out the version-specific
>> nuances. You are absolutely correct—my previous assumption that the
>> traditional reference-checking logic would serve as a robust fallback
>> was fundamentally flawed.
>>
>> After re-examining the code in v7.0 and comparing it with older versions
>> (e.g., v6.1), I see the core issue you highlighted:
>>
>> 1. Evolution of PG_referenced: In older kernels, lru_gen_inc_refs()
>> often interacted with the PG_referenced bit, which inadvertently
>> provided a 'coincidental' hint for the legacy reclaim path. However, in
>> v7.0+, lru_gen_inc_refs() has evolved to use set_mask_bits() on the
>> LRU_REFS_MASK bitfield, and it no longer relies on or updates the legacy
>> PG_referenced bit for MGLRU folios.
>>
>> 2. The Logic Flaw: When switching from MGLRU to the traditional LRU,
>> these folios arrive at the legacy reclaim path with PG_referenced unset
>> or stale. If I force them through the legacy folio_check_references()
>> path, folio_test_clear_referenced(folio) predictably returns 0. The
>> legacy path interprets this as a 'cold' folio, leading to premature
>> reclamation. You are correct that forcing this active/inactive logic
>> onto MGLRU folios is logically inconsistent.
>>
>>
>> 3. My Revised Approach: Instead of attempting to patch
>> folio_check_references() with a fallback logic, I have decided to keep
>> the folio_check_references() logic unchanged.
>>
>> The system handles this transition safely through the kernel's existing
>> reclaim loop and retry mechanisms:
>>
>> a) While MGLRU is draining, folios are moved back to the traditional
>> LRU lists. Once migrated, these folios will naturally begin
>> participating in the legacy reclaim path.
>>
>> b) Although some folios might be initially underestimated as 'cold'
>> in the very first reclaim pass immediately after the switch, the
>> kernel's reclaim loop will naturally re-evaluate them. As they are
>> accessed, the standard legacy mechanism will correctly maintain the
>> PG_referenced bit, and the system will converge to the correct state
>> without needing an explicit fallback path or state-checking in
>> folio_check_references().
>>
>>
>> This approach avoids the logical corruption caused by forcing
>> incompatible evaluation methods and relies on the natural convergence of
>> the existing reclaim loop.
>>
>>
>> Does this alignment with the existing reclaim mechanism address your
>> concerns about logical consistency?
>
> My gut feeling is that we probably don’t need to worry
> too much about the accuracy of hot/cold evaluation during
> switching, since the system is already in a volatile state
> at that point. So as long as we avoid introducing unusual
> logic—such as forcing active/inactive decisions onto MGLRU
> folios—I’m fine with it.
>
> Ideally, we would add an sc->lru_gen boolean so we know
> exactly where the folios come from, rather than relying on
> folio_lru_gen(folio) != -1, which can be misleading.
> However, if this doesn’t bring much improvement, it may
> not be worth increasing the complexity.
>
Hi Barry,
Thank you for the guidance~
To address your concerns regarding readability and maintainability:
1. Naming: I'll rename the transition state to lru_gen_switching
(instead of draining) to better reflect its purpose.
2. Documentation: I'll add the documentation to explain why we disable
look-around optimization when LRU is switching.
This approach keeps the patch minimal and fix for the OOM issue during
switching without introducing complexity.
See next v5 patch later.
---
Best regards,
Leno Hou
next prev parent reply other threads:[~2026-03-19 3:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 17:43 Leno Hou via B4 Relay
2026-03-18 7:16 ` Barry Song
2026-03-18 8:16 ` Leno Hou
2026-03-18 8:30 ` Barry Song
2026-03-18 12:56 ` Leno Hou
2026-03-18 21:29 ` Barry Song
2026-03-19 3:14 ` Leno Hou [this message]
2026-03-18 12:59 ` Leno Hou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f27eee3d-e8d7-420f-8525-b0f94706c0c3@gmail.com \
--to=lenohou@gmail.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=bfguo@icloud.com \
--cc=laoar.shao@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryncsn@gmail.com \
--cc=weixugc@google.com \
--cc=wjl.linux@gmail.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox