From: Barry Song <21cnbao@gmail.com>
To: lenohou@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jialing Wang <wjl.linux@gmail.com>,
Yafang Shao <laoar.shao@gmail.com>, Yu Zhao <yuzhao@google.com>,
Kairui Song <ryncsn@gmail.com>, Bingfang Guo <bfguo@icloud.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] mm/mglru: fix cgroup OOM during MGLRU state switching
Date: Tue, 17 Mar 2026 15:52:06 +0800 [thread overview]
Message-ID: <CAGsJ_4wvmZMnLp9V68zOAs9xze3zpSrNGhGZT9gSi_9-93LjgA@mail.gmail.com> (raw)
In-Reply-To: <20260316-b4-switch-mglru-v2-v3-1-c846ce9a2321@gmail.com>
On Mon, Mar 16, 2026 at 1:56 PM Leno Hou via B4 Relay
<devnull+lenohou.gmail.com@kernel.org> wrote:
>
> From: Leno Hou <lenohou@gmail.com>
>
> When the Multi-Gen LRU (MGLRU) state is toggled dynamically, a race
> condition exists between the state switching and the memory reclaim path.
> This can lead to unexpected cgroup OOM kills, even when plenty of
> reclaimable memory is available.
>
> Problem Description
> ==================
>
> The issue arises from a "reclaim vacuum" during the transition.
>
> 1. When disabling MGLRU, lru_gen_change_state() sets lrugen->enabled to
> false before the pages are drained from MGLRU lists back to traditional
> LRU lists.
> 2. Concurrent reclaimers in shrink_lruvec() see lrugen->enabled as false
> and skip the MGLRU path.
> 3. However, these pages might not have reached the traditional LRU lists
> yet, or the changes are not yet visible to all CPUs due to a lack
> of synchronization.
> 4. get_scan_count() subsequently finds traditional LRU lists empty,
> concludes there is no reclaimable memory, and triggers an OOM kill.
>
> A similar race can occur during enablement, where the reclaimer sees the
> new state but the MGLRU lists haven't been populated via fill_evictable()
> yet.
>
> Solution
> ========
>
> Introduce a 'draining' state (`lru_drain_core`) to bridge the transition.
> When transitioning, the system enters this intermediate state where
> the reclaimer is forced to attempt both MGLRU and traditional reclaim
> paths sequentially. This ensures that folios remain visible to at least
> one reclaim mechanism until the transition is fully materialized across
> all CPUs.
>
> Changes
> =======
>
> v3:
> - Rebase onto mm-new branch for queue testing
> - Don't look around while draining
> - Fix Barry Song's comment
>
> v2:
> - Repalce with a static branch `lru_drain_core` to track the transition
> state.
> - Ensures all LRU helpers correctly identify page state by checking
> folio_lru_gen(folio) != -1 instead of relying solely on global flags.
> - Maintain workingset refault context across MGLRU state transitions
> - Fix build error when CONFIG_LRU_GEN is disabled.
>
> v1:
> - Use smp_store_release() and smp_load_acquire() to ensure the visibility
> of 'enabled' and 'draining' flags across CPUs.
> - Modify shrink_lruvec() to allow a "joint reclaim" period. If an lruvec
> is in the 'draining' state, the reclaimer will attempt to scan MGLRU
> lists first, and then fall through to traditional LRU lists instead
> of returning early. This ensures that folios are visible to at least
> one reclaim path at any given time.
>
> Race & Mitigation
> ================
>
> A race window exists between checking the 'draining' state and performing
> the actual list operations. For instance, a reclaimer might observe the
> draining state as false just before it changes, leading to a suboptimal
> reclaim path decision.
>
> However, this impact is effectively mitigated by the kernel's reclaim
> retry mechanism (e.g., in do_try_to_free_pages). If a reclaimer pass fails
> to find eligible folios due to a state transition race, subsequent retries
> in the loop will observe the updated state and correctly direct the scan
> to the appropriate LRU lists. This ensures the transient inconsistency
> does not escalate into a terminal OOM kill.
>
> This effectively reduce the race window that previously triggered OOMs
> under high memory pressure.
>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Axel Rasmussen <axelrasmussen@google.com>
> To: Yuanchu Xie <yuanchu@google.com>
> To: Wei Xu <weixugc@google.com>
> To: Barry Song <21cnbao@gmail.com>
> To: Jialing Wang <wjl.linux@gmail.com>
> To: Yafang Shao <laoar.shao@gmail.com>
> To: Yu Zhao <yuzhao@google.com>
> To: Kairui Song <ryncsn@gmail.com>
> To: Bingfang Guo <bfguo@icloud.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Leno Hou <lenohou@gmail.com>
> ---
> include/linux/mm_inline.h | 16 ++++++++++++++++
> mm/rmap.c | 2 +-
> mm/swap.c | 15 +++++++++------
> mm/vmscan.c | 38 +++++++++++++++++++++++++++++---------
> 4 files changed, 55 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index ad50688d89db..16ac700dac9c 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -102,6 +102,12 @@ static __always_inline enum lru_list folio_lru_list(const struct folio *folio)
>
> #ifdef CONFIG_LRU_GEN
>
> +static inline bool lru_gen_draining(void)
> +{
> + DECLARE_STATIC_KEY_FALSE(lru_drain_core);
> +
> + return static_branch_unlikely(&lru_drain_core);
> +}
> #ifdef CONFIG_LRU_GEN_ENABLED
> static inline bool lru_gen_enabled(void)
> {
> @@ -316,11 +322,21 @@ static inline bool lru_gen_enabled(void)
> return false;
> }
>
> +static inline bool lru_gen_draining(void)
> +{
> + return false;
> +}
> +
> static inline bool lru_gen_in_fault(void)
> {
> return false;
> }
>
> +static inline int folio_lru_gen(const struct folio *folio)
> +{
> + return -1;
> +}
> +
> static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> {
> return false;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6398d7eef393..0b5f663f3062 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -966,7 +966,7 @@ static bool folio_referenced_one(struct folio *folio,
> nr = folio_pte_batch(folio, pvmw.pte, pteval, max_nr);
> }
>
> - if (lru_gen_enabled() && pvmw.pte) {
> + if (lru_gen_enabled() && !lru_gen_draining() && pvmw.pte) {
> if (lru_gen_look_around(&pvmw, nr))
> referenced++;
> } else if (pvmw.pte) {
> diff --git a/mm/swap.c b/mm/swap.c
> index 5cc44f0de987..ecb192c02d2e 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -462,7 +462,7 @@ void folio_mark_accessed(struct folio *folio)
> {
> if (folio_test_dropbehind(folio))
> return;
> - if (lru_gen_enabled()) {
> + if (folio_lru_gen(folio) != -1) {
I still feel this is quite dangerous. A folio could be on the
lru_cache rather than on MGLRU’s lists.
This still changes MGLRU’s behavior, much like your v2, which
effectively disabled look_around.
I mentioned this in v2: please avoid depending on
folio_lru_gen() == -1 unless it is absolutely necessary and you are
certain the folio is on an LRU list.
This is hard to verify case by case. From a design perspective,
relying on folio_lru_gen() == -1 is not appropriate.
Thanks
Barry
next prev parent reply other threads:[~2026-03-17 7:52 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-15 18:18 [PATCH v3 0/2] " Leno Hou via B4 Relay
2026-03-15 18:18 ` [PATCH v3 1/2] " Leno Hou via B4 Relay
2026-03-17 7:52 ` Barry Song [this message]
2026-03-15 18:18 ` [PATCH v3 2/2] mm/mglru: maintain workingset refault context across state transitions Leno Hou via B4 Relay
2026-03-18 3:30 ` Kairui Song
2026-03-18 3:41 ` Leno Hou
2026-03-18 8:14 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGsJ_4wvmZMnLp9V68zOAs9xze3zpSrNGhGZT9gSi_9-93LjgA@mail.gmail.com \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=bfguo@icloud.com \
--cc=laoar.shao@gmail.com \
--cc=lenohou@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryncsn@gmail.com \
--cc=weixugc@google.com \
--cc=wjl.linux@gmail.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox