Re: [PATCH v2 1/2] mm/mglru: fix cgroup OOM during MGLRU state switching

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Barry Song <21cnbao@gmail.com>
To: Leno Hou <lenohou@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Jialing Wang <wjl.linux@gmail.com>,
	Yafang Shao <laoar.shao@gmail.com>, Yu Zhao <yuzhao@google.com>,
	 Kairui Song <ryncsn@gmail.com>, Bingfang Guo <bfguo@icloud.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm/mglru: fix cgroup OOM during MGLRU state switching
Date: Fri, 13 Mar 2026 04:08:29 +0800	[thread overview]
Message-ID: <CAGsJ_4zeEpdGmDLKb8Z1NP7q+o2uqvzb1-u1oXZ8SaqOw_CJkg@mail.gmail.com> (raw)
In-Reply-To: <CAGQVrL9AmqOeGa4Ovv6v-0qdy_+4MeupEeU_HQ=-YWm5wRV_ag@mail.gmail.com>

On Fri, Mar 13, 2026 at 12:44 AM Leno Hou <lenohou@gmail.com> wrote:
>
> On 3/12/26 2:02 PM, Barry Song wrote:
> [...]
> >> This effectively eliminates the race window that previously triggered OOMs
> >> under high memory pressure.
> >
> > I don't think this eliminates the race window, but it does reduce it.
> > There is nothing preventing the draining state from changing while
> > you are shrinking.
> >
> Hi Barry，
>
> You raise a valid point about the race window.
>
> While the window exists, I believe the impact is effectively mitigated by
> The reclaim retry mechanism: Even if the reclaimer makes a suboptimal path
>  decision due to the race,  the kernel's reclaim loop (in do_try_to_free_pages)
> provides multiple retries. If the first pass fails to reclaim  memory
> due to the
> race or an inconsistent state, the subsequent retry will observe the updated
> draining state and correctly direct the scan to the appropriate LRU lists.
>
> Anyway，thanks correct me and i'll updating the commit message.

I’m fine with this race condition as long as it is clearly documented
in the commit message.

>
> >>
> >> The issue was consistently reproduced on v6.1.157 and v6.18.3 using
> >> a high-pressure memory cgroup (v1) environment.
> >>
> >> To: Andrew Morton <akpm@linux-foundation.org>
> >> To: Axel Rasmussen <axelrasmussen@google.com>
> >> To: Yuanchu Xie <yuanchu@google.com>
> >> To: Wei Xu <weixugc@google.com>
> >> To: Barry Song <21cnbao@gmail.com>
> >> To: Jialing Wang <wjl.linux@gmail.com>
> >> To: Yafang Shao <laoar.shao@gmail.com>
> >> To: Yu Zhao <yuzhao@google.com>
> >> To: Kairui Song <ryncsn@gmail.com>
> >> To: Bingfang Guo <bfguo@icloud.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Leno Hou <lenohou@gmail.com>
> >> ---
> >>   include/linux/mm_inline.h |  5 +++++
> >>   mm/rmap.c                 |  2 +-
> >>   mm/swap.c                 | 14 ++++++++------
> >>   mm/vmscan.c               | 49 ++++++++++++++++++++++++++++++++++++++---------
> >>   4 files changed, 54 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> >> index fa2d6ba811b5..e6443e22bf67 100644
> >> --- a/include/linux/mm_inline.h
> >> +++ b/include/linux/mm_inline.h
> >> @@ -321,6 +321,11 @@ static inline bool lru_gen_in_fault(void)
> >>          return false;
> >>   }
> >>
> >> +static inline int folio_lru_gen(const struct folio *folio)
> >> +{
> >> +       return -1;
> >> +}
> >> +
> >>   static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> >>   {
> >>          return false;
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 0f00570d1b9e..488bcdca65ed 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -958,7 +958,7 @@ static bool folio_referenced_one(struct folio *folio,
> >>                          return false;
> >>                  }
> >>
> >> -               if (lru_gen_enabled() && pvmw.pte) {
> >> +               if ((folio_lru_gen(folio) != -1) && pvmw.pte) {
> >
> > I am not quite sure if a folio's gen is set to -1 when it is isolated
> > from MGLRU for reclamation. If so, I don't think this would work.
> >
>
> Thanks for your feedback. You are absolutely right—relying solely on
> folio_lru_gen(folio) != -1 is insufficient because the generation tag
> is cleared during isolation in lru_gen_del_folio().
>
> In the current implementation, once a page is isolated, its generation
> information is lost, causing the logic to fall back to the traditional LRU
> path prematurely.
>
> To address this, I am changing the logic to:
> ```c
> @@ -966,7 +966,7 @@ static bool folio_referenced_one(struct folio *folio,
>                         nr = folio_pte_batch(folio, pvmw.pte, pteval, max_nr);
>                 }
>
> -               if (lru_gen_enabled() && pvmw.pte) {
> +               if (lru_gen_enabled() && !lru_gen_draining() && pvmw.pte) {
>                         if (lru_gen_look_around(&pvmw, nr))
>                                 referenced++;
>                 } else if (pvmw.pte) {
> ```
> The rationale is that lru_gen_look_around() is an MGLRU-specific optimization
> that promotes pages based on generation aging. During the draining
> transition, we should strictly rely on traditional RMAP-based reference
> auditing to ensure that pages are correctly migrated to traditional LRU
> lists without being spuriously promoted by MGLRU logic. This avoids
> mixing MGLRU's generation-based promotion with traditional LRU's 'active'
> status, preventing potential reclaim state inconsistencies during the
> transition period."

Right, using folio_lru_gen(folio) != -1 is dangerous because it
implicitly depends on whether the folio is still on the LRU list.
Even if we never switch between MGLRU and the active/inactive LRU,
the check can become invalid once the folio is removed from the LRU
for various reasons—for example, during isolate_folio().

So if possible, we should identify the real issue that led us to rely
on folio_lru_gen(folio) != -1 in the first place, and avoid inventing
this kind of unstable condition check.

Please DO NOT depend on folio_lru_gen(folio) != -1. If it is
unavoidable, limit its use to the smallest scope possible.

Thanks
Barry

next prev parent reply	other threads:[~2026-03-12 20:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-11 12:09 [PATCH v2 0/2] " Leno Hou via B4 Relay
2026-03-11 12:09 ` [PATCH v2 1/2] " Leno Hou via B4 Relay
2026-03-12  6:02   ` Barry Song
2026-03-12 16:44     ` Leno Hou
2026-03-12 20:08       ` Barry Song [this message]
2026-03-11 12:09 ` [PATCH v2 2/2] mm/mglru: maintain workingset refault context across state transitions Leno Hou via B4 Relay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGsJ_4zeEpdGmDLKb8Z1NP7q+o2uqvzb1-u1oXZ8SaqOw_CJkg@mail.gmail.com \
    --to=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=bfguo@icloud.com \
    --cc=laoar.shao@gmail.com \
    --cc=lenohou@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryncsn@gmail.com \
    --cc=weixugc@google.com \
    --cc=wjl.linux@gmail.com \
    --cc=yuanchu@google.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox