linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: David Rientjes <rientjes@google.com>,
	axelrasmussen@google.com, linux-mm@kvack.org,
	 lsf-pc@lists.linux-foundation.org, weixugc@google.com,
	yuanchu@google.com
Subject: Re: [LSF/MM/BPF] Improving MGLRU
Date: Tue, 3 Mar 2026 12:06:20 +0800
Message-ID: <CAGsJ_4ypfGxp5vbNaOt2i2qDmy9tCJuemDjMjwWr6hSv2RtYVA@mail.gmail.com>
In-Reply-To: <CAMgjq7CDgap5pe_xVemow+SbaRW+ERXZBZvkymhFWZQZdDM2Kw@mail.gmail.com>

On Mon, Mar 2, 2026 at 7:10 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Feb 27, 2026 at 11:30 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > > 4. MGLRU's swappiness is kind of useless in some situations compared to
> > >   Active / Inactive LRU, since it force-protects the youngest two gens, so
> > >   quite often we can only reclaim one type of folio. To work around that,
> > >   users usually run forced aging before reclaim. So, can we just remove the
> > >   force protection of the youngest two gens?
> >
> > I guess not—MGLRU needs at least two generations to function,
> > similar to active and inactive lists, meaning it requires two lists.
>
> Hi Barry,
>
> You are right. But I don't think that means we can never reclaim
> the folios in the oldest gen? Or maybe, just let the kernel itself

I think we could reclaim the oldest generation even when
only two generations remain. However, that would make
MGLRU more conceptually confusing. We currently map the
youngest two generations to “active” and the oldest two
to “inactive.”

If there are only two generations, they effectively both
fall into the “active” category, so reclaiming one of them
would mean reclaiming from “active,” which feels rather
counterintuitive to me.

So I’d prefer a two-step approach:
1. Age pages to form inactive generations.
2. Reclaim the “inactive” generations

rather than reclaiming active generations directly.
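To make the generation bookkeeping above concrete, here is a simplified userspace model in C. It is not kernel code, and every model_* name is hypothetical. It assumes a single max_seq shared by anon and file, a per-type min_seq, and treats the two youngest sequence numbers as "active" (as lru_gen_is_active() does). A type's oldest generation may only be evicted once it falls outside that pair, which yields the two-step "age, then reclaim" flow.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: MGLRU keeps between 2 and 4 generations per type. */
enum { MODEL_MIN_GENS = 2, MODEL_MAX_GENS = 4 };
enum { MODEL_ANON, MODEL_FILE, MODEL_NR_TYPES };

struct model_lrugen {
	unsigned long max_seq;                 /* youngest gen, shared by both types */
	unsigned long min_seq[MODEL_NR_TYPES]; /* oldest gen, tracked per type */
};

static unsigned long model_nr_gens(const struct model_lrugen *g, int type)
{
	return g->max_seq - g->min_seq[type] + 1;
}

/* The two youngest sequence numbers count as "active", the rest "inactive". */
static bool model_seq_is_active(const struct model_lrugen *g, unsigned long seq)
{
	return seq + 1 >= g->max_seq;
}

/* Reclaim may only take a type's oldest gen once it has become "inactive". */
static bool model_can_evict(const struct model_lrugen *g, int type)
{
	bool ok = !model_seq_is_active(g, g->min_seq[type]);

	/* Same condition as "more than the two protected gens exist". */
	assert(ok == (model_nr_gens(g, type) > MODEL_MIN_GENS));
	return ok;
}

/*
 * Step 1, aging: open a new youngest gen. A type already at the maximum
 * first has its oldest gen folded into the next one, as inc_min_seq() does.
 */
static void model_age(struct model_lrugen *g)
{
	for (int t = 0; t < MODEL_NR_TYPES; t++) {
		if (model_nr_gens(g, t) == MODEL_MAX_GENS)
			g->min_seq[t]++;
		assert(model_nr_gens(g, t) < MODEL_MAX_GENS);
	}
	g->max_seq++;
}

/* Step 2, reclaim: age until the oldest gen is inactive, then evict it. */
static int model_age_then_reclaim(struct model_lrugen *g, int type)
{
	int nr_agings = 0;

	while (!model_can_evict(g, type)) {
		model_age(g);
		nr_agings++;
	}
	g->min_seq[type]++; /* the oldest ("inactive") gen is now empty */
	return nr_agings;
}
```

Starting from two anon and four file generations, one aging pass is enough before anon can be reclaimed, but that same pass also folds the oldest file generation, which is the anon/file coupling raised below.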

> perform aging when one type of folios is not reclaimable.

I would prefer to avoid having only two generations.
Ideally, new generations should be created before reaching
that point—similar to the active→inactive transition,
but driven by aging.

>
> We have an internal workaround that forces aging and waits for sync
> aging if one type of folio is not reclaimable (without the wait, we
> still hit the MIN_NR_GEN protection again since aging is not finished).
> And without the MIN_NR_GEN protection we might end up over-reclaiming
> without aging.

I see your point; I did exactly the same thing on Android.
However, there’s a significant problem. If anon has two
generations and file has four, they end up sharing
generations. To age anon, we would also need to move file
folios between generations; otherwise, the youngest and
oldest file generations would overlap, causing cold/hot
inversion. Furthermore, in inc_min_seq(), moving folios
means the oldest generation gets pushed into the
second-oldest generation:

new_gen = folio_inc_gen(lruvec, folio, false);
list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);

This is far from ideal, as it still mixes cold and hot pages
to some extent. Could we keep anon and file generations
separate instead? I feel this is a strong requirement and
likely the first step toward making swappiness work properly.
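The overlap and the folding described above can be reproduced with a small ring-buffer model. The model_* names are hypothetical and this is a sketch, not kernel code; the modulo-4 slot indexing mirrors how MGLRU maps sequence numbers to generation lists, and the fold follows the same "oldest slot moves into the next slot" step as the inc_min_seq() snippet, minus batching and per-zone lists.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_MAX_GENS = 4 };

/* Generations occupy a ring of MODEL_MAX_GENS list slots, indexed by seq. */
static unsigned int model_gen_from_seq(unsigned long seq)
{
	return seq % MODEL_MAX_GENS;
}

/*
 * Anon and file share max_seq. Would opening gen max_seq + 1 reuse the slot
 * that still holds this type's oldest gen (min_seq)?
 */
static bool model_new_gen_collides(unsigned long max_seq, unsigned long min_seq)
{
	assert(min_seq <= max_seq);
	return model_gen_from_seq(max_seq + 1) == model_gen_from_seq(min_seq);
}

/*
 * Simplified inc_min_seq(): every folio of the oldest gen moves to the next
 * slot, so cold folios are merged into the second-oldest gen.
 */
static void model_fold_oldest(unsigned long nr_folios[MODEL_MAX_GENS],
			      unsigned long *min_seq)
{
	unsigned int old_gen = model_gen_from_seq(*min_seq);
	unsigned int new_gen = model_gen_from_seq(*min_seq + 1);

	nr_folios[new_gen] += nr_folios[old_gen];
	nr_folios[old_gen] = 0;
	(*min_seq)++;
}
```

With max_seq = 5, anon at min_seq = 4 can take a new generation freely, while file at min_seq = 2 would collide unless its 100 coldest folios are first folded into the 50 folios of the next generation, leaving one mixed generation of 150.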

>
> The problem with that is that the OOM killer became very slow to
> trigger since aging is costly, so the system will hang for minutes
> before OOM is triggered when it should get triggered immediately.

The active/inactive LRU has shrink_active_list() to keep
the inactive list from being starved. We likely need
something similar.
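For comparison, the classic LRU roughly gates shrink_active_list() on inactive_is_low(). Below is a userspace sketch of that ratio heuristic, assuming 4 KiB pages, plus a hypothetical per-type trigger showing what an MGLRU counterpart might check. The second function is an illustration only, not existing kernel logic.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Floor integer square root, standing in for the kernel's int_sqrt(). */
static unsigned long model_isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Sketch of the classic inactive_is_low() test: the larger the LRU (in GiB),
 * the smaller the inactive share that is still considered sufficient.
 */
static bool model_inactive_is_low(unsigned long inactive, unsigned long active)
{
	unsigned long gb = (inactive + active) >> (30 - MODEL_PAGE_SHIFT);
	unsigned long ratio = gb ? model_isqrt(10 * gb) : 1;

	return inactive * ratio < active;
}

/*
 * Hypothetical MGLRU counterpart: ask for aging of one type as soon as it is
 * down to its two protected ("active") generations.
 */
static bool model_type_needs_aging(unsigned long max_seq, unsigned long min_seq)
{
	assert(min_seq <= max_seq);
	return max_seq - min_seq + 1 <= 2;
}
```

For a 10 GiB LRU the ratio is 10, so roughly a tenth of the pages on the inactive list is treated as enough; below 1 GiB the ratio is 1 and inactive only needs to match active.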

A key difference between MGLRU and active/inactive is that
active/inactive performs demotion (moving pages from
active to inactive, with the ability to target anon or file
specifically), whereas MGLRU performs promotion, scanning
PTEs to identify young folios for new generations without
distinguishing between anon and file. Couldn't this slow
down MGLRU aging exactly when faster memory reclamation is
needed?

Of course, we could treat mm_state as NULL and skip
walk_mm() for scanning PTEs, but wouldn't that make aging
purely a matter of moving folios, with no basis in
whether the PTEs are actually young?
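A toy model of the demotion vs. promotion difference, with hypothetical names and heavy simplification: demotion can be restricted to one type, while a promotion-style aging pass moves every young folio regardless of type, and without the page-table walk it degenerates into merely opening a new generation. The demotion side ignores details of the real shrink_active_list(), such as rotating referenced executable file folios.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_ANON, MODEL_FILE };

struct model_folio {
	int type;          /* MODEL_ANON or MODEL_FILE */
	bool active;       /* classic LRU: on the active list */
	bool young;        /* accessed bit set in the mapping PTE */
	unsigned long seq; /* MGLRU: generation sequence number */
};

/* Classic LRU, demotion: deactivate active folios of the requested type only. */
static unsigned int model_demote(struct model_folio *f, unsigned int n, int type)
{
	unsigned int moved = 0;

	assert(type == MODEL_ANON || type == MODEL_FILE);
	for (unsigned int i = 0; i < n; i++) {
		if (f[i].type == type && f[i].active) {
			f[i].active = false;
			moved++;
		}
	}
	return moved;
}

/*
 * MGLRU-style aging, promotion: open a new generation and, if page tables are
 * walked, move every folio with a young PTE into it, whatever its type.
 * Without the walk, this model only bumps max_seq and moves nothing.
 */
static unsigned int model_age(struct model_folio *f, unsigned int n,
			      unsigned long *max_seq, bool walk_mm)
{
	unsigned int promoted = 0;

	(*max_seq)++;
	if (!walk_mm)
		return 0;
	for (unsigned int i = 0; i < n; i++) {
		if (f[i].young) {
			f[i].young = false; /* accessed bit consumed by the walk */
			f[i].seq = *max_seq;
			promoted++;
		}
	}
	return promoted;
}
```

In this model, an aging pass triggered because anon is short of generations also promotes young file folios, work that does nothing for the anon shortage.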

>
> And for the OOM part I saw David Rientjes also mentioned the TTL
> config in MGLRU, I do think TTL is a good idea, we just need to figure
> out a good way to make better use of that.
>
> I think a feasible solution might be (just idea): implement async
> aging; decouple aging and reclaim, reclaim just keep shrinking
> whatever is oldest; and optionally improve thrashing and OOM with TTL.

I’m not sure we want to add a separate thread for async
aging, since kswapd is already quite complex. Could async
aging be handled mainly by kswapd instead? For direct
reclaim, if aging is urgent, we might just skip walk_mm()
or call inc_max_seq() directly. On
Android, we once completely disabled walk_mm() and only
observed positive effects, which also reduced mmap_lock
contention. So I’m thinking we could consider disabling
walk_mm() by default on hardware that lacks non-leaf
(e.g., PMD) access bits.
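One way to express the proposed split as a policy function. Everything here is hypothetical: the enum, the struct, and treating non-leaf access-bit support as a plain boolean are assumptions for illustration, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reclaim contexts. */
enum model_ctx { MODEL_KSWAPD, MODEL_DIRECT_RECLAIM };

struct model_aging_plan {
	bool age_now; /* perform aging in this context */
	bool walk_mm; /* scan page tables for young PTEs while aging */
};

static struct model_aging_plan
model_plan_aging(enum model_ctx ctx, bool urgent, bool hw_nonleaf_young)
{
	struct model_aging_plan p = { .age_now = false, .walk_mm = false };

	if (ctx == MODEL_KSWAPD) {
		/* kswapd owns background aging; walk only if PMD bits help */
		p.age_now = true;
		p.walk_mm = hw_nonleaf_young;
	} else if (urgent) {
		/* direct reclaim: age cheaply, no page-table walk */
		p.age_now = true;
	}
	assert(!p.walk_mm || p.age_now);
	return p;
}
```

For reference, the existing runtime knob is the bitmask in /sys/kernel/mm/lru_gen/enabled: per the admin guide, 0x0002 controls clearing accessed bits in leaf PTEs in large batches (the page-table walk) and 0x0004 extends that to non-leaf entries.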

I agree that we can leverage TTL to improve OOM handling
and reduce thrashing.
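A minimal sketch of how a TTL can turn thrashing into an early OOM decision, modeled loosely on the min_ttl_ms knob (/sys/kernel/mm/lru_gen/min_ttl_ms). The functions and the millisecond bookkeeping are hypothetical simplifications, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Has the oldest generation outlived the TTL? A zero TTL disables the check. */
static bool model_oldest_gen_evictable(unsigned long now_ms,
				       unsigned long oldest_birth_ms,
				       unsigned long min_ttl_ms)
{
	assert(oldest_birth_ms <= now_ms);
	return min_ttl_ms == 0 || now_ms - oldest_birth_ms > min_ttl_ms;
}

/*
 * If no lruvec's oldest generation is older than the TTL, reclaim could only
 * evict the recent working set, so report OOM right away instead of thrashing.
 */
static bool model_should_oom(const unsigned long *oldest_birth_ms, int n,
			     unsigned long now_ms, unsigned long min_ttl_ms)
{
	for (int i = 0; i < n; i++)
		if (model_oldest_gen_evictable(now_ms, oldest_birth_ms[i], min_ttl_ms))
			return false;
	return n > 0;
}
```

With a 1000 ms TTL, lruvecs whose oldest generations are only 500 ms and 200 ms old trigger OOM immediately, whereas one generation aged 2000 ms is enough to keep reclaiming.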

Thanks
Barry

