From: Gregory Price <gourry@gourry.net>
To: Barry Song <21cnbao@gmail.com>
Cc: willy@infradead.org, axelrasmussen@google.com,
linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org,
ryncsn@gmail.com, weixugc@google.com, yuanchu@google.com
Subject: Re: [LSF/MM/BPF] Improving MGLRU
Date: Thu, 5 Mar 2026 02:31:09 -0500
Message-ID: <aakxPYGgXMHFQf1K@gourry-fedora-PF4VCD3F>
In-Reply-To: <CAGsJ_4xsN2Kfa_f_WaZ-h9Ex7dHk6okyZxnW6oEcXW=kLXwLXw@mail.gmail.com>
On Thu, Mar 05, 2026 at 02:27:27PM +0800, Barry Song wrote:
... Just trimming before; I promise I read everything ...
> 1. lru_gen_look_around — exploits spatial locality by scanning
> adjacent PTEs of a young PTE.
>
> 2. page table walks for aging — further exploit spatial locality.
> The aging path prefers walking page tables to look for young PTEs
> and promote hot pages.
>
> 3. fallback to the other type when one type has only two generations.
>
> 4. very aggressively promote mapped folios.
>
> 5. min_ttl_ms - thrashing prevention
>
> 6. gen, tier, bloom filter
>
> 7. Missing shrink_active_list() — the function to demote folios from
> active to inactive.
>
> 8. swappiness concept difference.
>
> Considering all of the above, I feel MGLRU is quite different from
> active/inactive. Trying to unify them seems like merging two
> completely different approaches. Still, active/inactive might have
> some useful lessons to learn from MGLRU, particularly on how to
> reduce reclamation cost.
>
I will preface this with: I'm not arguing to rip out MGLRU, but I do
want to take stock of what I spent the last week digging through.
(and no, none of this is AI-written)
=======
Your list here is more or less the same as the one I came up with, and I poked
at bolting some of these onto the original LRU in the trivial sense.
I tried reusing the MGLRU code on LRU with minimal modifications just to
see how it affected some really degenerate high-pressure scenarios.
I mostly found these features did nothing, did too much, or straight up
caused LRU to fall over dead where it didn't before.
PTE scans
=======
PTE scans and look-around are powerful, possibly TOO powerful; that's
why there's the bloom filter and the PID controller, to keep MGLRU from
over-correcting and saving more folios than it should.
As you point out, you also burn many more CPU cycles this way, just
not in the critical path. So if you have a core to burn it can be
fine; if you don't, scanning might hurt more than it helps.
I do think there's merit to this approach, and it could be adopted into
LRU as an option. It does, however, *greatly* bias towards saving Anon
over Page Cache, which can be undesirable.
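To make the cost trade-off concrete, here is a toy sketch of the look-around idea. Everything in it (struct toy_pte, toy_look_around(), PTES_PER_TABLE, the window parameter) is hypothetical and heavily simplified; it is not the kernel's lru_gen_look_around(), only the shape of the technique: one young PTE triggers a scan of its neighbours in the same page table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one page table is an array of entries. */
#define PTES_PER_TABLE 512

struct toy_pte {
	bool young;      /* stand-in for the hardware accessed bit */
	bool folio_hot;  /* set when this pass decides to keep the folio */
};

/*
 * Called when reclaim finds table[idx] young. Instead of handling only
 * that entry, scan a window of neighbours in the same table, clear their
 * accessed bits and mark the young ones hot.
 * Returns how many extra young neighbours were found (idx not counted).
 */
static size_t toy_look_around(struct toy_pte *table, size_t idx, size_t window)
{
	size_t start = idx > window / 2 ? idx - window / 2 : 0;
	size_t end = start + window;
	size_t found = 0;

	if (end > PTES_PER_TABLE)
		end = PTES_PER_TABLE;

	for (size_t i = start; i < end; i++) {
		if (!table[i].young)
			continue;
		table[i].young = false;     /* test-and-clear the accessed bit */
		table[i].folio_hot = true;  /* protect it from this reclaim pass */
		if (i != idx)
			found++;
	}
	return found;
}
```

Every call touches up to `window` entries whether or not they turn out to be young, which is where the extra CPU goes, and every young neighbour is protected in one shot. Only mapped folios are ever seen this way, which is one plausible source of the bias toward Anon.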
The PID controller
=======
The existence of the PID really suggests the whole mechanism is a bit
over-engineered. You put PIDs in things to dampen corrective actions
so the system keeps converging on a steady goal.
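For reference, a textbook discrete PID controller looks like the sketch below. The names (struct toy_pid, toy_pid_step()) and the gains are hypothetical, and this is a generic controller, not MGLRU's actual refault feedback loop; the point is only that its output depends on accumulated history as well as on the current error.

```c
/* Generic textbook PID controller; hypothetical, not MGLRU code. */
struct toy_pid {
	double kp, ki, kd;  /* proportional, integral, derivative gains */
	double integral;    /* running sum of past errors */
	double prev_err;    /* error seen on the previous step */
};

/* Returns the correction to apply for this step. */
static double toy_pid_step(struct toy_pid *pid, double setpoint, double measured)
{
	double err = setpoint - measured;
	double deriv = err - pid->prev_err;  /* discrete derivative */

	pid->integral += err;                /* history accumulates here */
	pid->prev_err = err;
	return pid->kp * err + pid->ki * pid->integral + pid->kd * deriv;
}
```

Feeding it the same measurement twice (setpoint 10, measured 8, kp = 1, ki = 0.5, kd = 0) returns 3.0 and then 4.0, because the integral term keeps growing: the same input does not produce the same correction.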
Requiring a PID doesn't inspire confidence that we can reason about
how tweaking a particular behavior of MGLRU will affect the rest of
the system. In fact, it makes it difficult to know exactly what
effect you are having, since there's built-in dynamism.
e.g.:
a) LRU  : folio_mark_accessed() -> promote if already referenced
b) MGLRU: folio_mark_accessed() -> increment a counter
What behavior changes if we increment by 2 instead of 1?
Hard to know.
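A toy version of the two behaviours side by side. struct toy_folio, both functions, TOY_REFS_MAX and the step parameter are hypothetical; the real kernel keeps this state in folio flag bits (PG_referenced, PG_active, plus MGLRU's own reference bits), and the real MGLRU path is more involved.

```c
#include <stdbool.h>

/* Hypothetical toy folio state; not struct folio. */
struct toy_folio {
	bool referenced;
	bool active;
	unsigned int refs;  /* MGLRU-style access counter */
};

/* Classic LRU flavour: the second access promotes to the active list. */
static void toy_lru_mark_accessed(struct toy_folio *f)
{
	if (!f->referenced) {
		f->referenced = true;
	} else if (!f->active) {
		f->active = true;       /* activate */
		f->referenced = false;
	}
}

/*
 * MGLRU flavour: bump a saturating counter. What the counter *means*
 * is decided later, by whatever eviction/tier logic reads it.
 */
#define TOY_REFS_MAX 3u

static void toy_mglru_mark_accessed(struct toy_folio *f, unsigned int step)
{
	unsigned int next = f->refs + step;

	f->refs = next > TOY_REFS_MAX ? TOY_REFS_MAX : next;
}
```

The LRU function's whole effect is visible inside it. In the MGLRU-style function, what step = 2 actually buys you depends on whatever later reads refs, which is exactly why the question above is hard to answer.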
thrashing protection, bloom, intra-generation tiers, etc
=======
Many of these features appear to solve problems MGLRU invents.
Simpler is *generally* (but not always) better for reliability.
The PID is another example, but I put that in its own class.
Aging direction
=======
The fundamental difference in aging direction makes LRU and MGLRU
infeasible to merge. At best you could pull SOME features into
LRU, but some features ONLY work because the aging differs so much.
Example: bolting generations onto LRU makes it unstable, because you
can trivially starve the oldest generation during bursts.
So we've started by making LRU worse, and then set off to solve
the problem we've created.
You can sort of see how MGLRU got developed naturally:
a) we want multiple generations
b) what do we do when the oldest generation is empty?
c) we can either cascade to the next generation and reclaim there, or
we can get fancy and start to treat aging as a sliding window
The engineering decisions all become pretty straightforward from there,
but you started by creating a problem to solve.
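A toy sketch of option (c): a multi-generation FIFO whose live generations form a sliding window of sequence numbers. It is loosely in the spirit of MGLRU's min_seq/max_seq, but struct toy_gens, the helpers and TOY_NR_GENS are hypothetical, and per-generation counts stand in for folio lists.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_GENS 4  /* ring slots for live generations */

struct toy_gens {
	unsigned long min_seq;    /* oldest live generation */
	unsigned long max_seq;    /* youngest generation */
	size_t nr[TOY_NR_GENS];   /* folios per generation slot */
};

static size_t *toy_gen(struct toy_gens *g, unsigned long seq)
{
	return &g->nr[seq % TOY_NR_GENS];
}

/* Aging: open a new youngest generation if a ring slot is free. */
static bool toy_inc_max_seq(struct toy_gens *g)
{
	if (g->max_seq - g->min_seq + 1 >= TOY_NR_GENS)
		return false;  /* window full: something must slide first */
	g->max_seq++;
	*toy_gen(g, g->max_seq) = 0;
	return true;
}

/*
 * Eviction: take one folio from the oldest generation. When that
 * generation is empty, slide the window forward (retire min_seq) rather
 * than merely borrowing from the next one, which frees a ring slot for
 * aging. Returns false only when every generation is empty.
 */
static bool toy_evict_one(struct toy_gens *g)
{
	while (*toy_gen(g, g->min_seq) == 0) {
		if (g->min_seq == g->max_seq)
			return false;  /* nothing left to reclaim */
		g->min_seq++;          /* slide the window */
	}
	(*toy_gen(g, g->min_seq))--;
	return true;
}
```

If toy_evict_one() did not advance min_seq when the oldest generation empties, a burst that drains it would leave the window full, and toy_inc_max_seq() would keep failing until something else moved the window.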
=======
My gut says MGLRU is trying to bolt hotness monitoring onto a coldness-
tracking mechanism. It's OK if these problems require different systems
to solve efficiently/elegantly; they may in fact demand it.
But to reiterate: I'm not of the snap opinion that it should be ripped
out, but I do think MGLRU's feature list raises more eyebrows than it
solves problems (for users, that is; it certainly solves some of its own problems).
> >
> > In a random test over the weekend where I turned everything but
> > multiple generations off (no page table scan, no bloom filter, etc -
> > MGLRU just defaults to a multi-gen FIFO) I found that streaming
> > workloads did better this way.
>
...
> Perhaps an eBPF-programmable LRU could also be a direction
> worth exploring. We could set different eBPF programs for
> different workloads? There is a project in this area:
>
> https://dl.acm.org/doi/pdf/10.1145/3731569.3764820
> https://github.com/cache-ext/cache_ext
>
I'm certain the eBPF folks would love this :P.
Though there's always the question of where the hook points go, and I
would question whether this scales; still, it's certainly a cool idea.
~Gregory