Re: [LSF/MM/BPF TOPIC] Improving MGLRU

From: Kairui Song <ryncsn@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: lsf-pc@lists.linux-foundation.org,
	 Chen Ridong <chenridong@huaweicloud.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] Improving MGLRU
Date: Sat, 21 Feb 2026 14:03:43 +0800
Message-ID: <CAMgjq7D8uPq3CpCksdqZYh0kybOiWojsZnfLPTYfP-UCuw=zwA@mail.gmail.com>
In-Reply-To: <aZim2hT0nNjcRYVG@cmpxchg.org>

On Sat, Feb 21, 2026 at 2:24 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Feb 20, 2026 at 01:25:33AM +0800, Kairui Song wrote:
> > Hi All,
> >
> > Apologies I forgot to add the proper tag in the previous email so
> > resending this.
> >
> > MGLRU has been introduced in the mainline for years, but we still have two LRUs
> > today. There are many reasons MGLRU is still not the only LRU implementation in
> > the kernel.
> >
> > And I've been looking at a few major issues here:
> >
> > 1. Page flag usage: MGLRU uses many more flags (3+ more) than Active/Inactive
> > LRU.
> > 2. Regressions: MGLRU might cause regression, even though in many workloads it
> > outperforms Active/Inactive by a lot.
> > 3. Metrics: MGLRU makes some metrics work differently, for example: PSI,
> > /proc/meminfo.
> > 4. Some reclaim behavior is less controllable.
>
> I would be very interested in discussing this topic as well.

Thanks, glad to hear that!

>
> > 2. Regressions: Currently regression is a more major problem for us.
> >    From our perspective, almost all regressions are caused by an under- or
> >    overprotected file cache. MGLRU's PID protection either gets too aggressive
> >    or too passive or just have a too long latency. To fix that, I'd propose a
> >    LFU-like design and relax the PID's aggressiveness to make it much more
> >    proactive and effective for file folios. The idea is always use 3 bits in
> >    the page flags to count the referenced time (which would also replace
> >    PG_workingset and PG_referenced). Initial tests showed a 30% reduction of
> >    refaults, and many regressions are gone. A flow chart of how the MGLRU idea
> >    might work:
>
> Are you referring to refaults on the page cache side, or swapins?
>
> Last time we evaluated MGLRU on Meta workloads, we noticed that it
> tends to do better with zswap, but worse with disk swap. It seemed to
> just prefer reclaiming anon, period.
>
> For the balancing between anon and file to work well in all
> situations, it needs to have a notion of backend speed and factor in
> the respective cost of misses on each side.

It's a bit more than that. Even when there is no swap, MGLRU still
performs worse in some workloads like MongoDB. From what I've noticed,
that's because the PID protection is a bit too passive, and there is a
forced protection in sort_folio which sometimes seems too aggressive.
Active/Inactive LRU will just move a folio to the head if it's accessed
twice while in RAM, but MGLRU won't do so. As a result, hotter file
folios are evicted at the same rate as colder ones until the PID gets
triggered, or a folio stays protected even if it hasn't been used for
a while. And by the time the PID finally gets triggered, the workload
might have changed. This is fixable using the approach I mentioned
though, and with that fix MGLRU seems to be better than Active/Inactive
in all our known cases. Whether that is a good fix is worth discussing.
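
To make the idea a bit more concrete, here is a minimal Python sketch
of an LFU-like policy built on the 3-bit counter from the proposal
quoted above. It is a toy model, not kernel code: the Folio class,
pick_victim() and the halving age() step are all hypothetical names
and choices made up for illustration, and the decay step in particular
is only an assumption about how stale protection could be released.

```python
from dataclasses import dataclass

REF_BITS = 3                     # counter width taken from the proposal
REF_MAX = (1 << REF_BITS) - 1    # the counter saturates at 7


@dataclass
class Folio:                     # hypothetical stand-in for a file folio
    name: str
    refs: int = 0                # saturating access counter

    def touch(self) -> None:
        """Record one access, saturating at REF_MAX."""
        self.refs = min(self.refs + 1, REF_MAX)


def pick_victim(folios):
    """Evict the least-referenced folio; ties go to the earliest entry."""
    return min(folios, key=lambda f: f.refs)


def age(folios) -> None:
    """Assumed decay step: halve every counter so old heat fades."""
    for f in folios:
        f.refs >>= 1


hot, cold = Folio("hot"), Folio("cold")
for _ in range(10):
    hot.touch()
cold.touch()
print(hot.refs, cold.refs)            # 7 1
print(pick_victim([hot, cold]).name)  # cold
```

The point of the sketch is only that a small counter separates hot
from cold folios immediately, without waiting for a feedback loop to
react, and that some form of decay keeps a once-hot folio from staying
protected forever.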

I also notice Ridong has a series to apply a "heat" based reclaim,
which also looks interesting.

> >    Can we just ignore the shadow for anon folios? MGLRU basically activates
> >    anon folios unconditionally, especially if we combined with the LFU like
> >    idea above we might only want to track the 3 bit count, and get rid of
> >    the extra bit usage in the shadow. The eviction performance might be even
> >    better, and other components like swap table [3] will have more bits to use
> >    for better performance and more features.
>
> On the face of it, both of these sounds problematic to me. Why are
> anon pages special cased?
>
> The cost of reclaiming a page is:
>
>     reuse frequency * cost of a miss
>
> The *type* of the page is not all that meaningful for workload
> performance. The wait time is qualitatively the same.
>
> If you assume every refaulting anon is hot, it'll fall apart when the
> anon set is huge and has little locality.

Sorry, I didn't make it clear. MGLRU currently already ignores the
shadow distance for re-activation. And yeah, basically all anon folios
are activated on fault, which turns out to work quite nicely? No MGLRU
users have considered that a problem, and in fact the performance
looks good.

Of course we can restore the old behavior and test the folio against
some distance (gen distance or eviction distance). Or we can push it
further and keep only the reference bits in the shadow (not ignoring
the shadow completely, just dropping the distance, provided LFU + PID
still works well without it), which gains more performance and frees
more bits to use.
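
The two options can be sketched side by side in Python. Again this is
only an illustration under stated assumptions: the 32-bit shadow
width, the field layout, the function names and the activation
threshold of 2 are all hypothetical and do not describe the kernel's
actual shadow entry format.

```python
REF_BITS = 3                       # reference counter width from the proposal
REF_MASK = (1 << REF_BITS) - 1
SHADOW_BITS = 32                   # hypothetical shadow width, illustration only
SEQ_BITS = SHADOW_BITS - REF_BITS
SEQ_MASK = (1 << SEQ_BITS) - 1


def pack_shadow(evict_seq: int, refs: int) -> int:
    """Option A: store an eviction sequence number next to the counter."""
    return ((evict_seq & SEQ_MASK) << REF_BITS) | (refs & REF_MASK)


def unpack_shadow(shadow: int) -> tuple:
    """Return (evict_seq, refs) from a packed shadow value."""
    return shadow >> REF_BITS, shadow & REF_MASK


def activate_by_distance(shadow: int, now_seq: int, max_distance: int) -> bool:
    """Option A decision: activate only if the refault came back soon enough."""
    evict_seq, _ = unpack_shadow(shadow)
    distance = (now_seq - evict_seq) & SEQ_MASK   # tolerate wrap-around
    return distance <= max_distance


def activate_by_refs(refs: int, threshold: int = 2) -> bool:
    """Option B decision: the shadow holds only the counter, no distance test."""
    return (refs & REF_MASK) >= threshold
```

In this toy layout, option B needs only REF_BITS of the shadow, so the
remaining SEQ_BITS (29 here) would be free for other users.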

BTW I tried to restore the refault distance behavior for both anon and
file folios some time ago:
https://lwn.net/Articles/945266/

For file folios it indeed looked better, while anon folios seemed
unchanged. But later tests showed that it doesn't apply to all cases,
and I think something better can be used, as suggested in this topic.
