From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Eric Naim <dnaim@cachyos.org>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
David Stevens <stevensd@google.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
Yu Zhao <yuzhao@google.com>,
Zicheng Wang <wangzicheng@honor.com>,
Kalesh Singh <kaleshsingh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling
Date: Sun, 29 Mar 2026 01:30:38 +0800
Message-ID: <acgNCzRDVmSbXrOE@KASONG-MC4>
In-Reply-To: <CAMgjq7AQeP8maeMWNun=60oyq_KDu18MwXfGEyK4bwj_k92NgQ@mail.gmail.com>

On Wed, Mar 25, 2026 at 05:47:41PM +0800, Kairui Song wrote:
> On Wed, Mar 25, 2026 at 5:27 PM Eric Naim <dnaim@cachyos.org> wrote:
> >
> > On 3/25/26 1:47 PM, Kairui Song wrote:
> > > On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@cachyos.org> wrote:
> > >>
> > >> Hi Kairui,
> > >>
> > >> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
> > >>> This series cleans up and slightly improves MGLRU's reclaim loop and
> > >>> dirty flush logic. As a result, we can see up to a ~50% reduction in file
> > >>> faults and a 30% increase in MongoDB throughput with YCSB and no swap
> > >>> involved; other common benchmarks show no regression, LOC is reduced,
> > >>> and there are fewer unexpected OOMs in our production environment.
> > >>>
> > >
> > > ...
> > >
> > >>
> > >> I applied this patch set to 7.0-rc5 and noticed the system locking up when running the test below.
> > >>
> > >> fallocate -l 5G 5G
> > >> while true; do tail /dev/zero; done
> > >> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
> > >>
> > >> After reading [1], I suspect this is because the system was using zram as swap, and indeed, with zram disabled, the lockup does not occur.
> > >
> > > Hi Eric,
> > >
> > > Thanks for the report. I was about to send V2, but having seen your
> > > report I'll try to reproduce your issue first.
> > >
> > > So far I haven't noticed any regression. Is this issue caused by this
> > > series, or is it an existing one? I don't have any context on how
> > > you are running the test. BTW, the calculation in patch "mm/mglru:
> > > restructure the reclaim loop" needs a lower bound of
> > > "max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines; I'm not sure
> > > it's related, but I'll add it to V2.
> > >
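
The lower bound mentioned above only matters when the computed scan count is tiny. A minimal sketch of the clamp, written as shell arithmetic purely for illustration: SWAP_CLUSTER_MAX is defined as 32 in include/linux/swap.h, while `clamp_nr_to_scan` is a hypothetical helper (not a kernel function) and the sample nr_to_scan values are made up.

```shell
#!/usr/bin/env bash
# Illustration of the "lowest bar" max(nr_to_scan, SWAP_CLUSTER_MAX).
# SWAP_CLUSTER_MAX mirrors the kernel's value of 32; clamp_nr_to_scan
# is a hypothetical helper and the inputs below are invented examples.
SWAP_CLUSTER_MAX=32

# Return nr_to_scan, but never less than SWAP_CLUSTER_MAX.
clamp_nr_to_scan() {
    local nr_to_scan=$1
    if (( nr_to_scan < SWAP_CLUSTER_MAX )); then
        echo "$SWAP_CLUSTER_MAX"
    else
        echo "$nr_to_scan"
    fi
}

# On a small machine the computed value can fall below one batch.
for n in 0 7 32 4096; do
    printf 'nr_to_scan=%-5d -> batch=%d\n' "$n" "$(clamp_nr_to_scan "$n")"
done
```
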
> >
> > As I was writing this, I got some new information that makes this a bit more confusing. The kernel that doesn't have the issue was patched with [1] as a means of protecting the working set (similar to lru_gen_min_ttl_ms).
> >
> > So this time, on an unpatched kernel, the system still freezes but recovers by itself after about 2 seconds. With this patchset applied, the system freezes and does not recover quickly, if at all.
> >
> > Curiously, I had the user test again, this time with lru_gen_min_ttl_ms = 100. With this setting, the system doesn't freeze at all, with or without this patchset.
>
> Ah, thanks, that makes sense now. The downstream patch you mentioned
> limits the reclaim of file pages to avoid thrashing, and your test
> case exhausts memory on purpose, which forces the kernel to reclaim
> all reclaimable folios, including page cache.
>
> A thrashing page cache easily causes desktop hangs; using the TTL is an
> effective way to avoid thrashing and trigger OOM early. That's why the
> problem goes away with lru_gen_min_ttl_ms = 100 or le9.
>
> > > And about the test you posted:
> > > while true; do tail /dev/zero; done
> > >
> > > I believe this will just consume all memory with zero pages and then
> > > get OOM killed; that's exactly what the test is meant to do. Since you
> > > mentioned an OOM kill, I'm not sure what you mean by lockup. Did the
> > > system actually hang, or was only the desktop dead?
> >
> > The system actually hung; they needed a hard reset to recover it. (Pure speculation: given a few minutes, the system would likely recover by itself, as this seems to be a common scenario.)
>
> Yeah, I believe so.
>
> Thrashing prevention is why MGLRU's TTL was introduced, so I do suggest
> using it. It can be further improved too.
>
> I'll keep that in mind, try to write some test cases that cover your
> case too, and make some adjustments.
>
> BTW, how does the kernel behave with MGLRU disabled in your case?

Hi all,

I tested this multiple times on my Fedora machine, comparing MGLRU with
the classic LRU (using v2 of this series, which also includes some minor
improvements). I modified the reproducer slightly to test the OOM
behavior:

- Run the following commands in console A:

    fallocate -l 5G 5G
    while true; do time cat 5G > /dev/null; done

- Then run the following command in console B:

    while true; do tail /dev/zero; done

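For repeated runs, console A's side can be wrapped in a script that also switches MGLRU on or off, which is one way to collect the "disabled" and "enabled" numbers below. A sketch under stated assumptions: it writes "y"/"n" to /sys/kernel/mm/lru_gen/enabled as described in Documentation/admin-guide/mm/multigen_lru.rst, must run as root, bounds the loop to ITERATIONS runs instead of looping forever, and uses the hypothetical names `set_mglru`, `reader_loop`, `RUN_REPRO` and `LRU_GEN_DIR`. Console B's `tail` loop is still started by hand.

```shell
#!/usr/bin/env bash
# Console A side of the reproducer with an MGLRU toggle (run as root).
# Assumptions: sysfs layout from multigen_lru.rst, a test file named
# "5G" in the current directory, and a bounded loop (ITERATIONS).
set -euo pipefail

LRU_GEN_DIR=${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}
ITERATIONS=${ITERATIONS:-20}

# "y" enables MGLRU, "n" falls back to the classic LRU.
set_mglru() {
    case "$1" in
        y|n) echo "$1" > "$LRU_GEN_DIR/enabled" ;;
        *)   echo "usage: set_mglru y|n" >&2; return 1 ;;
    esac
}

# Re-read the file; the bash `time` keyword reports real/user/sys.
reader_loop() {
    local i
    for ((i = 0; i < ITERATIONS; i++)); do
        time cat 5G > /dev/null
    done
}

# Only touch sysfs and the disk when explicitly asked to.
if [[ "${RUN_REPRO:-0}" == 1 ]]; then
    set_mglru "${1:-y}"
    [[ -f 5G ]] || fallocate -l 5G 5G
    reader_loop
fi
```

For example, saving it as a (hypothetical) repro-a.sh, `RUN_REPRO=1 ./repro-a.sh n` would give a classic-LRU run; start console B's loop once the reads are under way.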
The console A output is below:

With MGLRU disabled:
...
real 0m4.925s user 0m0.016s sys 0m4.904s # Under pressure
real 0m5.544s user 0m0.015s sys 0m5.521s
real 0m5.444s user 0m0.012s sys 0m5.425s
real 0m7.607s user 0m0.016s sys 0m7.561s
real 0m7.268s user 0m0.017s sys 0m7.240s
real 0m6.686s user 0m0.016s sys 0m6.656s
real 0m9.919s user 0m0.014s sys 0m9.831s # <- OOM in B triggers
real 0m4.559s user 0m0.012s sys 0m4.539s
real 0m1.381s user 0m0.009s sys 0m1.362s
real 0m11.816s user 0m0.010s sys 0m11.795s
real 0m6.797s user 0m0.021s sys 0m6.753s
real 0m0.944s user 0m0.013s sys 0m0.931s # <- OOM kill in B ends
real 0m0.285s user 0m0.013s sys 0m0.272s

MGLRU enabled, before this series:
...
real 0m0.355s user 0m0.009s sys 0m0.346s # Under pressure
real 0m0.352s user 0m0.008s sys 0m0.344s
real 0m0.549s user 0m0.014s sys 0m0.535s
real 0m0.628s user 0m0.009s sys 0m0.619s
real 0m0.651s user 0m0.009s sys 0m0.642s
real 0m5.294s user 0m0.010s sys 0m5.280s # <- OOM in B triggers
real 0m1.041s user 0m0.014s sys 0m1.026s
real 0m0.837s user 0m0.011s sys 0m0.826s
real 0m2.450s user 0m0.013s sys 0m2.435s
real 0m2.499s user 0m0.012s sys 0m2.485s
real 0m1.857s user 0m0.015s sys 0m1.841s
real 0m0.512s user 0m0.015s sys 0m0.497s
real 0m0.418s user 0m0.011s sys 0m0.407s # <- OOM kill in B ends
real 0m0.282s user 0m0.010s sys 0m0.272s

MGLRU enabled, after this series:
...
real 0m0.280s user 0m0.015s sys 0m0.265s # Under pressure
real 0m0.283s user 0m0.010s sys 0m0.273s
real 0m0.278s user 0m0.012s sys 0m0.266s
real 0m0.315s user 0m0.018s sys 0m0.297s
real 0m0.679s user 0m0.014s sys 0m0.663s
real 0m0.716s user 0m0.011s sys 0m0.705s
real 0m0.657s user 0m0.009s sys 0m0.648s
real 0m6.615s user 0m0.007s sys 0m6.453s # <- OOM in B triggers
real 0m1.244s user 0m0.018s sys 0m1.226s
real 0m1.290s user 0m0.014s sys 0m1.276s
real 0m1.119s user 0m0.011s sys 0m1.108s
real 0m0.882s user 0m0.010s sys 0m0.872s
real 0m0.855s user 0m0.007s sys 0m0.848s
real 0m0.933s user 0m0.005s sys 0m0.928s
real 0m0.833s user 0m0.009s sys 0m0.823s
real 0m0.279s user 0m0.012s sys 0m0.267s # <- OOM killed in B
real 0m0.273s user 0m0.010s sys 0m0.263s
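
To compare runs like the three above beyond eyeballing, the `real` column can be reduced to a count, mean and maximum. A minimal sketch, assuming each run's console A output is saved to its own file (file names are hypothetical) and that each timing line starts with `real` followed by an `XmY.YYYs` value, as shown above; the "..." lines and any other lines are ignored. `summarize_real` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarize the "real" times of one run: sample count, mean, max.
# Works on condensed "real ... user ... sys ..." lines as well as on
# bash's default multi-line `time` output, since only $1 == "real"
# lines are used. Reads the given files, or stdin if none are given.
set -euo pipefail

summarize_real() {
    awk '$1 == "real" {
        t = $2
        sub(/s$/, "", t)          # "0m4.925s" -> "0m4.925"
        split(t, p, "m")          # p[1] = minutes, p[2] = seconds
        secs = p[1] * 60 + p[2]
        n++; sum += secs
        if (secs > max) max = secs
    }
    END {
        if (n == 0) { print "no samples"; exit 1 }
        printf "n=%d mean=%.3fs max=%.3fs\n", n, sum / n, max
    }' "$@"
}
```

Usage: `for f in mglru-off.log mglru-before.log mglru-after.log; do printf '%s: ' "$f"; summarize_real "$f"; done`. The mean hides the OOM spike, so the max is the more telling number for jitter.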

It seems that with MGLRU enabled, both the read times under pressure
and the OOM jitter are better.

As for this series, it now has no significant effect, or only slightly
changes the jitter pattern, which I can't say is better or worse.
The peak latency seems slightly higher, but the system seems to
recover faster. Or maybe that's just noise.

The OOM behavior is not really perfect in any case, but with
MGLRU's TTL enabled, I got confirmation that the jitter is
essentially gone (only a few frames).
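
Enabling the TTL is a single sysfs write. A minimal sketch, assuming the /sys/kernel/mm/lru_gen/min_ttl_ms knob that Eric's reproducer reads; per Documentation/admin-guide/mm/multigen_lru.rst, writing N protects the working set of the last N milliseconds from eviction and triggers the OOM killer if that working set cannot be kept in memory, and N=1000 is the value that document suggests for eliminating intolerable janks (Eric's user tested 100). `set_min_ttl_ms`, `RUN_TTL` and `LRU_GEN_DIR` are hypothetical names; run as root.

```shell
#!/usr/bin/env bash
# Set MGLRU's thrashing-prevention TTL (milliseconds) and echo it back.
# Assumption: sysfs layout from multigen_lru.rst; LRU_GEN_DIR can be
# overridden for testing.
set -euo pipefail

LRU_GEN_DIR=${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}

set_min_ttl_ms() {
    local ms=$1
    # Reject anything that is not a plain non-negative integer.
    if ! [[ "$ms" =~ ^[0-9]+$ ]]; then
        echo "min_ttl_ms must be a non-negative integer" >&2
        return 1
    fi
    echo "$ms" > "$LRU_GEN_DIR/min_ttl_ms"
    echo "min_ttl_ms is now $(cat "$LRU_GEN_DIR/min_ttl_ms")"
}

# Only write to sysfs when explicitly asked to.
if [[ "${RUN_TTL:-0}" == 1 ]]; then
    set_min_ttl_ms "${1:-1000}"
fi
```

Larger values make thrashing less likely but, per the same document, risk premature OOM kills.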

Thread overview: 45+ messages
2026-03-17 19:08 Kairui Song via B4 Relay
2026-03-17 19:08 ` [PATCH 1/8] mm/mglru: consolidate common code for retrieving evitable size Kairui Song via B4 Relay
2026-03-17 19:55 ` Yuanchu Xie
2026-03-18 9:42 ` Barry Song
2026-03-18 9:57 ` Kairui Song
2026-03-19 1:40 ` Chen Ridong
2026-03-20 19:51 ` Axel Rasmussen
2026-03-22 16:10 ` Kairui Song
2026-03-26 6:25 ` Baolin Wang
2026-03-17 19:08 ` [PATCH 2/8] mm/mglru: relocate the LRU scan batch limit to callers Kairui Song via B4 Relay
2026-03-19 2:00 ` Chen Ridong
2026-03-19 4:12 ` Kairui Song
2026-03-20 21:00 ` Axel Rasmussen
2026-03-22 8:14 ` Barry Song
2026-03-24 6:05 ` Kairui Song
2026-03-17 19:08 ` [PATCH 3/8] mm/mglru: restructure the reclaim loop Kairui Song via B4 Relay
2026-03-20 20:09 ` Axel Rasmussen
2026-03-22 16:11 ` Kairui Song
2026-03-24 6:41 ` Chen Ridong
2026-03-26 7:31 ` Baolin Wang
2026-03-26 8:37 ` Kairui Song
2026-03-17 19:09 ` [PATCH 4/8] mm/mglru: scan and count the exact number of folios Kairui Song via B4 Relay
2026-03-20 20:57 ` Axel Rasmussen
2026-03-22 16:20 ` Kairui Song
2026-03-24 7:22 ` Chen Ridong
2026-03-24 8:05 ` Kairui Song
2026-03-24 9:10 ` Chen Ridong
2026-03-24 9:29 ` Kairui Song
2026-03-17 19:09 ` [PATCH 5/8] mm/mglru: use a smaller batch for reclaim Kairui Song via B4 Relay
2026-03-20 20:58 ` Axel Rasmussen
2026-03-24 7:51 ` Chen Ridong
2026-03-17 19:09 ` [PATCH 6/8] mm/mglru: don't abort scan immediately right after aging Kairui Song via B4 Relay
2026-03-17 19:09 ` [PATCH 7/8] mm/mglru: simplify and improve dirty writeback handling Kairui Song via B4 Relay
2026-03-20 21:18 ` Axel Rasmussen
2026-03-22 16:22 ` Kairui Song
2026-03-24 8:57 ` Chen Ridong
2026-03-24 11:09 ` Kairui Song
2026-03-26 7:56 ` Baolin Wang
2026-03-17 19:09 ` [PATCH 8/8] mm/vmscan: remove sc->file_taken Kairui Song via B4 Relay
2026-03-20 21:19 ` Axel Rasmussen
2026-03-25 4:49 ` [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling Eric Naim
2026-03-25 5:47 ` Kairui Song
2026-03-25 9:26 ` Eric Naim
2026-03-25 9:47 ` Kairui Song
2026-03-28 17:30 ` Kairui Song [this message]