From: Eric Naim <dnaim@cachyos.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
David Stevens <stevensd@google.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
Yu Zhao <yuzhao@google.com>, Zicheng Wang <wangzicheng@honor.com>,
Kalesh Singh <kaleshsingh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling
Date: Wed, 25 Mar 2026 09:26:00 +0000
Message-ID: <85b4be3c-09a3-4a28-924d-71a20db3fd62@cachyos.org>
In-Reply-To: <CAMgjq7CuEYAzZFmS2T7N6je6VHffVkH1hh64YUuShFBnrJJvqA@mail.gmail.com>
On 3/25/26 1:47 PM, Kairui Song wrote:
> On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@cachyos.org> wrote:
>>
>> Hi Kairui,
>>
>> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
>>> This series cleans up and slightly improves MGLRU's reclaim loop and
>>> dirty flush logic. As a result, we can see an up to ~50% reduce of file
>>> faults and 30% increase in MongoDB throughput with YCSB and no swap
>>> involved, other common benchmarks have no regression, and LOC is
>>> reduced, with less unexpected OOM in our production environment.
>>>
>
> ...
>
>>
>> I applied this patch set to 7.0-rc5 and noticed the system locking up when performing the below test.
>>
>> fallocate -l 5G 5G
>> while true; do tail /dev/zero; done
>> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
>>
>> After reading [1], I suspect that this was because the system was using zram as swap, and yes if zram is disabled then the lock up does not occur.
>
> Hi Eric,
>
> Thanks for the report, I was about to send V2 but noticing your report
> I'll try to reproduce your issue first.
>
> So far I didn't notice any regression, is this an issue caused by this
> patch or is it an existing issue? I don't have any context about how
> you are doing the test. BTW the calculation in patch "mm/mglru:
> restructure the reclaim loop" needs to have a lowest bar
> "max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines, not sure if
> related but will add to V2.
>
As of writing this, I got some new information that makes this a bit more confusing. The kernel that doesn't have the issue was patched with [1] as a means of protecting the working set (similar to lru_gen_min_ttl_ms).
So this time on an unpatched kernel, the system still freezes but quickly recovers itself after about 2 seconds. With this patchset applied, the system freezes but it doesn't quickly recover (if at all).
Curiously, I had the user test again but this time with lru_gen_min_ttl_ms = 100. With this set, the system doesn't freeze at all with or without this patchset.
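For reference, a minimal sketch of how that value feeds into the reproducer's sleep interval. The sysfs path is the one from the test above and only exists with MGLRU enabled; the helper name ttl_sleep_secs is just illustrative, and it takes the file as an argument so it can be tried against any file:

```shell
# Path taken from the reproducer quoted above; present only with MGLRU enabled.
MIN_TTL=${MIN_TTL:-/sys/kernel/mm/lru_gen/min_ttl_ms}

# Hypothetical helper: seconds to sleep between reads, computed the same way
# as in the reproducer (min_ttl_ms / 1000 + 1, integer division).
ttl_sleep_secs() {
    echo $(( $(cat "$1") / 1000 + 1 ))
}

# Setting the TTL to 100 ms needs root, so it is left commented out here:
#   echo 100 > "$MIN_TTL"
#   ttl_sleep_secs "$MIN_TTL"
```

With min_ttl_ms = 100 the integer division gives 0, so the loop sleeps for 1 second between reads.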
> And about the test you posted:
> while true; do tail /dev/zero; done
>
> I believe this will just consume all memory with zero pages and then
> get OOM killed, that's exactly what the test is meant to do. By lockup
> I'm not sure you mean since you mentioned OOM kill. The system
> actually hung or the desktop is dead?
The system actually hung. They needed a hard reset to recover the system. (pure speculation: given a few minutes the system would likely recover itself as this seems to be a common scenario)
>
> I just ran that with or without ZRAM on two machines and my laptop,
> everything looks good here with this series.
>
>> zram as swap seems to be unsupported by upstream.
>
> That's simply not true, other distros like Fedora even have ZRAM as
> swap by default:
> https://fedoraproject.org/wiki/Changes/SwapOnZRAM
>
> And systemd have a widely used ZRAM swap support:
> https://github.com/systemd/zram-generator
>
> Android also uses that, and we are using ZRAM by default in our fleet
> which runs fine.
>
>> the user that tested this wasn't able to get a
>> good kernel trace, the only thing left was
>> a trace of the OOM killer firing.
>
> No worry, that's fine, just send me the OOM trace or log, the more
> detailed context I get the better.
Mar 25 08:24:22 osiris kernel: Call Trace:
Mar 25 08:24:22 osiris kernel: <TASK>
Mar 25 08:24:22 osiris kernel: dump_stack_lvl+0x61/0x80
Mar 25 08:24:22 osiris kernel: dump_header+0x4a/0x160
Mar 25 08:24:22 osiris kernel: oom_kill_process+0x18f/0x1f0
Mar 25 08:24:22 osiris kernel: out_of_memory+0x4ab/0x5c0
Mar 25 08:24:22 osiris kernel: __alloc_pages_slowpath+0x9ac/0x1060
Mar 25 08:24:22 osiris kernel: __alloc_frozen_pages_noprof+0x29a/0x320
Mar 25 08:24:22 osiris kernel: alloc_pages_mpol+0x107/0x1b0
Mar 25 08:24:22 osiris kernel: folio_alloc_noprof+0x85/0xb0
Mar 25 08:24:22 osiris kernel: __filemap_get_folio_mpol+0x1ff/0x4c0
Mar 25 08:24:22 osiris kernel: filemap_fault+0x3e3/0x6e0
Mar 25 08:24:22 osiris kernel: __do_fault+0x46/0x140
Mar 25 08:24:22 osiris kernel: do_pte_missing+0x154/0xea0
Mar 25 08:24:22 osiris kernel: ? __pte_offset_map+0x1d/0xd0
Mar 25 08:24:22 osiris kernel: handle_mm_fault+0x89c/0x1280
Mar 25 08:24:22 osiris kernel: do_user_addr_fault+0x23b/0x720
Mar 25 08:24:22 osiris kernel: exc_page_fault+0x75/0xe0
Mar 25 08:24:22 osiris kernel: asm_exc_page_fault+0x26/0x30
Mar 25 08:24:22 osiris kernel: RIP: 0033:0x7fec4beb43c0
Mar 25 08:24:22 osiris kernel: Code: Unable to access opcode bytes at 0x7fec4beb4396.
Mar 25 08:24:22 osiris kernel: RSP: 002b:00007ffcb348d698 EFLAGS: 00010293
Mar 25 08:24:22 osiris kernel: RAX: 00000000c70f6907 RBX: 00007ffcb348d8d0 RCX: 00007fec4bb1604d
Mar 25 08:24:22 osiris kernel: RDX: c6a4a7935bd1e995 RSI: 4fb7dae88ad99bfb RDI: 000055ee77cc8150
Mar 25 08:24:22 osiris kernel: RBP: 00007ffcb348dd60 R08: 000055ee77cc8158 R09: 000000000000000c
Mar 25 08:24:22 osiris kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
Mar 25 08:24:22 osiris kernel: R13: 000055ee77cc8150 R14: 0000000000000064 R15: 431bde82d7b634db
Mar 25 08:24:22 osiris kernel: </TASK>
Here's the call trace that was recovered. Some mm-related settings that we set in our kernel, in case they're useful:
vm.compact_unevictable_allowed = 0
vm.compaction_proactiveness = 0
vm.page-cluster = 0
vm.swappiness = 150
vm.vfs_cache_pressure = 50
vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 67108864
vm.dirty_writeback_centisecs = 1500
vm.watermark_boost_factor = 0
/sys/kernel/mm/transparent_hugepage/defrag = defer+madvise
[1] https://github.com/firelzrd/le9uo/
--
Regards,
Eric