Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling

From: Eric Naim <dnaim@cachyos.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
	David Stevens <stevensd@google.com>,
	Chen Ridong <chenridong@huaweicloud.com>,
	Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
	Yu Zhao <yuzhao@google.com>, Zicheng Wang <wangzicheng@honor.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling
Date: Wed, 25 Mar 2026 09:26:00 +0000
Message-ID: <85b4be3c-09a3-4a28-924d-71a20db3fd62@cachyos.org>
In-Reply-To: <CAMgjq7CuEYAzZFmS2T7N6je6VHffVkH1hh64YUuShFBnrJJvqA@mail.gmail.com>

On 3/25/26 1:47 PM, Kairui Song wrote:
> On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@cachyos.org> wrote:
>>
>> Hi Kairui,
>>
>> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
>>> This series cleans up and slightly improves MGLRU's reclaim loop and
>>> dirty flush logic. As a result, we can see an up to ~50% reduce of file
>>> faults and 30% increase in MongoDB throughput with YCSB and no swap
>>> involved, other common benchmarks have no regression, and LOC is
>>> reduced, with less unexpected OOM in our production environment.
>>>
> 
> ...
> 
>>
>> I applied this patch set to 7.0-rc5 and noticed the system locking up when performing the below test.
>>
>> fallocate -l 5G 5G
>> while true; do tail /dev/zero; done
>> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
>>
>> After reading [1], I suspect that this was because the system was using zram as swap, and yes if zram is disabled then the lock up does not occur.
> 
> Hi Eric,
> 
> Thanks for the report, I was about to send V2 but noticing your report
> I'll try to reproduce your issue first.
> 
> So far I didn't notice any regression, is this an issue caused by this
> patch or is it an existing issue? I don't have any context about how
> you are doing the test. BTW the calculation in patch "mm/mglru:
> restructure the reclaim loop" needs to have a lowest bar
> "max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines, not sure if
> related but will add to V2.
> 

While writing this, I got some new information that makes things a bit more confusing. The kernel that doesn't show the issue was patched with [1] as a means of protecting the working set (similar to lru_gen_min_ttl_ms).

So this time, on an unpatched kernel, the system still freezes but recovers on its own after about 2 seconds. With this patchset applied, the system freezes and does not recover quickly (if at all).

Curiously, I had the user test again, this time with lru_gen_min_ttl_ms = 100. With this set, the system doesn't freeze at all, with or without this patchset.
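
For reference, here is a rough sketch of how the sleep interval in the reproducer's third loop follows from min_ttl_ms. The helper name sleep_secs is hypothetical and only for illustration; it mirrors the $((min_ttl_ms/1000+1)) expression from the test and takes the value as an argument instead of reading sysfs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test): seconds to sleep
# between reads of the 5G file for a given min_ttl_ms value, using the
# same integer arithmetic as the reproducer.
sleep_secs() {
    echo $(( $1 / 1000 + 1 ))
}

sleep_secs 0      # prints 1
sleep_secs 100    # prints 1 (the value used in the retest)
sleep_secs 2500   # prints 3
```

So with min_ttl_ms = 100 the loop sleeps for 1 second between reads, the same as with a value of 0.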

> And about the test you posted:
> while true; do tail /dev/zero; done
> 
> I believe this will just consume all memory with zero pages and then
> get OOM killed, that's exactly what the test is meant to do. By lockup
> I'm not sure you mean since you mentioned OOM kill. The system
> actually hung or the desktop is dead?

The system actually hung. They needed a hard reset to recover it. (Pure speculation: given a few minutes, the system would likely recover on its own, as this seems to be a common scenario.)

> 
> I just ran that with or without ZRAM on two machines and my laptop,
> everything looks good here with this series.
> 
>> zram as swap seems to be unsupported by upstream.
> 
> That's simply not true, other distros like Fedora even have ZRAM as
> swap by default:
> https://fedoraproject.org/wiki/Changes/SwapOnZRAM
> 
> And systemd have a widely used ZRAM swap support:
> https://github.com/systemd/zram-generator
> 
> Android also uses that, and we are using ZRAM by default in our fleet
> which runs fine.
> 
>> the user that tested this wasn't able to get a
>> good kernel trace, the only thing left was
>> a trace of the OOM killer firing.
> 
> No worry, that's fine, just send me the OOM trace or log, the more
> detailed context I get the better.

Mar 25 08:24:22 osiris kernel: Call Trace:
Mar 25 08:24:22 osiris kernel:  <TASK>
Mar 25 08:24:22 osiris kernel:  dump_stack_lvl+0x61/0x80
Mar 25 08:24:22 osiris kernel:  dump_header+0x4a/0x160
Mar 25 08:24:22 osiris kernel:  oom_kill_process+0x18f/0x1f0
Mar 25 08:24:22 osiris kernel:  out_of_memory+0x4ab/0x5c0
Mar 25 08:24:22 osiris kernel:  __alloc_pages_slowpath+0x9ac/0x1060
Mar 25 08:24:22 osiris kernel:  __alloc_frozen_pages_noprof+0x29a/0x320
Mar 25 08:24:22 osiris kernel:  alloc_pages_mpol+0x107/0x1b0
Mar 25 08:24:22 osiris kernel:  folio_alloc_noprof+0x85/0xb0
Mar 25 08:24:22 osiris kernel:  __filemap_get_folio_mpol+0x1ff/0x4c0
Mar 25 08:24:22 osiris kernel:  filemap_fault+0x3e3/0x6e0
Mar 25 08:24:22 osiris kernel:  __do_fault+0x46/0x140
Mar 25 08:24:22 osiris kernel:  do_pte_missing+0x154/0xea0
Mar 25 08:24:22 osiris kernel:  ? __pte_offset_map+0x1d/0xd0
Mar 25 08:24:22 osiris kernel:  handle_mm_fault+0x89c/0x1280
Mar 25 08:24:22 osiris kernel:  do_user_addr_fault+0x23b/0x720
Mar 25 08:24:22 osiris kernel:  exc_page_fault+0x75/0xe0
Mar 25 08:24:22 osiris kernel:  asm_exc_page_fault+0x26/0x30
Mar 25 08:24:22 osiris kernel: RIP: 0033:0x7fec4beb43c0
Mar 25 08:24:22 osiris kernel: Code: Unable to access opcode bytes at 0x7fec4beb4396.
Mar 25 08:24:22 osiris kernel: RSP: 002b:00007ffcb348d698 EFLAGS: 00010293
Mar 25 08:24:22 osiris kernel: RAX: 00000000c70f6907 RBX: 00007ffcb348d8d0 RCX: 00007fec4bb1604d
Mar 25 08:24:22 osiris kernel: RDX: c6a4a7935bd1e995 RSI: 4fb7dae88ad99bfb RDI: 000055ee77cc8150
Mar 25 08:24:22 osiris kernel: RBP: 00007ffcb348dd60 R08: 000055ee77cc8158 R09: 000000000000000c
Mar 25 08:24:22 osiris kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
Mar 25 08:24:22 osiris kernel: R13: 000055ee77cc8150 R14: 0000000000000064 R15: 431bde82d7b634db
Mar 25 08:24:22 osiris kernel:  </TASK>

Here's the call trace that was recovered. Below are some mm-related settings that we set in our kernel, in case they're useful:

vm.compact_unevictable_allowed = 0
vm.compaction_proactiveness = 0
vm.page-cluster = 0
vm.swappiness = 150
vm.vfs_cache_pressure = 50
vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 67108864
vm.dirty_writeback_centisecs = 1500
vm.watermark_boost_factor = 0

/sys/kernel/mm/transparent_hugepage/defrag = defer+madvise
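
To make the raw numbers above easier to read, here is a small sketch that converts the byte and centisecond values into MiB and seconds. The helper names bytes_to_mib and centisecs_to_secs are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for reading the sysctl values listed above.

# Convert a byte count to MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Convert centiseconds to whole seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

bytes_to_mib 268435456     # vm.dirty_bytes               -> prints 256
bytes_to_mib 67108864      # vm.dirty_background_bytes    -> prints 64
centisecs_to_secs 1500     # vm.dirty_writeback_centisecs -> prints 15
```

That is, a 256 MiB dirty limit, a 64 MiB background dirty limit, and a 15 second writeback interval.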

[1] https://github.com/firelzrd/le9uo/

-- 
Regards,
  Eric
