linux-mm.kvack.org archive mirror
From: Kairui Song <ryncsn@gmail.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
	David Stevens <stevensd@google.com>,
	 Chen Ridong <chenridong@huaweicloud.com>,
	Leno Hou <lenohou@gmail.com>,  Yafang Shao <laoar.shao@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	 Zicheng Wang <wangzicheng@honor.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	 linux-kernel@vger.kernel.org, Qi Zheng <qi.zheng@linux.dev>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v3 14/14] mm/vmscan: unify writeback reclaim statistic and throttling
Date: Sun, 5 Apr 2026 02:36:14 +0800
Message-ID: <CAMgjq7DFTUPhXqOxUa0UJa_v4mzseuLf2G9+FFGYtgmNasn6tQ@mail.gmail.com>
In-Reply-To: <CAJHvVcjDXktz-_Nk8qj3jXfLkHhJVo+N2KWb1sA+DvENOMkdDw@mail.gmail.com>

On Sat, Apr 4, 2026 at 5:16 AM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> On Thu, Apr 2, 2026 at 11:53 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@kernel.org> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Currently MGLRU and non-MGLRU handle the reclaim statistic and
> > writeback handling very differently, especially throttling.
> > Basically MGLRU just ignored the throttling part.
> >
> > Let's just unify this part, use a helper to deduplicate the code
> > so both setups will share the same behavior.
> >
> > Test using following reproducer using bash:
> >
> >   echo "Setup a slow device using dm delay"
> >   dd if=/dev/zero of=/var/tmp/backing bs=1M count=2048
> >   LOOP=$(losetup --show -f /var/tmp/backing)
> >   mkfs.ext4 -q $LOOP
> >   echo "0 $(blockdev --getsz $LOOP) delay $LOOP 0 0 $LOOP 0 1000" | \
> >       dmsetup create slow_dev
> >   mkdir -p /mnt/slow && mount /dev/mapper/slow_dev /mnt/slow
> >
> >   echo "Start writeback pressure"
> >   sync && echo 3 > /proc/sys/vm/drop_caches
> >   mkdir /sys/fs/cgroup/test_wb
> >   echo 128M > /sys/fs/cgroup/test_wb/memory.max
> >   (echo $BASHPID > /sys/fs/cgroup/test_wb/cgroup.procs && \
> >       dd if=/dev/zero of=/mnt/slow/testfile bs=1M count=192)
> >
> >   echo "Clean up"
> >   echo "0 $(blockdev --getsz $LOOP) error" | dmsetup load slow_dev
> >   dmsetup resume slow_dev
> >   umount -l /mnt/slow && sync
> >   dmsetup remove slow_dev
> >
> > Before this commit, `dd` will get OOM killed immediately if
> > MGLRU is enabled. Classic LRU is fine.
> >
> > After this commit, throttling is now effective and no more spin on
> > LRU or premature OOM. Stress test on other workloads also looking good.
> >
> > Global throttling is not here yet, we will fix that separately later.
>
> If I understand correctly, I think this fixes this regression report
> [1] from a long time ago that was never fully resolved?
>
> [1]: https://lore.kernel.org/lkml/ZeC-u7GRSptoVqia@chrisdown.name/
>
> We investigated at that time, but I don't feel we got to a consensus
> on how to solve it. I think we got a bit bogged down trying to
> "completely solve writeback throttling" rather than just doing some
> incremental improvement which fixed that particular case.
>

Hello Axel!

Yes, we also observed that problem. I had almost forgotten about that
report, thanks for the link! No worries: for the majority of users, I
think the problem was already fixed a year ago.

I previously asked Jingxiang to help fix that by waking up writeback.
In that discussion, the info showed that the flusher was not being
woken at all, and Yafang reported that reverting 14aa8b2d5c2e fixed
it. So Jingxiang's fix seemed to work well at that time:
https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@gmail.com/

AFAIK there have been no more reports of premature OOM on the mailing
list since then, but we later found that the fix isn't enough for some
particular and rare setups (for example, I used dm delay in the test
script above to simulate slow IO). Usually reclaim can keep up: it's
rare for the LRU to be full of writeback folios, there are always
clean folios to drop, and waking up the flusher is good enough. But
under extreme pressure or with very slow devices, the LRU can get
congested with writeback folios. It's hard to apply a reasonable
throttle or improve the dirty flush without a bit more refactoring
first, and that's not the only cgroup OOM problem we encountered.

With this series, I think the known problems mentioned above are all
covered in a clean way.

Global pressure throttling is still not here yet. It's an even rarer
problem, since the LRU getting congested with writeback globally
already seems like a really bad situation to me. That can also be
fixed separately later.
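
For anyone rerunning the quoted reproducer, here is a minimal sketch of how one might check the two conditions it depends on: whether MGLRU is enabled, and whether the test cgroup recorded an OOM kill. The helper names `mglru_enabled` and `oom_kill_count` are hypothetical and not part of the patch. The sketch assumes the usual sysfs file /sys/kernel/mm/lru_gen/enabled (a hex bitmask such as 0x0007) and the cgroup v2 memory.events file with an `oom_kill` line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MGLRU is enabled.
# Assumes the file holds a hex bitmask such as 0x0007 (0x0000 = disabled).
# An optional path argument allows testing against a regular file.
mglru_enabled() {
    lru_gen_file="${1:-/sys/kernel/mm/lru_gen/enabled}"
    [ -r "$lru_gen_file" ] || return 1
    lru_gen_mask=$(cat "$lru_gen_file")
    [ "$((lru_gen_mask))" -ne 0 ]
}

# Hypothetical helper: print the oom_kill counter from a cgroup v2
# memory.events file. The exact match on field 1 skips the separate
# "oom" and "oom_group_kill" lines.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}
```

With the test_wb cgroup from the reproducer, `oom_kill_count /sys/fs/cgroup/test_wb/memory.events` read before the clean-up step should show whether `dd` was OOM killed; comparing the counter with and without MGLRU enabled is one way to confirm the behavior described above.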



