From: Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
David Stevens <stevensd@google.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
Yu Zhao <yuzhao@google.com>,
Zicheng Wang <wangzicheng@honor.com>,
Kalesh Singh <kaleshsingh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
linux-kernel@vger.kernel.org, Qi Zheng <qi.zheng@linux.dev>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Kairui Song <kasong@tencent.com>
Subject: [PATCH v4 10/14] mm/mglru: simplify and improve dirty writeback handling
Date: Tue, 07 Apr 2026 19:57:39 +0800 [thread overview]
Message-ID: <20260407-mglru-reclaim-v4-10-98cf3dc69519@tencent.com> (raw)
In-Reply-To: <20260407-mglru-reclaim-v4-0-98cf3dc69519@tencent.com>
From: Kairui Song <kasong@tencent.com>
Right now the flusher wakeup mechanism for MGLRU is less responsive
and unlikely to trigger compared to classical LRU. The classical
LRU wakes the flusher if one batch of folios passed to shrink_folio_list
is unevictable due to under writeback. MGLRU instead check and handle
this after the whole reclaim loop is done.
We previously even saw OOM problems due to passive flusher, which were
fixed but still not perfect [1].
We have just unified the dirty folio counting and activation routine,
now just move the dirty flush into the loop right after shrink_folio_list.
This improves the performance a lot for workloads involving heavy
writeback and prepares for throttling too.
Test with YCSB workloadb showed a major performance improvement:
Before this series:
Throughput(ops/sec): 62485.02962831822
AverageLatency(us): 500.9746963330107
pgpgin 159347462
workingset_refault_file 34522071
After this commit:
Throughput(ops/sec): 80857.08510208207
AverageLatency(us): 386.653262968934
pgpgin 112233121
workingset_refault_file 19516246
The performance is a lot better with significantly lower refault. We also
observed similar or higher performance gain for other real-world workloads.
We were concerned that the dirty flush could cause more wear for SSD:
that should not be the problem here, since the wakeup condition is when
the dirty folios have been pushed to the tail of LRU, which indicates
that memory pressure is so high that writeback is blocking the workload
already.
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@gmail.com/ [1]
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/vmscan.c | 41 ++++++++++++++++-------------------------
1 file changed, 16 insertions(+), 25 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2a722ebec4d8..23ec74d3bf6a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4724,8 +4724,6 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
scanned, skipped, isolated,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
- if (type == LRU_GEN_FILE)
- sc->nr.file_taken += isolated;
*isolatedp = isolated;
return scanned;
@@ -4833,12 +4831,27 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
return scanned;
retry:
reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false, memcg);
- sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
sc->nr_reclaimed += reclaimed;
trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
type_scanned, reclaimed, &stat, sc->priority,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
+ /*
+ * If too many file cache in the coldest generation can't be evicted
+ * due to being dirty, wake up the flusher.
+ */
+ if (stat.nr_unqueued_dirty == isolated) {
+ wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+ /*
+ * For cgroupv1 dirty throttling is achieved by waking up
+ * the kernel flusher here and later waiting on folios
+ * which are in writeback to finish (see shrink_folio_list()).
+ */
+ if (!writeback_throttling_sane(sc))
+ reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+ }
+
list_for_each_entry_safe_reverse(folio, next, &list, lru) {
DEFINE_MIN_SEQ(lruvec);
@@ -4994,28 +5007,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
cond_resched();
}
- /*
- * If too many file cache in the coldest generation can't be evicted
- * due to being dirty, wake up the flusher.
- */
- if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken) {
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- wakeup_flusher_threads(WB_REASON_VMSCAN);
-
- /*
- * For cgroupv1 dirty throttling is achieved by waking up
- * the kernel flusher here and later waiting on folios
- * which are in writeback to finish (see shrink_folio_list()).
- *
- * Flusher may not be able to issue writeback quickly
- * enough for cgroupv1 writeback throttling to work
- * on a large system.
- */
- if (!writeback_throttling_sane(sc))
- reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
- }
-
return need_rotate;
}
--
2.53.0
next prev parent reply other threads:[~2026-04-07 12:05 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-07 11:57 [PATCH v4 00/14] mm/mglru: improve reclaim loop and dirty folio handling Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 01/14] mm/mglru: consolidate common code for retrieving evictable size Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 02/14] mm/mglru: rename variables related to aging and rotation Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 03/14] mm/mglru: relocate the LRU scan batch limit to callers Kairui Song via B4 Relay
2026-04-08 3:12 ` Chen Ridong
2026-04-07 11:57 ` [PATCH v4 04/14] mm/mglru: restructure the reclaim loop Kairui Song via B4 Relay
2026-04-08 8:08 ` Chen Ridong
2026-04-08 8:43 ` Kairui Song
2026-04-07 11:57 ` [PATCH v4 05/14] mm/mglru: scan and count the exact number of folios Kairui Song via B4 Relay
2026-04-08 8:27 ` Chen Ridong
2026-04-07 11:57 ` [PATCH v4 06/14] mm/mglru: use a smaller batch for reclaim Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 07/14] mm/mglru: don't abort scan immediately right after aging Kairui Song via B4 Relay
2026-04-08 9:32 ` Chen Ridong
2026-04-07 11:57 ` [PATCH v4 08/14] mm/mglru: remove redundant swap constrained check upon isolation Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 09/14] mm/mglru: use the common routine for dirty/writeback reactivation Kairui Song via B4 Relay
2026-04-07 11:57 ` Kairui Song via B4 Relay [this message]
2026-04-07 11:57 ` [PATCH v4 11/14] mm/mglru: remove no longer used reclaim argument for folio protection Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 12/14] mm/vmscan: remove sc->file_taken Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 13/14] mm/vmscan: remove sc->unqueued_dirty Kairui Song via B4 Relay
2026-04-07 11:57 ` [PATCH v4 14/14] mm/vmscan: unify writeback reclaim statistic and throttling Kairui Song via B4 Relay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260407-mglru-reclaim-v4-10-98cf3dc69519@tencent.com \
--to=devnull+kasong.tencent.com@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chenridong@huaweicloud.com \
--cc=chrisl@kernel.org \
--cc=david@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kaleshsingh@google.com \
--cc=kasong@tencent.com \
--cc=laoar.shao@gmail.com \
--cc=lenohou@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@kernel.org \
--cc=qi.zheng@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=stevensd@google.com \
--cc=surenb@google.com \
--cc=vernon2gm@gmail.com \
--cc=wangzicheng@honor.com \
--cc=weixugc@google.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox