From: Barry Song <baohua@kernel.org>
To: kasong@tencent.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>,  Wei Xu <weixugc@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	David Stevens <stevensd@google.com>,
	 Chen Ridong <chenridong@huaweicloud.com>,
	Leno Hou <lenohou@gmail.com>,  Yafang Shao <laoar.shao@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	 Zicheng Wang <wangzicheng@honor.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	 linux-kernel@vger.kernel.org, Qi Zheng <qi.zheng@linux.dev>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v5 04/14] mm/mglru: restructure the reclaim loop
Date: Thu, 16 Apr 2026 14:33:48 +0800
Message-ID: <CAGsJ_4yQDgD-etzDzXJvaf2mOja6GAuFRw1WS2NZqxBvo7Ag8w@mail.gmail.com>
In-Reply-To: <20260413-mglru-reclaim-v5-4-8eaeacbddc44@tencent.com>

On Mon, Apr 13, 2026 at 12:48 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> The current loop will calculate the scan number on each iteration. The
> number of folios to scan is based on the LRU length, with some unclear
> behaviors, eg, the scan number is only shifted by reclaim priority when
> aging is not needed or when at the default priority, and it couples
> the number calculation with aging and rotation.
>
> Adjust, simplify it, and decouple aging and rotation. Just calculate the
> scan number for once at the beginning of the reclaim, always respect the
> reclaim priority, and make the aging and rotation more explicit.
>
> This slightly changes how aging and offline memcg reclaim works:
> Previously, aging was always skipped at DEF_PRIORITY even when
> eviction was impossible. Now, aging is always triggered when it
> is necessary to make progress. The old behavior may waste a reclaim
> iteration only to escalate priority, potentially causing over-reclaim
> of slab and breaking reclaim balance in multi-cgroup setups.
>
> Similar for offline memcg. Previously, offline memcg wouldn't be
> aged unless it didn't have any evictable folios. Now, we might age
> it if it has only 3 generations and the reclaim priority is less
> than DEF_PRIORITY, which should be fine. On one hand, offline memcg
> might still hold long-term folios, and in fact, a long-existing offline
> memcg must be pinned by some long-term folios like shmem. These folios
> might be used by other memcg, so aging them as ordinary memcg seems
> correct. Besides, aging enables further reclaim of an offlined memcg,
> which will certainly happen if we keep shrinking it. And offline
> memcg might soon be no longer an issue with reparenting.
>
> And while at it, make it clear that unevictable memcg will get rotated
> so following reclaim will more likely to skip them, as a optimization.
> And apply a minimal batch factor when reclaim is running with higher
> priority.
>
> Overall, the memcg LRU rotation, as described in mmzone.h,
> remains the same.
>
> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 39 insertions(+), 33 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 963362523782..d4aaaa62056d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4913,49 +4913,41 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>  }
>
>  static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
> -                            int swappiness, unsigned long *nr_to_scan)
> +                            struct scan_control *sc, int swappiness)
>  {
>         DEFINE_MIN_SEQ(lruvec);
>
> -       *nr_to_scan = 0;
>         /* have to run aging, since eviction is not possible anymore */
>         if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq)
>                 return true;
>
> -       *nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
> +       /* try to get away with not aging at the default priority */

I'm not a native speaker, and I've been struggling a bit with this comment.
Does it mean “try to avoid aging at the default priority”?

> +       if (sc->priority == DEF_PRIORITY)
> +               return false;


"This slightly changes how aging and offline memcg reclaim works:

Previously, aging was always skipped at DEF_PRIORITY even when
eviction was impossible. Now, aging is always triggered when it
is necessary to make progress."

It seems clear that you are returning false for DEF_PRIORITY.
How should I understand “aging is always triggered”?
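
One possible reading, sketched as a standalone userspace model rather than
kernel code: in the new should_run_aging() the "eviction is impossible" check
comes before the DEF_PRIORITY check, so the early "return false" only covers
the case where eviction is still possible. The function names below are
hypothetical, min_seq stands in for evictable_min_seq(min_seq, swappiness),
and the old-behavior model assumes an online memcg and is inferred only from
the removed lines in this hunk.

```c
#include <stdbool.h>

/* Values as defined in the kernel; restated so the model compiles on its own. */
#define DEF_PRIORITY 12
#define MIN_NR_GENS  2UL

/*
 * Hypothetical model of the new should_run_aging().
 * min_seq stands in for evictable_min_seq(min_seq, swappiness).
 */
static bool model_new_should_age(unsigned long min_seq, unsigned long max_seq,
				 int priority)
{
	/* eviction is impossible: age regardless of priority */
	if (min_seq + MIN_NR_GENS > max_seq)
		return true;

	/* eviction is still possible: skip the optional aging at default priority */
	if (priority == DEF_PRIORITY)
		return false;

	/* better to age even though eviction is still possible */
	return min_seq + MIN_NR_GENS == max_seq;
}

/*
 * Hypothetical model of the old behavior for an online memcg: the old
 * get_nr_to_scan() only called try_to_inc_max_seq() when need_aging was
 * true and the priority was not DEF_PRIORITY.
 */
static bool model_old_should_age(unsigned long min_seq, unsigned long max_seq,
				 int priority)
{
	bool need_aging = min_seq + MIN_NR_GENS >= max_seq;

	return need_aging && priority != DEF_PRIORITY;
}
```

With min_seq = 3 and max_seq = 4 at DEF_PRIORITY, the old model reports no
aging while the new one reports aging. If this reading is right, that is the
case the commit message describes as "aging is always triggered when it is
necessary to make progress".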

> +
>         /* better to run aging even though eviction is still possible */
>         return evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS == max_seq;
>  }
>
> -/*
> - * For future optimizations:
> - * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> - *    reclaim.
> - */
> -static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
> +static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
> +                          struct mem_cgroup *memcg, int swappiness)
>  {
> -       bool need_aging;
> -       unsigned long nr_to_scan;
> -       struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> -       DEFINE_MAX_SEQ(lruvec);
> +       unsigned long nr_to_scan, evictable;
>
> -       if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
> -               return -1;
> -
> -       need_aging = should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan);
> +       evictable = lruvec_evictable_size(lruvec, swappiness);
> +       nr_to_scan = evictable;
>
>         /* try to scrape all its memory if this memcg was deleted */
> -       if (nr_to_scan && !mem_cgroup_online(memcg))
> +       if (!mem_cgroup_online(memcg))
>                 return nr_to_scan;
>
>         nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
> +       nr_to_scan >>= sc->priority;
>
> -       /* try to get away with not aging at the default priority */
> -       if (!need_aging || sc->priority == DEF_PRIORITY)
> -               return nr_to_scan >> sc->priority;
> +       if (!nr_to_scan && sc->priority < DEF_PRIORITY)
> +               nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
>
> -       /* stop scanning this lruvec as it's low on cold folios */
> -       return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? -1 : 0;
> +       return nr_to_scan;
>  }
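
To check my understanding of the new scan-count arithmetic, here is a minimal
userspace sketch of get_nr_to_scan() as it reads after this patch. It is only
a model: model_get_nr_to_scan() and its parameters are hypothetical names,
and apply_proportional_protection() is left out, which assumes the memcg has
no memory.min/memory.low protection in effect.

```c
#include <stdbool.h>

/* Values as defined in the kernel; restated so the model compiles on its own. */
#define DEF_PRIORITY     12
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical model of the new get_nr_to_scan().
 * evictable stands in for lruvec_evictable_size(lruvec, swappiness),
 * online for mem_cgroup_online(memcg).
 */
static unsigned long model_get_nr_to_scan(unsigned long evictable, bool online,
					   int priority)
{
	unsigned long nr_to_scan = evictable;

	/* offline memcg: try to scrape all of its memory */
	if (!online)
		return nr_to_scan;

	/* apply_proportional_protection() omitted (assumes no protection) */
	nr_to_scan >>= priority;

	/* minimal batch once reclaim has escalated past the default priority */
	if (!nr_to_scan && priority < DEF_PRIORITY)
		nr_to_scan = evictable < SWAP_CLUSTER_MAX ? evictable : SWAP_CLUSTER_MAX;

	return nr_to_scan;
}
```

Under these assumptions, 1000 evictable folios give 0 at DEF_PRIORITY (so the
caller sets need_rotate), but 32 at priority 11, and an offline memcg always
gets the full evictable size.
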
>
>  static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
> @@ -4985,31 +4977,46 @@ static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
>         return true;
>  }
>
> +/*
> + * For future optimizations:
> + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> + *    reclaim.
> + */
>  static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  {
> +       bool need_rotate = false;
>         long nr_batch, nr_to_scan;
> -       unsigned long scanned = 0;
>         int swappiness = get_swappiness(lruvec, sc);
> +       struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +
> +       nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> +       if (!nr_to_scan)
> +               need_rotate = true;
>
> -       while (true) {
> +       while (nr_to_scan > 0) {
>                 int delta;
> +               DEFINE_MAX_SEQ(lruvec);
>
> -               nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> -               if (nr_to_scan <= 0)
> +               if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> +                       need_rotate = true;
>                         break;
> +               }
> +
> +               if (should_run_aging(lruvec, max_seq, sc, swappiness)) {
> +                       if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))

Could we move the original comment here:
/* stop scanning this lruvec as it's low on cold folios */

Thanks
Barry

