From: Chen Ridong <chenridong@huaweicloud.com>
To: kasong@tencent.com, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
	David Stevens <stevensd@google.com>, Leno Hou <lenohou@gmail.com>,
	Yafang Shao <laoar.shao@gmail.com>, Yu Zhao <yuzhao@google.com>,
	Zicheng Wang <wangzicheng@honor.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/8] mm/mglru: restructure the reclaim loop
Date: Tue, 24 Mar 2026 14:41:33 +0800
Message-ID: <9fbb618b-19d1-4d03-8488-7e4c52c859ff@huaweicloud.com>
In-Reply-To: <20260318-mglru-reclaim-v1-3-2c46f9eb0508@tencent.com>



On 2026/3/18 3:08, Kairui Song via B4 Relay wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> The current loop recalculates the scan number on every iteration. The
> number of folios to scan is based on the LRU length, with some unclear
> behaviors, e.g., it only shifts the scan number by reclaim priority at the
> default priority, and it couples the number calculation with aging and
> rotation.
> 
> Adjust and simplify it, and decouple aging and rotation. Calculate the
> scan number once at the beginning of the reclaim, always respect the
> reclaim priority, and make the aging and rotation more explicit.
> 
> This slightly changes how offline memcg aging works: previously, an offline
> memcg wouldn't be aged unless it had no evictable folios. Now,
> we might age it if it has only 3 generations and the reclaim priority is
> less than DEF_PRIORITY, which should be fine. On one hand, an offline memcg
> might still hold long-term folios, and in fact, a long-existing offline
> memcg must be pinned by some long-term folios like shmem. These folios
> might be used by other memcgs, so aging them as in an ordinary memcg doesn't
> seem wrong. Besides, aging enables further reclaim of an offlined
> memcg, which will certainly happen if we keep shrinking it. Offline
> memcgs might also soon no longer be an issue once reparenting is ready.
> 
> Overall, the memcg LRU rotation, as described in mmzone.h,
> remains the same.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/vmscan.c | 74 ++++++++++++++++++++++++++++++-------------------------------
>  1 file changed, 36 insertions(+), 38 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d48074f9bd87..ed5b5f8dd3c7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4926,49 +4926,35 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>  }
>  
>  static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
> -			     int swappiness, unsigned long *nr_to_scan)
> +			     struct scan_control *sc, int swappiness)
>  {
>  	DEFINE_MIN_SEQ(lruvec);
>  
> -	*nr_to_scan = 0;
>  	/* have to run aging, since eviction is not possible anymore */
>  	if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq)
>  		return true;
>  
> -	*nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
> +	/* try to get away with not aging at the default priority */
> +	if (sc->priority == DEF_PRIORITY)
> +		return false;
> +
>  	/* better to run aging even though eviction is still possible */
>  	return evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS == max_seq;
>  }
>  
> -/*
> - * For future optimizations:
> - * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> - *    reclaim.
> - */
> -static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
> +static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
> +			   struct mem_cgroup *memcg, int swappiness)
>  {
> -	bool need_aging;
>  	unsigned long nr_to_scan;
> -	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> -	DEFINE_MAX_SEQ(lruvec);
> -
> -	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
> -		return -1;
> -
> -	need_aging = should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan);
>  
> +	nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
>  	/* try to scrape all its memory if this memcg was deleted */
> -	if (nr_to_scan && !mem_cgroup_online(memcg))
> +	if (!mem_cgroup_online(memcg))
>  		return nr_to_scan;
>  
>  	nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
> -
> -	/* try to get away with not aging at the default priority */
> -	if (!need_aging || sc->priority == DEF_PRIORITY)
> -		return nr_to_scan >> sc->priority;
> -
> -	/* stop scanning this lruvec as it's low on cold folios */
> -	return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? -1 : 0;
> +	/* always respect scan priority */
> +	return nr_to_scan >> sc->priority;
>  }
>  
>  static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
> @@ -4998,31 +4984,43 @@ static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
>  	return true;
>  }
>  
> +/*
> + * For future optimizations:
> + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> + *    reclaim.
> + */
>  static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  {
> +	bool need_rotate = false;
>  	long nr_batch, nr_to_scan;
> -	unsigned long scanned = 0;
>  	int swappiness = get_swappiness(lruvec, sc);
> +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
>  
> -	while (true) {
> +	nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> +	while (nr_to_scan > 0) {
>  		int delta;
> +		DEFINE_MAX_SEQ(lruvec);
>  
> -		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> -		if (nr_to_scan <= 0)
> +		if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> +			need_rotate = true;
>  			break;
> +		}
> +
> +		if (should_run_aging(lruvec, max_seq, sc, swappiness)) {
> +			if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
> +				need_rotate = true;
> +			break;
> +		}
>  
>  		nr_batch = min(nr_to_scan, MAX_LRU_BATCH);
>  		delta = evict_folios(nr_batch, lruvec, sc, swappiness);
>  		if (!delta)
>  			break;
>  
> -		scanned += delta;
> -		if (scanned >= nr_to_scan)
> -			break;
> -
>  		if (should_abort_scan(lruvec, sc))
>  			break;
>  
> +		nr_to_scan -= delta;
>  		cond_resched();
>  	}
>  
> @@ -5034,12 +5032,12 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  		wakeup_flusher_threads(WB_REASON_VMSCAN);
>  
>  	/* whether this lruvec should be rotated */
> -	return nr_to_scan < 0;
> +	return need_rotate;
>  }
>  
>  static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>  {
> -	bool success;
> +	bool need_rotate;
>  	unsigned long scanned = sc->nr_scanned;
>  	unsigned long reclaimed = sc->nr_reclaimed;
>  	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> @@ -5057,7 +5055,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>  		memcg_memory_event(memcg, MEMCG_LOW);
>  	}
>  
> -	success = try_to_shrink_lruvec(lruvec, sc);
> +	need_rotate = try_to_shrink_lruvec(lruvec, sc);
>  
>  	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
>  
> @@ -5067,10 +5065,10 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>  
>  	flush_reclaim_state(sc);
>  
> -	if (success && mem_cgroup_online(memcg))
> +	if (need_rotate && mem_cgroup_online(memcg))
>  		return MEMCG_LRU_YOUNG;
>  
> -	if (!success && lruvec_is_sizable(lruvec, sc))
> +	if (!need_rotate && lruvec_is_sizable(lruvec, sc))
>  		return 0;
>  
>  	/* one retry if offlined or too small */
> 

Maybe this renaming could be split out of this patch and combined with the
renaming in patch 1/8, which would make this patch much clearer. Other than
that, the patch looks good to me.

-- 
Best regards,
Ridong


