From: Kairui Song <ryncsn@gmail.com>
To: wangzicheng <wangzicheng@honor.com>
Cc: wangxinyu19 <wxy2009nrrr@163.com>,
"devnull+kasong.tencent.com@kernel.org"
<devnull+kasong.tencent.com@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"axelrasmussen@google.com" <axelrasmussen@google.com>,
"baohua@kernel.org" <baohua@kernel.org>,
"baolin.wang@linux.alibaba.com" <baolin.wang@linux.alibaba.com>,
"chenridong@huaweicloud.com" <chenridong@huaweicloud.com>,
"chrisl@kernel.org" <chrisl@kernel.org>,
"david@kernel.org" <david@kernel.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"kaleshsingh@google.com" <kaleshsingh@google.com>,
"laoar.shao@gmail.com" <laoar.shao@gmail.com>,
"lenohou@gmail.com" <lenohou@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"ljs@kernel.org" <ljs@kernel.org>,
"mhocko@kernel.org" <mhocko@kernel.org>,
"qi.zheng@linux.dev" <qi.zheng@linux.dev>,
"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
"stevensd@google.com" <stevensd@google.com>,
"surenb@google.com" <surenb@google.com>,
"vernon2gm@gmail.com" <vernon2gm@gmail.com>,
"weixugc@google.com" <weixugc@google.com>,
"yuanchu@google.com" <yuanchu@google.com>,
"yuzhao@google.com" <yuzhao@google.com>,
"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
wangzhen <wangzhen5@honor.com>, wangtao <tao.wangtao@honor.com>
Subject: Re: [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling
Date: Sat, 18 Apr 2026 19:50:26 +0800 [thread overview]
Message-ID: <CAMgjq7Cac7jo2PB4uUcr0usq6vQDRVSCFNQ8JBXD+7yHE+95rg@mail.gmail.com> (raw)
In-Reply-To: <830980eb128a49c6adc55571b7015fab@honor.com>
On Sat, Apr 18, 2026 at 5:08 PM wangzicheng <wangzicheng@honor.com> wrote:
> There is indeed a relatively large gap between mm-unstable and our
> android16-6.12 tree. The series was backported manually and we only
> applied the changes required to make it build and run in our tree.
>
> Because of this, it is possible that some related changes from
> mm-unstable were not included, which may have affected the behavior or
> performance we observed. If this caused misleading results, we
> apologize for the confusion.
>
> Regarding vendor hooks, in our tree there is only one hook in
> get_nr_to_scan(). We tested with that hook disabled.
>
> The performance data was collected using Perfetto traces.
> Unfortunately those traces contain a large amount of runtime
> information and are not easy to share externally.
>
> If needed, we can also try to reproduce the test on a tree closer to
> mm-unstable once our chipset platform kernel tree gets updated to
> a newer version, to see whether the behavior still reproduces.
>
> Below is the patch we manually applied during the backport.
>
Hi Zicheng!
Thanks for sharing this. It helps a lot!
I'm still not sure how I can reproduce your issue though. Android has
many adaptive behaviors, and vendors have many customized userspace
policies too, so a change in some metric may have unexpected effects.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f78cfe059f14..50109cd5e94c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1987,6 +1987,44 @@ static int current_may_throttle(void)
> return !(current->flags & PF_LOCAL_THROTTLE);
> }
>
> +static void handle_reclaim_writeback(unsigned long nr_taken,
> + struct pglist_data *pgdat,
> + struct scan_control *sc,
> + struct reclaim_stat *stat)
> +{
> + /*
> + * If dirty folios are scanned that are not queued for IO, it
> + * implies that flushers are not doing their job. This can
> + * happen when memory pressure pushes dirty folios to the end of
> + * the LRU before the dirty limits are breached and the dirty
> + * data has expired. It can also happen when the proportion of
> + * dirty folios grows not through writes but through memory
> + * pressure reclaiming all the clean cache. And in some cases,
> + * the flushers simply cannot keep up with the allocation
> + * rate. Nudge the flusher threads in case they are asleep.
> + */
> + if (stat->nr_unqueued_dirty == nr_taken && nr_taken) {
> + wakeup_flusher_threads(WB_REASON_VMSCAN);
> + /*
> + * For cgroupv1 dirty throttling is achieved by waking up
> + * the kernel flusher here and later waiting on folios
> + * which are in writeback to finish (see shrink_folio_list()).
> + *
> + * Flusher may not be able to issue writeback quickly
> + * enough for cgroupv1 writeback throttling to work
> + * on a large system.
> + */
> + if (!writeback_throttling_sane(sc))
> + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
> + }
> +
> + sc->nr.dirty += stat->nr_dirty;
> + sc->nr.congested += stat->nr_congested;
> + sc->nr.writeback += stat->nr_writeback;
> + sc->nr.immediate += stat->nr_immediate;
> + sc->nr.taken += nr_taken;
> +}
> +
> /*
> * shrink_inactive_list() is a helper for shrink_node(). It returns the number
> * of reclaimed pages
> @@ -2054,41 +2092,15 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>
> lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
>
> - /*
> - * If dirty folios are scanned that are not queued for IO, it
> - * implies that flushers are not doing their job. This can
> - * happen when memory pressure pushes dirty folios to the end of
> - * the LRU before the dirty limits are breached and the dirty
> - * data has expired. It can also happen when the proportion of
> - * dirty folios grows not through writes but through memory
> - * pressure reclaiming all the clean cache. And in some cases,
> - * the flushers simply cannot keep up with the allocation
> - * rate. Nudge the flusher threads in case they are asleep.
> - */
> - if (stat.nr_unqueued_dirty == nr_taken) {
> - wakeup_flusher_threads(WB_REASON_VMSCAN);
> - /*
> - * For cgroupv1 dirty throttling is achieved by waking up
> - * the kernel flusher here and later waiting on folios
> - * which are in writeback to finish (see shrink_folio_list()).
> - *
> - * Flusher may not be able to issue writeback quickly
> - * enough for cgroupv1 writeback throttling to work
> - * on a large system.
> - */
> - if (!writeback_throttling_sane(sc))
> - reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
> - }
> +
> + // sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> + // leave nr_unqueued_dirty in scan_control to keep integrity
>
> - sc->nr.dirty += stat.nr_dirty;
> - sc->nr.congested += stat.nr_congested;
> - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> - sc->nr.writeback += stat.nr_writeback;
> - sc->nr.immediate += stat.nr_immediate;
> - sc->nr.taken += nr_taken;
> - if (file)
> - sc->nr.file_taken += nr_taken;
> + // if (file)
> + // sc->nr.file_taken += nr_taken;
> + // leave nr_taken in scan_control to keep integrity
>
> + handle_reclaim_writeback(nr_taken, pgdat, sc, &stat);
Since it's not a full backport, the backport itself might be buggy or
missing pieces or dependencies. For example, in this part I dropped
nr_unqueued_dirty and file_taken in this series, which is perfectly
fine for upstream mainline after 2f05435df932 (6.19). But I just
checked the android16-6.12 branch of AOSP: if you remove this counter
update there, some dirty reactivation path may be completely broken,
and any related downstream metrics or users are broken as well.
> -static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
> +static unsigned long lruvec_evictable_size(struct lruvec *lruvec, int swappiness)
> {
> int gen, type, zone;
> - unsigned long total = 0;
> - int swappiness = get_swappiness(lruvec, sc);
> + unsigned long seq, total = 0;
> struct lru_gen_folio *lrugen = &lruvec->lrugen;
> - struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> DEFINE_MAX_SEQ(lruvec);
> DEFINE_MIN_SEQ(lruvec);
>
> for_each_evictable_type(type, swappiness) {
> - unsigned long seq;
> -
> for (seq = min_seq[type]; seq <= max_seq; seq++) {
> gen = lru_gen_from_seq(seq);
> -
> for (zone = 0; zone < MAX_NR_ZONES; zone++)
> total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
> }
> }
>
> + return total;
> +}
> +
> +static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
> +{
> + unsigned long total;
> + int swappiness = get_swappiness(lruvec, sc);
> + struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +
> + total = lruvec_evictable_size(lruvec, swappiness);
> +
> /* whether the size is big enough to be helpful */
> return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
> }
> @@ -4475,7 +4496,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> int tier_idx)
> {
> bool success;
> - bool dirty, writeback;
> int gen = folio_lru_gen(folio);
> int type = folio_is_file_lru(folio);
> int zone = folio_zonenum(folio);
> @@ -4505,7 +4525,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>
> /* protected */
> if (tier > tier_idx || refs + workingset == BIT(LRU_REFS_WIDTH) + 1) {
> - gen = folio_inc_gen(lruvec, folio, false);
> + gen = folio_inc_gen(lruvec, folio);
> list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
>
> /* don't count the workingset being lazily promoted */
> @@ -4520,26 +4540,11 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
>
> /* ineligible */
> if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
> - gen = folio_inc_gen(lruvec, folio, false);
> + gen = folio_inc_gen(lruvec, folio);
> list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
> return true;
> }
>
> - dirty = folio_test_dirty(folio);
> - writeback = folio_test_writeback(folio);
> - if (type == LRU_GEN_FILE && dirty) {
> - sc->nr.file_taken += delta;
> - if (!writeback)
> - sc->nr.unqueued_dirty += delta;
> - }
> -
> - /* waiting for writeback */
> - if (writeback || (type == LRU_GEN_FILE && dirty)) {
> - gen = folio_inc_gen(lruvec, folio, true);
> - list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
> - return true;
> - }
> -
> return false;
> }
>
> @@ -4547,12 +4552,6 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
> {
> bool success;
>
> - /* swap constrained */
> - if (!(sc->gfp_mask & __GFP_IO) &&
> - (folio_test_dirty(folio) ||
> - (folio_test_anon(folio) && !folio_test_swapcache(folio))))
> - return false;
> -
> /* raced with release_pages() */
> if (!folio_try_get(folio))
> return false;
> @@ -4567,8 +4566,6 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
> if (!folio_test_referenced(folio))
> set_mask_bits(&folio->flags, LRU_REFS_MASK, 0);
>
> - /* for shrink_folio_list() */
> - folio_clear_reclaim(folio);
>
> success = lru_gen_del_folio(lruvec, folio, true);
> VM_WARN_ON_ONCE_FOLIO(!success, folio);
> @@ -4577,8 +4574,9 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
> }
> EXPORT_SYMBOL_GPL(isolate_folio);
>
> -static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> - int type, int tier, struct list_head *list)
> +static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc,
> + int type, int tier,
> + struct list_head *list, int *isolatedp)
> {
> int i;
> int gen;
> @@ -4587,10 +4585,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> int scanned = 0;
> int isolated = 0;
> int skipped = 0;
> - int remaining = MAX_LRU_BATCH;
> + unsigned long remaining = nr_to_scan;
> struct lru_gen_folio *lrugen = &lruvec->lrugen;
> struct mem_cgroup *memcg = lruvec_memcg(lruvec);
>
> + VM_WARN_ON_ONCE(nr_to_scan > MAX_LRU_BATCH);
> VM_WARN_ON_ONCE(!list_empty(list));
>
> if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
> @@ -4647,16 +4646,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> __count_memcg_events(memcg, item, isolated);
> __count_memcg_events(memcg, PGREFILL, sorted);
> __count_vm_events(PGSCAN_ANON + type, isolated);
> - trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
> + trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
> scanned, skipped, isolated,
> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> - if (type == LRU_GEN_FILE)
> - sc->nr.file_taken += isolated;
> - /*
> - * There might not be eligible folios due to reclaim_idx. Check the
> - * remaining to prevent livelock if it's not making progress.
> - */
> - return isolated || !remaining ? scanned : 0;
> +
> + *isolatedp = isolated;
> + return scanned;
> }
>
> static int get_tier_idx(struct lruvec *lruvec, int type)
> @@ -4698,33 +4693,36 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
> return positive_ctrl_err(&sp, &pv);
> }
>
> -static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
> - int *type_scanned, struct list_head *list)
> +static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc, int swappiness,
> + struct list_head *list, int *isolated,
> + int *isolate_type, int *isolate_scanned)
> {
> int i;
> + int scanned = 0;
> int type = get_type_to_scan(lruvec, swappiness);
>
> for_each_evictable_type(i, swappiness) {
> - int scanned;
> + int type_scan;
> int tier = get_tier_idx(lruvec, type);
>
> - *type_scanned = type;
> + type_scan = scan_folios(nr_to_scan, lruvec, sc,
> + type, tier, list, isolated);
>
> - scanned = scan_folios(lruvec, sc, type, tier, list);
> - if (scanned)
> - return scanned;
> + scanned += type_scan;
> + if (*isolated) {
> + *isolate_type = type;
> + *isolate_scanned = type_scan;
> + break;
> + }
>
> type = !type;
> }
>
> - return 0;
> + return scanned;
> }
>
> -static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
> +static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc, int swappiness)
The signature change upstream comes together with the proportional
protection; simply changing it downstream might miss things, and we
are not on the same baseline.
> {
> - int type;
> - int scanned;
> - int reclaimed;
> LIST_HEAD(list);
> LIST_HEAD(clean);
> struct folio *folio;
> @@ -4732,19 +4730,23 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> enum vm_event_item item;
> struct reclaim_stat stat;
> struct lru_gen_mm_walk *walk;
> + int scanned, reclaimed;
> + int isolated = 0, type, type_scanned;
> bool skip_retry = false;
> - struct lru_gen_folio *lrugen = &lruvec->lrugen;
> struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>
> spin_lock_irq(&lruvec->lru_lock);
>
> - scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> + /* In case folio deletion left empty old gens, flush them */
> + try_to_inc_min_seq(lruvec, swappiness);
>
> - scanned += try_to_inc_min_seq(lruvec, swappiness);
> + scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
> + &list, &isolated, &type, &type_scanned);
>
> - if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen->max_seq)
> - scanned = 0;
> + /* Isolation might create empty gen, flush them */
> + if (scanned)
> + try_to_inc_min_seq(lruvec, swappiness);
>
> spin_unlock_irq(&lruvec->lru_lock);
>
> @@ -4752,10 +4754,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> return scanned;
> retry:
> reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
> - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> sc->nr_reclaimed += reclaimed;
> + handle_reclaim_writeback(isolated, pgdat, sc, &stat);
> trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
> - scanned, reclaimed, &stat, sc->priority,
> + type_scanned, reclaimed, &stat, sc->priority,
> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> @@ -4804,6 +4806,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>
> if (!list_empty(&list)) {
> skip_retry = true;
> + isolated = 0;
> goto retry;
> }
>
> @@ -4813,28 +4816,14 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
> int swappiness, unsigned long *nr_to_scan)
> {
> - int gen, type, zone;
> - unsigned long size = 0;
> - struct lru_gen_folio *lrugen = &lruvec->lrugen;
> DEFINE_MIN_SEQ(lruvec);
>
> - *nr_to_scan = 0;
> /* have to run aging, since eviction is not possible anymore */
> if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq)
> return true;
And you lost the DEF_PRIORITY early-return here.
>
> - for_each_evictable_type(type, swappiness) {
> - unsigned long seq;
> -
> - for (seq = min_seq[type]; seq <= max_seq; seq++) {
> - gen = lru_gen_from_seq(seq);
> + *nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
>
> - for (zone = 0; zone < MAX_NR_ZONES; zone++)
> - size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
> - }
> - }
> -
> - *nr_to_scan = size;
> /* better to run aging even though eviction is still possible */
> return evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS == max_seq;
> }
> @@ -4844,27 +4833,55 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
> * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> * reclaim.
> */
> -static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
> -{
> - bool success;
> - unsigned long nr_to_scan;
> - struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> - DEFINE_MAX_SEQ(lruvec);
> +// static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
> +// struct mem_cgroup *memcg, int swappiness)
> +// {
> +// unsigned long nr_to_scan, evictable;
> +// bool bypass = false;
> +// bool young = false;
> +// DEFINE_MAX_SEQ(lruvec);
> +
> +// evictable = lruvec_evictable_size(lruvec, swappiness);
> +// nr_to_scan = evictable;
> +
> +// /* try to scrape all its memory if this memcg was deleted */
> +// if (!mem_cgroup_online(memcg))
> +// return nr_to_scan;
> +
> +// // nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
> +// // not exist in the android code
> +// nr_to_scan >>= sc->priority;
> +
> +// if (!nr_to_scan && sc->priority < DEF_PRIORITY)
> +// nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
> +
> +// trace_android_vh_mglru_aging_bypass(lruvec, max_seq,
> +// swappiness, &bypass, &young);
This part looks really hackish... I can't tell whether anything is wrong here.
> +// if (bypass)
> +// return young ? -1 : 0;
> +
> +// return nr_to_scan;
> +// }
> +/*
> + * For future optimizations:
> + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> + * reclaim.
> + */
> static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> {
> - long nr_to_scan;
> - unsigned long scanned = 0;
> + bool need_rotate = false, should_age = false;
> + long nr_batch, nr_to_scan;
> int swappiness = get_swappiness(lruvec, sc);
> + struct mem_cgroup *memcg = lruvec_memcg(lruvec);
>
> - while (true) {
> + nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> + if (!nr_to_scan)
> + need_rotate = true;
> +
> + while (nr_to_scan > 0) {
> int delta;
> + DEFINE_MAX_SEQ(lruvec);
>
> - nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> - if (nr_to_scan <= 0)
> + if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> + need_rotate = true;
> break;
> + }
>
> - delta = evict_folios(lruvec, sc, swappiness);
> + if (should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan)) {
Here should_run_aging() clobbers the same nr_to_scan used by the loop,
which changes the reclaim behavior dramatically compared to this series.
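A minimal model of why the clobbering matters (hypothetical numbers and
names, just to show the shape of the bug):

```c
#include <assert.h>

/* Model of the backported loop: the full evictable size is written
 * back into the budget on every iteration, so the budget is reset
 * instead of consumed, and the amount scanned is bounded only by the
 * iteration cap, not by the intended budget. */
static long sketch_clobbered_budget(long budget, long evictable_size,
                                    int max_iters)
{
    long scanned = 0;
    int i;

    for (i = 0; i < max_iters && budget > 0; i++) {
        budget = evictable_size;  /* clobbered, never shrinks */
        scanned += budget < 64 ? budget : 64;  /* one batch per loop */
    }
    return scanned;  /* grows with max_iters, not with the budget */
}
```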
> + if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
> + need_rotate = true;
> + should_age = true;
> + }
> +
> + nr_batch = min(nr_to_scan, MIN_LRU_BATCH);
> + delta = evict_folios(nr_batch, lruvec, sc, swappiness);
> if (!delta)
> break;
>
> - scanned += delta;
> - if (scanned >= nr_to_scan)
> + if (should_abort_scan(lruvec, sc))
> break;
>
> - if (should_abort_scan(lruvec, sc))
> + /* For cgroup reclaim, fairness is handled by iterator, not rotation */
> + if (root_reclaim(sc) && should_age)
> break;
>
> cond_resched();
And here you are not doing "nr_to_scan -= delta". Maybe reclaim will
keep going in an extremely aggressive way? The new design is meant to
use nr_to_scan as a budget, not a bool.
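The intended budget semantics can be sketched roughly like this
(hypothetical helpers, not the actual kernel code):

```c
#include <assert.h>

/* Hypothetical stand-in for evict_folios(): pretends to reclaim up to
 * `batch` pages from a pool of `*remaining` evictable pages. */
static long sketch_evict(long *remaining, long batch)
{
    long delta = batch < *remaining ? batch : *remaining;

    *remaining -= delta;
    return delta;
}

/* Budget-based loop: nr_to_scan is consumed by each batch, so the
 * loop terminates once the budget is spent. Without the
 * "nr_to_scan -= delta" step, the budget never shrinks and reclaim
 * only stops when eviction makes no progress at all. */
static long sketch_shrink(long nr_to_scan, long pool)
{
    long total = 0;
    const long batch_max = 64;  /* stand-in for MIN_LRU_BATCH */

    while (nr_to_scan > 0) {
        long batch = nr_to_scan < batch_max ? nr_to_scan : batch_max;
        long delta = sketch_evict(&pool, batch);

        if (!delta)
            break;
        total += delta;
        nr_to_scan -= delta;  /* the missing step in the backport */
    }
    return total;
}
```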
> }
>
> - /*
> - * If too many file cache in the coldest generation can't be evicted
> - * due to being dirty, wake up the flusher.
> - */
> - if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken)
> - wakeup_flusher_threads(WB_REASON_VMSCAN);
> -
> - /* whether this lruvec should be rotated */
> - return nr_to_scan < 0;
> + return need_rotate;
> }
>
So I did a quick look at this backport. It does look quite buggy
itself, with several inconsistencies I could identify on the spot, and
besides I'm not sure whether there are more gaps between the
downstream tree and this series in other parts.
I think you are simply not testing the same thing as what I posted.
Passing the build doesn't mean it's correct; at least the
reactivation, budget and aging parts might be broken.
Don't worry, your workload and concern definitely make sense, but I
think we really need to come up with some reproducible tests that can
be benchmarked upstream to avoid confusion and inaccuracy, so that all
our cases are better covered.
I'll also try to do a few more tests on my Android phone. And feel
free to provide more suggestions or cases :)