Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201


From: Barry Song <baohua@kernel.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: wangzhen <wangzhen5@honor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 "kasong@tencent.com" <kasong@tencent.com>,
	 "baolin.wang@linux.alibaba.com" <baolin.wang@linux.alibaba.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
Date: Thu, 9 Apr 2026 11:49:07 +0800
Message-ID: <CAGsJ_4wcQTrp-zQ=pj2v5Pa3z_SLs56p+t2OmG6myz7GJdqb2A@mail.gmail.com>
In-Reply-To: <adW_O1-QAQadQ68-@KASONG-MC4>

[...]
> > > Hi, thanks for the patch.
> > >
> > > We have a very similar patch internally, and the result is kind of bad.
> > >
> > > Currently MGLRU forbids the gen distance between file and anon from
> > > growing larger than 2, which means that with this patch, under great
> > > pressure, you may have to keep rotating a long list of folios of the
> > > opposite type in order to reclaim the other type.
> > >
> > > For example, suppose you have only 2 gens of file folios, swap is
> > > disabled, and there are 3 gens of anon folios. Anon folios are
> > > unevictable because there is no swap, and file folios are also
> > > unevictable due to the force protection of gens. If the anon folios
> > > are mostly cold (or at least a portion of them are), the oldest gen
> > > of anon folios will be very long (e.g. 12G, 3145728 folios).
> > >
> > > Now, to reclaim any file folios, you have to age first. Before this
> > > patch that is usually fast. But after it, aging will have to rotate
> > > all 3145728 folios to the second oldest anon gen, which could take a
> > > very long time.
> > >
> > > During that period any concurrent reclaimer will get rejected
> > > due to force protection, resulting in very ugly long-tail latency
> > > or unexpected OOM.
> > >
> > > So I agree this is a good idea in general, and I agree we should do
> > > this. But it is better to defer it until we patch up MGLRU to remove
> > > the force protection first.
> >
> > I suspect that once we can age file and anonymous pages
> > separately, this issue will resolve itself. David already has
> > some code for this [1].
> >
> > Not sure when he will have time to push it upstream, but I
> > may carve out some time to take care of it this month.
> >
> > [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/
>
> Hi, thanks for sharing the idea.
>
> Right, a few weeks ago I also got info from CachyOS that they are using
> the following patch for MGLRU:
>
> https://github.com/firelzrd/re-swappiness
>
> The idea is also to split the seq number for anon / file so swappiness
> works again.
>
> However, I'm really not sure if this is the right approach. It changes
> the model of MGLRU and things like TTL may no longer work as expected.
> And TTL does solve real problems too (also from CachyOS):
>
> https://github.com/firelzrd/le9uo
>
> TTL replaced the le9 patch above in a cleaner way for thrashing
> prevention.
>
> Right now we do a page table walk (and it walks both anon / file)
> while generating one unified new gen, meaning the folios in that
> gen have the same (or at least all older than a specific) access
> time, which is used as the metric for TTL.
>
> Besides, having unified gens also helps with implementing things like
> workingset reporting, where each gen is like a bin of a histogram:
>
> https://lwn.net/Articles/976985/
>
> Aging triggering could be a bit more problematic too.
> I think the right way is to just do the aging asynchronously, Yu
> even left a TODO comment in vmscan.c:
>
> /*
>  * For future optimizations:
>  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
>  *    reclaim.
>  */

Aging asynchronously could be a separate topic, as we can
do many things in an async manner—similar to proposals for
asynchronous compression. These async approaches may improve
performance, but they also add complexity—for example, managing
CPU utilization of reclamation threads to prevent devices from
overheating.

>
> Then, we start the aging whenever there are fewer than 4 gens, and
> allow reclaim to always go on even if there are only 2 gens left.

I don’t think allowing reclamation with two generations left
will resolve the problem. The fundamental issue with sharing the
same generations for file and anon is that one type must catch
up with the other—either through reclamation or via what this
patch is (admittedly) doing as a workaround. If we have to go
through reclamation, that effectively makes swappiness invalid
again.

Allowing reclamation with two generations may let one type move
ahead briefly, but over a smoothed time window there is no real
difference, as the other type still has to catch up with the one
that has fewer generations left.
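
The catch-up argument can be illustrated with a toy model. This is only a sketch, not kernel code: `simulate` and its `file_floor` parameter are hypothetical names, and it assumes the current MGLRU limits of 2 to 4 generations per type with a single shared max_seq. It reclaims only file folios (as with swappiness = 0) and counts how far each type's min_seq advances.

```python
MIN_NR_GENS = 2  # fewest generations a type keeps today (assumed, as in current MGLRU)
MAX_NR_GENS = 4  # most generations a type may hold (assumed, as in current MGLRU)


def simulate(steps, file_floor=MIN_NR_GENS):
    """Reclaim only file folios for `steps` rounds in a shared-generation model.

    `file_floor` is a hypothetical knob: the number of generations file must
    retain. Lowering it models letting reclaim continue with fewer gens left.
    Returns the final min_seq per type and the largest file/anon gap observed.
    """
    max_seq = MAX_NR_GENS - 1          # one shared youngest generation
    min_seq = {"anon": 0, "file": 0}   # oldest generation of each type
    max_gap = 0
    for _ in range(steps):
        nr_file_gens = max_seq - min_seq["file"] + 1
        if nr_file_gens > file_floor:
            # Eviction: the oldest file generation is drained.
            min_seq["file"] += 1
        else:
            # Aging is needed. If anon already holds MAX_NR_GENS generations,
            # its oldest one must go first (rotated or reclaimed): this is the
            # catch-up step.
            if max_seq - min_seq["anon"] + 1 >= MAX_NR_GENS:
                min_seq["anon"] += 1
            max_seq += 1
        max_gap = max(max_gap, min_seq["file"] - min_seq["anon"])
    return min_seq, max_gap
```

In this model the gap between the two types never exceeds MAX_NR_GENS - file_floor, so lowering the floor buys one extra generation of slack once; after that, every generation file gives up still forces anon to give one up as well.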

>
> The performance would be better since there is no more blocking
> on aging, no change to the existing model, and the change should
> be smaller and easier to review IIUC.
>
> One concerning part is doing reclaim while only having 2 gens left.
> I think it seems OK. It should be rare as 3 gens act as a buffer
> already; having only 2 gens left means the async aging can't catch
> up and the system is under extreme pressure, so it's unlikely the
> folios will be accessed enough times to get meaningful heat info, and
> refaults will be more helpful for sorting out the workingset:
>
> https://lwn.net/Articles/945266/
>
> Cgroup reclaim can do some throttling on that too, and kswapd can
> still do aging synchronously.
>
> Just some ideas, we may need to do some tests and benchmarks
> to figure out which is the best solution. Discussion
> is welcome! :D

Maybe we can still find a way to address the concerns you raised
above, as well as TTL—for example, by using separate timestamps
for anon and file pages.
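
One possible reading of "separate timestamps", sketched as a toy model only: the class `PerTypeGens` and all of its methods are hypothetical, and nothing here reflects an existing kernel layout. Each type ages on its own and stamps its own generations, so a TTL-style check can still ask how old the oldest generation of a given type is.

```python
TYPES = ("anon", "file")


class PerTypeGens:
    """Toy bookkeeping: per-type sequence numbers with per-type birth times."""

    def __init__(self, now):
        # seq number -> birth time, kept separately for each type (hypothetical)
        self.birth = {t: {0: now} for t in TYPES}
        self.max_seq = {t: 0 for t in TYPES}
        self.min_seq = {t: 0 for t in TYPES}

    def age(self, t, now):
        """Create a new youngest generation for one type only."""
        self.max_seq[t] += 1
        self.birth[t][self.max_seq[t]] = now

    def evict_oldest(self, t):
        """Drop the oldest generation of one type, always keeping at least one."""
        if self.min_seq[t] < self.max_seq[t]:
            del self.birth[t][self.min_seq[t]]
            self.min_seq[t] += 1

    def within_ttl(self, t, now, min_ttl):
        """True if the oldest generation of type t is younger than min_ttl."""
        return now - self.birth[t][self.min_seq[t]] < min_ttl
```

With this layout, aging file twice and evicting its two oldest generations leaves anon untouched, and `within_ttl()` answers independently for each type. Whether a per-type answer is enough for thrashing prevention or workingset reporting is exactly the open question raised above.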

Thanks
Barry


Thread overview: 6+ messages
     [not found] <7829b070df1b405dbc97dd6a028d8c8a@honor.com>
2026-04-07 13:37 ` wangzhen
2026-04-07 14:25   ` Kairui Song
2026-04-07 23:00     ` Barry Song
2026-04-08  2:35       ` Baolin Wang
2026-04-08  3:15       ` Kairui Song
2026-04-09  3:49         ` Barry Song [this message]
