Re: [PATCH 05/14] mm: workingset: let cache workingset challenge anon

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Johannes Weiner <hannes@cmpxchg.org>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Rik van Riel <riel@surriel.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Michal Hocko <mhocko@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 05/14] mm: workingset: let cache workingset challenge anon
Date: Tue, 2 Jun 2020 12:47:26 -0400	[thread overview]
Message-ID: <20200602164726.GA225032@cmpxchg.org> (raw)
In-Reply-To: <CAAmzW4NDVznjOsW1Vgg1P+0vSQarE1ziY=MN5S5f70pQiOPn-Q@mail.gmail.com>

On Tue, Jun 02, 2020 at 11:34:17AM +0900, Joonsoo Kim wrote:
> 2020년 6월 2일 (화) 오전 12:56, Johannes Weiner <hannes@cmpxchg.org>님이 작성:
> > On Mon, Jun 01, 2020 at 03:14:24PM +0900, Joonsoo Kim wrote:
> > > But, I still think that modified refault activation equation isn't
> > > safe. The next
> > > problem I found is related to the scan ratio limit patch ("limit the range of
> > > LRU type balancing") on this series. See the below example.
> > >
> > > anon: Hot (X M)
> > > file: Hot (200 M) / dummy (200 M)
> > > P: 1200 M (3 parts, each one 400 M, P1, P2, P3)
> > > Access Pattern: A -> F(H) -> P1 -> A -> F(H) -> P2 -> ... ->
> > >
> > > Without this patch, A and F(H) are kept on the memory and look like
> > > it's correct.
> > >
> > > With this patch and below fix, refault equation for Pn would be:
> > >
> > > Refault dist of Pn = 1200 (from file non-resident) + 1200 * anon scan
> > > ratio (from anon non-resident)
> > > anon + active file = X + 200
> > > 1200 + 1200 * anon scan ratio (0.5 ~ 2) < X + 200
> >
> > That doesn't look quite right to me. The anon part of the refault
> > distance is driven by X, so the left-hand of this formula contains X
> > as well.
> >
> > 1000 file (1200M reuse distance, 200M in-core size) + F(H) reactivations + X * scan ratio < X + 1000
> 
> As I said before, there is no X on left-hand of this formula. To
> access all Pn and
> re-access P1, we need 1200M file list scan and reclaim. More scan isn't needed.
> With your patch "limit the range of LRU type balancing", scan ratio
> between file/anon
> list is limited to 0.5 ~ 2.0, so, maximum anon scan would be 1200 M *
> 2.0, that is,
> 2400 M and not bounded by X. That means that file list cannot be
> stable with some X.

Oh, no X on the left because you're talking about the number of pages
scanned until the first refaults, which is fixed - so why are we still
interpreting the refault distance against a variable anon size X?

Well, that's misleading. We compare against anon because part of the
cache is already encoded in the refault distance. What we're really
checking is access distance against total amount of available RAM.

Consider this. We want to activate pages where

	access_distance <= RAM

and our measure of access distance is:

	access_distance = refault_distance + inactive_file

So the comparison becomes:

	refault_distance + inactive_file < RAM

which we simplify to:

	refault_distance < active_file + anon

There is a certain threshold for X simply because there is a certain
threshold for RAM beyond which we need to start activating. X cannot
be arbitrary, it must be X + cache filling up memory - after all we
have page reclaim evicting pages.

Again, this isn't new. In the current code, we activate when:

	refault_distance < active_file

which is

	access_distance <= RAM - anon

You can see, whether things are stable or not always depends on the
existing workingset size. It's just a proxy for how much total RAM we
have potentially available to the refaulting page.

> If my lastly found example is a correct example (your confirm is required),
> it is also related to the correctness issue since cold pages causes
> eviction of the hot pages repeatedly.

I think your example is correct, but it benefits from the VM
arbitrarily making an assumption that has a 50/50 shot of being true.

You and I know which pages are hot and which are cold because you
designed the example.

All the VM sees is this:

- We have an established workingset that has previously shown an
  access distance <= RAM and therefor was activated.

- We now have another set that also appears to have an access distance
  <= RAM. The only way to know for sure, however, is sample the
  established workingset and compare the relative access frequencies.

Currently, we just assume the incoming pages are colder. Clearly
that's beneficial when it's true. Clearly that can be totally wrong.

We must allow a fair comparison between these two sets.

For cache, that's already the case - that's why I brought up the
cache-only example: if refault distances are 50M and you have 60M of
active cache, we activate all refaults and force an even competition
between the established workingset and the new pages.

Whether we can protect active file when anon needs to shrink first and
can't (the activate/iocost split) that's a different question. But I'm
no longer so sure after looking into it further.

First, we would need two different refault distances: either we
consider anon age and need to compare to active_file + anon, or we
don't and compare to active_file only. We cannot mix willy nilly,
because the metrics wouldn't be comparable. We don't have the space to
store two different eviction timestamps, nor could we afford to cut
the precision in half.

Second, the additional page flag required to implement it.

Third, it's somewhat moot because we still have the same behavior when
active_file would need to shrink and can't. There can't be a stable
state as long as refault distances <= active_file.

> In this case, they (without patch, with patch) all have some correctness
> issue so we need to judge which one is better in terms of overall impact.
> I don't have strong opinion about it so it's up to you to decide the way to go.

If my patch was simply changing the default assumption on which pages
are hot and which are cold, I would agree with you - the pros would be
equal to the cons, one way wouldn't be more correct than the other.

But that isn't what my patch is doing. What it does is get rid of the
assumption, to actually sample and compare the access frequencies when
there isn't enough data to make an informed decision.

That's a net improvement.

next prev parent reply	other threads:[~2020-06-02 16:48 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-20 23:25 [PATCH 00/14] mm: balance LRU lists based on relative thrashing v2 Johannes Weiner
2020-05-20 23:25 ` [PATCH 01/14] mm: fix LRU balancing effect of new transparent huge pages Johannes Weiner
2020-05-27 19:48   ` Shakeel Butt
2020-05-20 23:25 ` [PATCH 02/14] mm: keep separate anon and file statistics on page reclaim activity Johannes Weiner
2020-05-20 23:25 ` [PATCH 03/14] mm: allow swappiness that prefers reclaiming anon over the file workingset Johannes Weiner
2020-05-20 23:25 ` [PATCH 04/14] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() Johannes Weiner
2020-05-20 23:25 ` [PATCH 05/14] mm: workingset: let cache workingset challenge anon Johannes Weiner
2020-05-27  2:06   ` Joonsoo Kim
2020-05-27 13:43     ` Johannes Weiner
2020-05-28  7:16       ` Joonsoo Kim
2020-05-28 17:01         ` Johannes Weiner
2020-05-29  6:48           ` Joonsoo Kim
2020-05-29 15:12             ` Johannes Weiner
2020-06-01  6:14               ` Joonsoo Kim
2020-06-01 15:56                 ` Johannes Weiner
2020-06-01 20:44                   ` Johannes Weiner
2020-06-04 13:35                     ` Vlastimil Babka
2020-06-04 15:05                       ` Johannes Weiner
2020-06-12  3:19                         ` Joonsoo Kim
2020-06-15 13:41                           ` Johannes Weiner
2020-06-16  6:09                             ` Joonsoo Kim
2020-06-02  2:34                   ` Joonsoo Kim
2020-06-02 16:47                     ` Johannes Weiner [this message]
2020-06-03  5:40                       ` Joonsoo Kim
2020-05-20 23:25 ` [PATCH 06/14] mm: remove use-once cache bias from LRU balancing Johannes Weiner
2020-05-20 23:25 ` [PATCH 07/14] mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count() Johannes Weiner
2020-05-20 23:25 ` [PATCH 08/14] mm: base LRU balancing on an explicit cost model Johannes Weiner
2020-05-20 23:25 ` [PATCH 09/14] mm: deactivations shouldn't bias the LRU balance Johannes Weiner
2020-05-22 13:33   ` Qian Cai
2020-05-26 15:55     ` Johannes Weiner
2020-05-20 23:25 ` [PATCH 10/14] mm: only count actual rotations as LRU reclaim cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 11/14] mm: balance LRU lists based on relative thrashing Johannes Weiner
2020-05-20 23:25 ` [PATCH 12/14] mm: vmscan: determine anon/file pressure balance at the reclaim root Johannes Weiner
2020-05-20 23:25 ` [PATCH 13/14] mm: vmscan: reclaim writepage is IO cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 14/14] mm: vmscan: limit the range of LRU type balancing Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200602164726.GA225032@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=js1304@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan.kim@gmail.com \
    --cc=riel@surriel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox