linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: "T.J. Mercier" <tjmercier@google.com>
Cc: Michal Hocko <mhocko@suse.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	android-mm@google.com, yuzhao@google.com,
	yangyifei03@kuaishou.com, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
Date: Tue, 23 Jan 2024 11:48:19 -0500	[thread overview]
Message-ID: <20240123164819.GB1745986@cmpxchg.org> (raw)
In-Reply-To: <CABdmKX2K4MMe9rsKfWi9RxUS5G1RkLVzuUkPnovt5O2hqVmbWA@mail.gmail.com>

The revert isn't a straight-forward solution.

The patch you're reverting fixed conventional reclaim and broke
MGLRU. Your revert fixes MGLRU and breaks conventional reclaim.

On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote:
> They both are able to make progress. The main difference is that a
> single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> pages with MGLRU. WIthout MGLRU the memcg walk is not aborted
> immediately after nr_to_reclaim is reached, so a single call to
> try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.)
> https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/

Is that a feature or a bug?

 * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
 *    of their max_seq counters ensures the eventual fairness to all eligible
 *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().

If it bails out exactly after nr_to_reclaim, it'll overreclaim
less. But with steady reclaim in a complex subtree, it will always hit
the first cgroup returned by mem_cgroup_iter() and then bail. This
seems like a fairness issue.

We should figure out what the right method for balancing fairness with
overreclaim is, regardless of reclaim implementation. Because having
two different approaches and reverting dependent things back and forth
doesn't make sense.

Using an LRU to rotate through memcgs over multiple reclaim cycles
seems like a good idea. Why is this specific to MGLRU? Shouldn't this
be a generic piece of memcg infrastructure?

Then there is the question of why there is an LRU for global reclaim,
but not for subtree reclaim. Reclaiming a container with multiple
subtrees would benefit from the fairness provided by a container-level
LRU order just as much; having fairness for root but not for subtrees
would produce different reclaim and pressure behavior, and can cause
regressions when moving a service from bare-metal into a container.

Figuring out these differences and converging on a method for cgroup
fairness would be the better way of fixing this. Because of the
regression risk to the default reclaim implementation, I'm inclined to
NAK this revert.


  parent reply	other threads:[~2024-01-23 16:48 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-21 21:44 T.J. Mercier
2024-01-23  2:24 ` Yu Zhao
2024-01-23  9:33 ` Michal Hocko
2024-01-23 13:58   ` T.J. Mercier
2024-01-23 16:19     ` Michal Hocko
2024-01-24 17:14       ` T.J. Mercier
2024-01-23 16:48     ` Johannes Weiner [this message]
2024-01-24  7:50       ` Michal Hocko
2024-01-24 17:46       ` T.J. Mercier
2024-01-26 16:34         ` Johannes Weiner
2024-01-26 16:41           ` T.J. Mercier
2024-01-30 20:58             ` T.J. Mercier
2024-01-30 21:56               ` Johannes Weiner
2024-01-27  6:17       ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240123164819.GB1745986@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=android-mm@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tjmercier@google.com \
    --cc=yangyifei03@kuaishou.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox