From: "T.J. Mercier" <tjmercier@google.com>
Date: Wed, 24 Jan 2024 09:46:23 -0800
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, android-mm@google.com, yuzhao@google.com, yangyifei03@kuaishou.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240123164819.GB1745986@cmpxchg.org>
References: <20240121214413.833776-1-tjmercier@google.com> <20240123164819.GB1745986@cmpxchg.org>
On Tue, Jan 23, 2024 at 8:48 AM Johannes Weiner wrote:
>
> The revert isn't a straight-forward solution.
>
> The patch you're reverting fixed conventional reclaim and broke
> MGLRU. Your revert fixes MGLRU and breaks conventional reclaim.
>
> On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote:
> > They both are able to make progress.
> > The main difference is that a single iteration of
> > try_to_free_mem_cgroup_pages with MGLRU ends soon after it reclaims
> > nr_to_reclaim, and before it touches all memcgs. So a single
> > iteration really will reclaim only about SWAP_CLUSTER_MAX-ish pages
> > with MGLRU. Without MGLRU the memcg walk is not aborted immediately
> > after nr_to_reclaim is reached, so a single call to
> > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> > even when sc->nr_to_reclaim is 32. (I.e., MGLRU overreclaims less.)
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
>
> Is that a feature or a bug?

Feature!

> * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
> *    of their max_seq counters ensures the eventual fairness to all eligible
> *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
>
> If it bails out exactly after nr_to_reclaim, it'll overreclaim
> less. But with steady reclaim in a complex subtree, it will always hit
> the first cgroup returned by mem_cgroup_iter() and then bail. This
> seems like a fairness issue.

Right. Because the memcg LRU is maintained in pg_data_t and not in each
cgroup, I think we are currently forced to have the iteration across
all child memcgs for non-root memcg reclaim for fairness.

> We should figure out what the right method for balancing fairness with
> overreclaim is, regardless of reclaim implementation. Because having
> two different approaches and reverting dependent things back and forth
> doesn't make sense.
>
> Using an LRU to rotate through memcgs over multiple reclaim cycles
> seems like a good idea. Why is this specific to MGLRU? Shouldn't this
> be a generic piece of memcg infrastructure?

It would be pretty sweet if it were. I haven't tried to measure this
part in isolation, but I know we had to abandon attempts to use per-app
memcgs in the past (2018?) because the perf overhead was too much.
In recent tests where this feature is used, I see some perf gains which
I think are probably attributable to this.

> Then there is the question of why there is an LRU for global reclaim,
> but not for subtree reclaim. Reclaiming a container with multiple
> subtrees would benefit from the fairness provided by a container-level
> LRU order just as much; having fairness for root but not for subtrees
> would produce different reclaim and pressure behavior, and can cause
> regressions when moving a service from bare-metal into a container.
>
> Figuring out these differences and converging on a method for cgroup
> fairness would be the better way of fixing this. Because of the
> regression risk to the default reclaim implementation, I'm inclined to
> NAK this revert.

In the meantime, instead of a revert, how about changing the batch size
geometrically instead of using the SWAP_CLUSTER_MAX constant:

		reclaimed = try_to_free_mem_cgroup_pages(memcg,
-					min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
+					(nr_to_reclaim - nr_reclaimed)/2,
					GFP_KERNEL, reclaim_options);

I think that should address the overreclaim concern (it was mentioned
that the upper bound of overreclaim was 2 * request), and this should
also increase the reclaim rate for root reclaim with MGLRU closer to
what it was before.