From: Yu Zhao
Date: Fri, 26 Jan 2024 23:17:26 -0700
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
To: Johannes Weiner
Cc: "T.J. Mercier", Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, android-mm@google.com, yangyifei03@kuaishou.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240121214413.833776-1-tjmercier@google.com> <20240123164819.GB1745986@cmpxchg.org>
In-Reply-To: <20240123164819.GB1745986@cmpxchg.org>

On Tue, Jan 23, 2024 at 9:48 AM Johannes Weiner wrote:
>
> The revert isn't a straight-forward solution.
>
> The patch you're reverting fixed conventional reclaim and broke
> MGLRU. Your revert fixes MGLRU and breaks conventional reclaim.

This is not true -- the patch reverted regressed the active/inactive
LRU too, on execution time. Quoting the commit message: "completion
times for proactive reclaim on much smaller non-root cgroups take ~30%
longer (with or without MGLRU)."

And I wouldn't call the original patch a fix -- it shifted the problem
from space to time, which at best is a tradeoff.

> On Tue, Jan 23, 2024 at 05:58:05AM -0800, T.J. Mercier wrote:
> > They both are able to make progress. The main difference is that a
> > single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> > after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> > a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> > pages with MGLRU. Without MGLRU the memcg walk is not aborted
> > immediately after nr_to_reclaim is reached, so a single call to
> > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> > even when sc->nr_to_reclaim is 32. (I.E. MGLRU overreclaims less.)
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
>
> Is that a feature or a bug?
>
>  * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
>  *    of their max_seq counters ensures the eventual fairness to all eligible
>  *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
>
> If it bails out exactly after nr_to_reclaim, it'll overreclaim
> less. But with steady reclaim in a complex subtree, it will always hit
> the first cgroup returned by mem_cgroup_iter() and then bail. This
> seems like a fairness issue.
>
> We should figure out what the right method for balancing fairness with
> overreclaim is, regardless of reclaim implementation. Because having
> two different approaches and reverting dependent things back and forth
> doesn't make sense.
>
> Using an LRU to rotate through memcgs over multiple reclaim cycles
> seems like a good idea. Why is this specific to MGLRU? Shouldn't this
> be a generic piece of memcg infrastructure?
>
> Then there is the question of why there is an LRU for global reclaim,
> but not for subtree reclaim. Reclaiming a container with multiple
> subtrees would benefit from the fairness provided by a container-level
> LRU order just as much; having fairness for root but not for subtrees
> would produce different reclaim and pressure behavior, and can cause
> regressions when moving a service from bare-metal into a container.
>
> Figuring out these differences and converging on a method for cgroup
> fairness would be the better way of fixing this. Because of the
> regression risk to the default reclaim implementation, I'm inclined to
> NAK this revert.
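
To make the space/time and fairness tradeoff discussed above concrete,
here is a toy model in Python. It is not kernel code and does not
reproduce what try_to_free_mem_cgroup_pages or mem_cgroup_iter()
actually do; the Memcg class, reclaim_pass(), run(), the policy names,
and the fixed batch of 32 pages per memcg visit are all assumptions
made up for illustration. It compares three hypothetical policies for a
target of 32 pages: walk every memcg regardless of the target, bail as
soon as the target is met while always starting at the first memcg, and
bail early while rotating the starting memcg between cycles.

```python
from dataclasses import dataclass

BATCH = 32  # hypothetical pages taken per memcg visit (stand-in for SWAP_CLUSTER_MAX)


@dataclass
class Memcg:
    name: str
    reclaimable: int
    reclaimed: int = 0


def reclaim_pass(memcgs, nr_to_reclaim, start, bail_early):
    """Run one toy reclaim pass; return (pages reclaimed, next start index)."""
    total = 0
    n = len(memcgs)
    for i in range(n):
        idx = (start + i) % n
        cg = memcgs[idx]
        taken = min(BATCH, cg.reclaimable)
        cg.reclaimable -= taken
        cg.reclaimed += taken
        total += taken
        # Stop as soon as the target is met, before visiting the rest.
        if bail_early and total >= nr_to_reclaim:
            return total, (idx + 1) % n
    return total, start


def run(policy, cycles=8, n=4, nr_to_reclaim=32):
    """Repeat reclaim passes under one policy; return (total, per-memcg pages)."""
    memcgs = [Memcg(f"cg{i}", 10_000) for i in range(n)]
    start = 0
    total = 0
    for _ in range(cycles):
        got, nxt = reclaim_pass(
            memcgs, nr_to_reclaim, start, bail_early=(policy != "full_walk")
        )
        total += got
        if policy == "bail_rotate":
            start = nxt  # remember where the previous cycle stopped
    return total, [cg.reclaimed for cg in memcgs]


if __name__ == "__main__":
    for policy in ("full_walk", "bail_fixed", "bail_rotate"):
        print(policy, run(policy))
```

In this toy, eight cycles of "full_walk" reclaim four times the
requested amount but spread it evenly, "bail_fixed" reclaims only what
was asked for but takes all of it from the first memcg, and
"bail_rotate" reclaims only what was asked for and spreads it evenly
across cycles. Which of these, if any, resembles real behavior depends
on details of the iterator that the model leaves out.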