From: "T.J. Mercier"
Date: Wed, 24 Jan 2024 09:14:40 -0800
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
To: Michal Hocko
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, android-mm@google.com, yuzhao@google.com, yangyifei03@kuaishou.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240121214413.833776-1-tjmercier@google.com>

On Tue, Jan 23, 2024 at 8:19 AM Michal Hocko wrote:
>
> On Tue 23-01-24 05:58:05, T.J. Mercier wrote:
> > On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko wrote:
> > >
> > > On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > > > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> > > >
> > > > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > > > MGLRU is enabled, and completion times for proactive reclaim on much
> > > > smaller non-root cgroups take ~30% longer (with or without MGLRU).
> > >
> > > What is the reclaim target in these pro-active reclaim requests?
> >
> > Two targets:
> > 1) /sys/fs/cgroup/memory.reclaim
> > 2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)
>
> OK, I was not really clear. I was curious about nr_to_reclaim.
>
> > Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
> > used for 2.
> >
> > The 10x comes from the rate of reclaim (~70k pages/sec vs ~6.6k
> > pages/sec) for 1. After this revert the root reclaim took only about
> > 10 seconds. Before the revert it's still running after about 3 minutes
> > using a core at 100% the whole time, and I'm too impatient to wait
> > longer to record times for comparison.
> >
> > The 30% comes from the average of a few runs for 2:
> > Before revert:
> > $ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
> > echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
>
> Ohh, so you want to reclaim all of it (resp. as much as possible).
>

Right, the main use-case here is that we decide an application should be
backgrounded and its cgroup frozen. Before freezing, we reclaim as much
as possible so that the frozen processes' RAM use is as low as possible
while they're dormant.

> [...]
> > > > After the patch the reclaim rate is
> > > > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > > > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > > > proactive reclaim doesn't complete after several minutes because
> > > > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > > > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
> > >
> > > I do not understand this part.
> > > How does a smaller reclaim target manage
> > > to have reclaimed > 0 while larger one doesn't?
> >
> > They both are able to make progress. The main difference is that a
> > single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> > after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> > a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> > pages with MGLRU. Without MGLRU the memcg walk is not aborted
> > immediately after nr_to_reclaim is reached, so a single call to
> > try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> > even when sc->nr_to_reclaim is 32. (I.e. MGLRU overreclaims less.)
> > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
>
> OK, I do see how try_to_free_mem_cgroup_pages might over reclaim but I
> do not really follow how increasing the batch actually fixes the issue
> that there is always progress being made and therefore memory_reclaim
> takes ages to terminate?

Oh, because the page reclaim rate with a small batch is just much lower
than with a very large batch. We have to restart reclaim from scratch
each time a batch is completed before we get to a place where we're
actually freeing/swapping pages again. That setup cost is amortized over
many more pages with a large batch size, but appears to be pretty
significant for small batch sizes.
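
The amortization argument above can be sketched with a toy cost model. This is not kernel code: simulate_reclaim() and both cost constants are hypothetical values chosen only for illustration, not measurements from this thread. The only number taken from the discussion is the small batch size of 32 pages. Each call is assumed to pay a fixed setup cost before any page is freed, plus a constant cost per page.

```python
def simulate_reclaim(total_pages, batch_pages, setup_cost_us, per_page_cost_us):
    """Toy model: reclaim total_pages in batches of batch_pages.

    Assumption (hypothetical, not measured): every call pays a fixed
    setup cost before it starts freeing pages, then a constant cost
    per page. Returns (calls, elapsed_us, pages_per_sec).
    """
    reclaimed = 0
    elapsed_us = 0.0
    calls = 0
    while reclaimed < total_pages:
        # The last batch may be smaller than batch_pages.
        batch = min(batch_pages, total_pages - reclaimed)
        elapsed_us += setup_cost_us + batch * per_page_cost_us
        reclaimed += batch
        calls += 1
    pages_per_sec = reclaimed / (elapsed_us / 1e6)
    return calls, elapsed_us, pages_per_sec


if __name__ == "__main__":
    TOTAL_PAGES = 65536      # arbitrary amount of memory to reclaim
    SETUP_COST_US = 4000.0   # made-up fixed cost per call
    PER_PAGE_COST_US = 10.0  # made-up cost per reclaimed page

    # Compare a 32-page batch with a single batch covering the whole request.
    for batch in (32, TOTAL_PAGES):
        calls, elapsed_us, rate = simulate_reclaim(
            TOTAL_PAGES, batch, SETUP_COST_US, PER_PAGE_COST_US)
        print(f"batch={batch:6d} calls={calls:5d} "
              f"elapsed={elapsed_us / 1e6:.2f}s rate={rate:.0f} pages/sec")
```

In this model the per-page work is identical in both runs. The difference in rate comes only from how many times the fixed setup cost is paid, which is the effect described for small batch sizes.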