From: Yosry Ahmed
Date: Mon, 24 Jun 2024 10:15:38 -0700
Subject: Re: [PATCH] memcg: use ratelimited stats flush in the reclaim
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Jesper Dangaard Brouer, Yu Zhao, Muchun Song, Facebook Kernel Team, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240615081257.3945587-1-shakeel.butt@linux.dev>

On Mon, Jun 24, 2024 at 10:02 AM Shakeel Butt wrote:
>
> On Mon, Jun 24, 2024 at 05:57:51AM GMT, Yosry Ahmed wrote:
> > > > and I will explain why below. I know it may be a necessary
> > > > evil, but I would like us to make sure there is no other option before
> > > > going forward with this.
> > >
> > > Instead of necessary evil, I would call it a pragmatic approach i.e.
> > > resolve the ongoing pain with good enough solution and work on long term
> > > solution later.
> >
> > It seems like there are a few ideas for solutions that may address
> > longer-term concerns, let's make sure we try those out first before we
> > fall back to the short-term mitigation.
> >
>
> Why? More specifically why try out other things before this patch? Both
> can be done in parallel. This patch has been running in production at
> Meta for several weeks without issues. Also I don't see how merging this
> would impact us on working on long term solutions.

The problem is that once this is merged, it will be difficult to
change this back to a normal flush once other improvements land. We
don't have a test that reproduces the problem that we can use to make
sure it's safe to revert this change later; we only have data from
prod.

Once this mitigation goes in, I think everyone will be less motivated
to get more data from prod about whether it's safe to revert the
ratelimiting later :)

>
> [...]
> >
> > Thanks for explaining this in such detail. It does make me feel
> > better, but keep in mind that the above heuristics may change in the
> > future and become more sensitive to stale stats, and very likely no
> > one will remember that we decided that stale stats are fine
> > previously.
> >
>
> When was the last time this heuristic changed? This heuristic was
> introduced in 2008 for anon pages and extended to file pages in 2016. In
> 2019 the ratio enforcement at 'reclaim root' was introduced. I am pretty
> sure we will improve the whole rstat flushing thing within a year or so
> :P

Fair point, although I meant it's easy to miss that the flush is
ratelimited and the stats are potentially stale in general :)

>
> > >
> > > For the cache trim mode, inactive file LRU size is read and the kernel
> > > scales it down based on the reclaim iteration (file >> sc->priority) and
> > > only checks if it is zero or not. Again precise information is not
> > > needed.
> >
> > It sounds like it is possible that we enter the cache trim mode when
> > we shouldn't if the stats are stale. Couldn't this lead to
> > over-reclaiming file memory?
> >
>
> Can you explain how this over-reclaiming file will happen?

In one reclaim iteration, we could flush the stats, read the inactive
file LRU size, confirm that (file >> sc->priority) > 0 and enter the
cache trim mode, reclaiming file memory only. Let's assume that we
reclaimed enough file memory such that the condition (file >>
sc->priority) > 0 does not hold anymore.

In a subsequent reclaim iteration, the flush could be skipped due to
ratelimiting. Now we will enter the cache trim mode again and reclaim
file memory only, even though the actual amount of file memory is low.
This will cause over-reclaiming from file memory and skipping anon
memory that we should have reclaimed, which means that we will need
additional reclaim iterations to actually free memory.

I believe this scenario would be possible with ratelimiting, right?

[..]
> > >
> > > Please note that this is not some user API which can not be changed
> > > later. We can change and dissect however we want. My only point is not to
> > > wait for the perfect solution and have some intermediate and good enough
> > > solution.
> >
> > I agree that we shouldn't wait for a perfect solution, but it also
> > seems like there are a few easy-ish solutions that we can discover
> > first (Jesper's patch, investigating update paths, etc). If none of
> > those pan out, we can fall back to the ratelimited flush, ideally with
> > a plan on next steps for a longer-term solution.
>
> I think I already explained why there is no need to wait. One thing we
> should agree on is that this is a hard problem and will need multiple
> iterations to come up with a solution which is acceptable for most. Until
> then I don't see any reason to block mitigations to reduce pain.

Agreed, but I expressed above why I think we should explore other
solutions first.
Please correct me if I am wrong.
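
To make the cache trim scenario above a bit more concrete, here is a
toy userspace model in Python. It is only a sketch and not kernel code:
ToyMemcg, FLUSH_INTERVAL, reclaim_iteration() and run() are hypothetical
names, the page counts are made up, and the fixed-interval skip is an
assumed stand-in for the ratelimiting in the patch rather than a
reproduction of it. The only part taken from the discussion is the
(file >> sc->priority) > 0 check.

```python
from dataclasses import dataclass

# Hypothetical: minimum number of "ticks" between two flushes when the
# flush is ratelimited. This is an assumption made for the toy model.
FLUSH_INTERVAL = 2


@dataclass
class ToyMemcg:
    """Toy model: the true LRU size vs. the value published by the last flush."""
    inactive_file: int               # up-to-date page count
    flushed_inactive_file: int = 0   # what reclaim reads until the next flush
    last_flush: int = -FLUSH_INTERVAL

    def flush(self, now: int, ratelimited: bool) -> bool:
        """Publish the true count; skip if ratelimited and flushed recently."""
        if ratelimited and now - self.last_flush < FLUSH_INTERVAL:
            return False
        self.flushed_inactive_file = self.inactive_file
        self.last_flush = now
        return True


def cache_trim_mode(file_pages: int, priority: int) -> bool:
    """The check from the thread: only 'zero or not' matters."""
    return (file_pages >> priority) > 0


def reclaim_iteration(memcg: ToyMemcg, now: int, priority: int,
                      nr_to_reclaim: int, ratelimited: bool) -> bool:
    """One iteration: flush (maybe), decide on trim mode, reclaim file pages."""
    memcg.flush(now, ratelimited)
    trim = cache_trim_mode(memcg.flushed_inactive_file, priority)
    if trim:
        # In trim mode only file memory is reclaimed.
        memcg.inactive_file -= min(nr_to_reclaim, memcg.inactive_file)
    return trim


def run(ratelimited: bool) -> list:
    """Two back-to-back iterations; returns the trim-mode decision of each."""
    memcg = ToyMemcg(inactive_file=5000)
    return [
        reclaim_iteration(memcg, now=t, priority=12,
                          nr_to_reclaim=2000, ratelimited=ratelimited)
        for t in range(2)
    ]
```

With these made-up numbers, run(ratelimited=False) returns [True, False]:
the second iteration sees 3000 pages, and 3000 >> 12 is 0. With
run(ratelimited=True) the second flush is skipped, the stale value 5000
is read again, and the result is [True, True], i.e. cache trim mode is
entered a second time even though the condition no longer holds.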