From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2949EC30658 for ; Mon, 24 Jun 2024 21:41:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 897AF6B0362; Mon, 24 Jun 2024 17:41:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 846AB6B0363; Mon, 24 Jun 2024 17:41:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6E7536B0364; Mon, 24 Jun 2024 17:41:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 501B86B0362 for ; Mon, 24 Jun 2024 17:41:56 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id DA2674132D for ; Mon, 24 Jun 2024 21:41:55 +0000 (UTC) X-FDA: 82267104990.23.9BF145E Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com [209.85.167.46]) by imf02.hostedemail.com (Postfix) with ESMTP id F2D2380014 for ; Mon, 24 Jun 2024 21:41:53 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RIIshbLi; spf=pass (imf02.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719265300; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=INDzkogUr8vsj6gysN5YnrA7Il5jPfEYwvp559Io+80=; b=Jc9Cv06vsk15ocIpHLsT1PnW8AFOXx6tdAVO4xMY0Za394obx4r17bn3hH2M+f1AJL2xJW OfBNJM/uS8tNu/guK2nFXR0+vt5forROTUNJqxhjk3zneFk4xKM7udkq2HOQOl4mswNHDM aQF3yJZHAUzS3lQtB5tNjueNO0MqfIA= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719265300; a=rsa-sha256; cv=none; b=Q5zz4BnZm8RUFIMp/+Vhu3d4On5h3BrOBZk/+orn2j0siXwfgI7uGqI3FYLDn8DwIz+EYo N0HOMrzEipGMTvdkzFozols1BoBCMxViIRF3WOq1OCdj1EWSrPExMQQhezfSmddVVc/LyN J1QKjYsmXgv9ICq3kB9bxZ0CydobxlI= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RIIshbLi; spf=pass (imf02.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-lf1-f46.google.com with SMTP id 2adb3069b0e04-52cdf2c7454so4009767e87.1 for ; Mon, 24 Jun 2024 14:41:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1719265312; x=1719870112; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=INDzkogUr8vsj6gysN5YnrA7Il5jPfEYwvp559Io+80=; b=RIIshbLil2mZ9K2aRvRnI6C1pSfpOL4Y8CdL8f1DzSg67I5fXfa/lIYGnUJHc9+X3A 5rMZPY3p8uf3aPltnawZFF6HWkd2Ts/Wpv40QShmxw/VgRuHSbDCppsxOgxYA6H4jV57 SksVtMGN9mgzwui7MvZhpvjBhQ8eRBN20iPPDtMSyVP/K7V8WO1z1bFxPSh0lopIwH4Z b39kZLh6gDsHpLvti5rkmXX4T7HOtoSraUvH31cvYT/nzT7xsJSjNnFeOT5QsXxhNpoz 9GTe93DMIHLjthoNF9XHoQj+XC5k1s78m/gbEsOZEyVH72eFMyAUS1fyH5rvvCl+k/bf CGLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719265312; x=1719870112; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=INDzkogUr8vsj6gysN5YnrA7Il5jPfEYwvp559Io+80=; b=uHLwpOhfXrYPGoX0YCtN41r59LMdo0tg9ygp3iys37llXj5mw5cNvpVjQjrYUzQrAv v9RqhrIUISmVdfLJQYTUtb1kzDQ/mBjotEo1rsWR+AY3zqGJPKAgP8DZXpV2ci2yLqOb sOAzd7n9vjKUTYGkb/0WfOVa86jWdCpUwSzgQkJfA1qLmf3FkKIGq1+2jJorbvJ6BrG5 /ZeFAGPeGoOTXTdIpaUsxYF6Xj/FGmOohM4+BCi5DOui0bJK+MD3sc7LTaJRDux581L8 K1lq+wNTaTbaTUftU5eupeeVHdgwRoV90VWrKzVn7F8ZavVONXSF07U9I3F2IBPYMvMg NhaQ== X-Forwarded-Encrypted: i=1; AJvYcCU9ZjwsDjQ1/LRclAy8IfDy/ZDir0sUPKQyw+BP3nP73snNIkImV4KDPT6ckCursTj99clwJg7PATmWqv3xjbFgWWM= X-Gm-Message-State: AOJu0YxB4oU3O7QfvDg+iGiLNHWdCbTpBDJAUvJuzS0zuMPfffFchYgx vZYSL3a8eB8llMqctQMNhFgbw7yE7M+Japznv/kXPLfZ9B99RKJRNn10Jm9768GEd7VTjL64u16 EaprCWKINConAqTeiuPF/1BiEFVasxYA2Hl/k X-Google-Smtp-Source: AGHT+IH4biSTMKC8IZqq6WCcH755yTV9BaCzTuN71r1TiTY+7lgmnJ215jo2AQXkmTd2z3feJGWR+YBWppOFTzYDEsg= X-Received: by 2002:a05:6512:745:b0:52c:8b69:e039 with SMTP id 2adb3069b0e04-52ce182bd0amr3911904e87.4.1719265311700; Mon, 24 Jun 2024 14:41:51 -0700 (PDT) MIME-Version: 1.0 References: <20240615081257.3945587-1-shakeel.butt@linux.dev> In-Reply-To: From: Yosry Ahmed Date: Mon, 24 Jun 2024 14:41:12 -0700 Message-ID: Subject: Re: [PATCH] memcg: use ratelimited stats flush in the reclaim To: Shakeel Butt Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Jesper Dangaard Brouer , Yu Zhao , Muchun Song , Facebook Kernel Team , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F2D2380014 X-Stat-Signature: 6s1mg8meersrb1poymb4gj9fstu3ek71 X-HE-Tag: 1719265313-441989 X-HE-Meta: U2FsdGVkX19SyeifIbHmWVYQJNbQgPvIMSC6tsdGYQCPhkA2TQjofMFilYcXVD4VEPFVfr3lxBDoWq4HDM1dahWdfl23yVMsejWBYrTBSN9QLcMndnOSafp+3pai0dlWM7yQ+aze4c6lE59dIPl9Y6f8y7kNx20GYmxw7U657pngL7rDo+1N6LdtuhNWReqS3K77d0O3UCAgtElhQJ208p3iVdWYUIsML/VDpdDmM4AvE0K/HvDIFa0lP9SZLeCk33C8y2w9OxYUHjeM/AhgAqEC5LMzS8ce9b0/lYOpioAPQacJEQriR2eX/1Edg/7Y80RzSszO3AEPBK+g1ePHpZNH4EBVxpjJhfw/eSSSkb0XRgLb83M+IttEDZo5aBJEWk6wjIjN7bAS6c3q0t5KHv/XkPpCXlqfbSxGuMylXzb5Hq6BUUXkbyjYJLhWIqhpc2H7tSn7y/fHVG1O2d565+bauNVNoAHBAoA8FiMfT4K4LXC9NitQkfNMuiAbTsK4chLJn0ZBTu2NM9UNcw6Ez0VZwD2Xu1xEONb0SypZ62hHK+yZmPFeGaObdgh/2baZlksbW4gkZ4+i3GPcvq2JyTIqyMtdhR8PZaVxAyQ9mUUO4MLu92OYzI27NT/bGa493OI6EG4xyIFJh8ROUB/jewnB1ygvOhHzJPSXHDGNY47whbuaZ2e4RpLvAcaGSvJLnHT5iylCYg0EMWs/S/T7tBRLrudJiKoH2UH3Mg1XzbjgAX8uIqVaGZcd0cUqOVso38HEh3IMWoeYzFHgd0Omri8l20CvAqELc1dtdk9/Kau1tuvXX+ErYGibxWUKkeDTbyjU4VC5k7FuiC7xE/ZOXSvMJROJxJtRq41Pw1PBeQBD6138cU4tsTac9aoq/f8L0ldY/tu2r3P50GzASIN55yYvWEQIJmJFFljZfWkbrS97vvgNdfNuL0kcLlJpJ4TcKzEn19JspuJoRSLiVKo 5D0z19on y3XwUMNxZX0lQanXll3gpZwxGV6n1vZiqa7aYMYo5cLZtD250PY1uthVkeQQEj+8siWGTki0cIrVquR5SScN7+OPGuz+swL/cppip+P0fsXhL5OdZKNDNfqtXDh4Xk240VIAqAqf5BfgLuTFsLtDtBCoEMDwPBZZ0Dq3DoBTeuLJzf6A= X-Bogosity: Ham, tests=bogofilter, spamicity=0.158926, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jun 24, 2024 at 1:01=E2=80=AFPM Shakeel Butt wrote: > > On Mon, Jun 24, 2024 at 12:06:28PM GMT, Yosry Ahmed wrote: > > On Mon, Jun 24, 2024 at 11:59=E2=80=AFAM Shakeel Butt wrote: > > > > > > On Mon, Jun 24, 2024 at 10:15:38AM GMT, Yosry Ahmed wrote: > > > > On Mon, Jun 24, 2024 at 10:02=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > > > On Mon, Jun 24, 2024 at 05:57:51AM GMT, Yosry Ahmed wrote: > > > > > > > > and I will explain why below. I know it may be a necessary > > > > > > > > evil, but I would like us to make sure there is no other op= tion before > > > > > > > > going forward with this. > > > > > > > > > > > > > > Instead of necessary evil, I would call it a pragmatic approa= ch i.e. > > > > > > > resolve the ongoing pain with good enough solution and work o= n long term > > > > > > > solution later. > > > > > > > > > > > > It seems like there are a few ideas for solutions that may addr= ess > > > > > > longer-term concerns, let's make sure we try those out first be= fore we > > > > > > fall back to the short-term mitigation. > > > > > > > > > > > > > > > > Why? More specifically why try out other things before this patch= ? Both > > > > > can be done in parallel. This patch has been running in productio= n at > > > > > Meta for several weeks without issues. Also I don't see how mergi= ng this > > > > > would impact us on working on long term solutions. > > > > > > > > The problem is that once this is merged, it will be difficult to > > > > change this back to a normal flush once other improvements land. We > > > > don't have a test that reproduces the problem that we can use to ma= ke > > > > sure it's safe to revert this change later, it's only using data fr= om > > > > prod. > > > > > > > > > > I am pretty sure the work on long term solution would be iterative wh= ich > > > will involve many reverts and redoing things differently. So, I think= it > > > is understandable that we may need to revert or revert the reverts. > > > > > > > Once this mitigation goes in, I think everyone will be less motivat= ed > > > > to get more data from prod about whether it's safe to revert the > > > > ratelimiting later :) > > > > > > As I said I don't expect "safe in prod" as a strict requirement for a > > > change. > > > > If everyone agrees that we can experiment with reverting this change > > later without having to prove that it is safe, then I think it's fine. > > Let's document this in the commit log though, so that whoever tries to > > revert this in the future (if any) does not have to re-explain all of > > this :) > > Sure. > > > > > [..] > > > > > > > > > > > > > > For the cache trim mode, inactive file LRU size is read and t= he kernel > > > > > > > scales it down based on the reclaim iteration (file >> sc->pr= iority) and > > > > > > > only checks if it is zero or not. Again precise information i= s not > > > > > > > needed. > > > > > > > > > > > > It sounds like it is possible that we enter the cache trim mode= when > > > > > > we shouldn't if the stats are stale. Couldn't this lead to > > > > > > over-reclaiming file memory? > > > > > > > > > > > > > > > > Can you explain how this over-reclaiming file will happen? > > > > > > > > In one reclaim iteration, we could flush the stats, read the inacti= ve > > > > file LRU size, confirm that (file >> sc->priority) > 0 and enter th= e > > > > cache trim mode, reclaiming file memory only. Let's assume that we > > > > reclaimed enough file memory such that the condition (file >> > > > > sc->priority) > 0 does not hold anymore. > > > > > > > > In a subsequent reclaim iteration, the flush could be skipped due t= o > > > > ratelimiting. Now we will enter the cache trim mode again and recla= im > > > > file memory only, even though the actual amount of file memory is l= ow. > > > > This will cause over-reclaiming from file memory and dismissing ano= n > > > > memory that we should have reclaimed, which means that we will need > > > > additional reclaim iterations to actually free memory. > > > > > > > > I believe this scenario would be possible with ratelimiting, right? > > > > > > > > > > So, the (old_file >> sc->priority) > 0 is true but the (new_file >> > > > sc->priority) > is false. In the next iteration, (old_file >> > > > (sc->priority-1)) > 0 will still be true but somehow (new_file >> > > > (sc->priority-1)) > 0 is false. It can happen if in the previous > > > iteration, somehow kernel has reclaimed more than double what it was > > > supposed to reclaim or there are concurrent reclaimers. In addition t= he > > > nr_reclaim is still less than nr_to_reclaim and there is no file > > > deactivation request. > > > > > > Yeah it can happen but a lot of wierd conditions need to happen > > > concurrently for this to happen. > > > > Not necessarily sc->priority-1. Consider two separate sequential > > reclaim attempts. At the same priority, the first reclaim attempt > > could rightfully enter cache trim mode, while the second one > > wrongfully enters cache trim mode due to stale stats, over-reclaim > > file memory, and stall longer to actually reclaim the anon memory. > > > > For two different reclaim attempts even more things need to go wrong. > Anyways we are talking too much in abstract here and focusing on the > corner cases which almost all heuristics have. Unless there is a clear > explanation that the corner case probability will be increased, I don't > think spending time discussing it is useful. > > > I am sure such a scenario is not going to be common, but I am also > > sure if it happens it will be a huge pain to debug. > > > > If others agree that this is fine, let's document this with a comment > > and in the commit log. I am not sure how common the cache trim mode is > > in practice to understand the potential severity of such problems. > > There may also be other consequences that I am not aware of. > > What is your definition of "others" though? I am just interested to hear more opinions. If others (e.g. people in the CC) agree with you that this is the approach we should be taking then I won't stand in the way. If others share my concerns, then maybe we should not proceed. It seemed like at least Jesper had some concerns as well. If no one cares enough to voice their opinions then I suppose it's up to yo= u :)