From: Yosry Ahmed
Date: Tue, 13 Aug 2024 14:58:51 -0700
Subject: Re: [PATCH v2] memcg: use ratelimited stats flush in the reclaim
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Jesper Dangaard Brouer, Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Meta kernel team, cgroups@vger.kernel.org
References: <20240813215358.2259750-1-shakeel.butt@linux.dev>
In-Reply-To: <20240813215358.2259750-1-shakeel.butt@linux.dev>

On Tue, Aug 13, 2024 at 2:54 PM Shakeel Butt wrote:
>
> The Meta prod is seeing a large number of stalls in memcg stats flush
> from the memcg reclaim code path. At the moment, this specific callsite
> is doing a synchronous memcg stats flush. The rstat flush is an
> expensive and time-consuming operation, so concurrent reclaimers will
> busywait on the lock potentially for a long time. Actually this issue is
> not unique to Meta and has been observed by Cloudflare [1] as well.
> For the Cloudflare case, the stalls were due to contention between kswapd
> threads running on their 8 NUMA node machines, which does not make sense
> as rstat flush is global and a flush from one kswapd thread should be
> sufficient for all. Simply replace the synchronous flush with the
> ratelimited one.
>
> One may raise a concern about potentially using 2 sec stale (at worst)
> stats for heuristics like the desirable inactive:active ratio and preferring
> inactive file pages over anon pages, but these specific heuristics do not
> require very precise stats and are also ignored under severe memory
> pressure.
>
> More specifically for this code path, the stats are needed for two
> specific heuristics:
>
> 1. Deactivate LRUs
> 2. Cache trim mode
>
> The deactivate LRUs heuristic is to maintain a desirable inactive:active
> ratio of the LRUs. The specific stats needed are WORKINGSET_ACTIVATE*
> and the hierarchical LRU size. The WORKINGSET_ACTIVATE* is needed to
> check if there is a refault since the last snapshot and the LRU size is
> needed for the desirable ratio between inactive and active LRUs. See the
> table below on how the desirable ratio is calculated.
>
> /* total     target    max
>  * memory    ratio     inactive
>  * -------------------------------------
>  *   10MB       1         5MB
>  *  100MB       1        50MB
>  *    1GB       3       250MB
>  *   10GB      10       0.9GB
>  *  100GB      31         3GB
>  *    1TB     101        10GB
>  *   10TB     320        32GB
>  */
>
> The desirable ratio only changes at the boundary of 1 GiB, 10 GiB,
> 100 GiB, 1 TiB and 10 TiB. There is no need for the precise and accurate
> LRU size information to calculate this ratio. In addition, if
> deactivation is skipped for some LRU, the kernel will force deactivation
> in a severe memory pressure situation.
>
> For the cache trim mode, inactive file LRU size is read and the kernel
> scales it down based on the reclaim iteration (file >> sc->priority) and
> only checks if it is zero or not. Again precise information is not
> needed.
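
To make the two heuristics quoted above concrete, here is a rough userspace sketch, not the kernel code. It assumes the target ratio is the integer square root of ten times the LRU size in GiB (which reproduces the table), and that the cache trim check is just the shifted inactive file count being non-zero. The helper names isqrt_ul(), inactive_ratio() and cache_trim_candidate() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer square root; a simple stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_ul(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Target inactive:active ratio for an LRU pair whose combined size is
 * total_gb GiB. Anything below 1 GiB gets a ratio of 1.
 * (Hypothetical helper; assumes ratio = int_sqrt(10 * gb).)
 */
static unsigned long inactive_ratio(unsigned long total_gb)
{
	return total_gb ? isqrt_ul(10 * total_gb) : 1;
}

/*
 * Cache trim check: scale the inactive file page count down by the
 * reclaim priority and only ask whether anything is left.
 * (Hypothetical helper modelling "file >> sc->priority".)
 */
static bool cache_trim_candidate(unsigned long inactive_file_pages, int priority)
{
	assert(priority >= 0 && priority < 32);
	return (inactive_file_pages >> priority) != 0;
}
```

With these assumptions, inactive_ratio(100) gives 31 and inactive_ratio(1024) gives 101, matching the table rows for 100GB and 1TB. A count that is a few percent stale, say 1,000,000 versus 1,050,000 pages at priority 12, gives the same cache trim answer, which is the point being made about not needing precise stats.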
>
> This patch has been running on Meta fleet for several months and we have
> not observed any issues. Please note that MGLRU is not impacted by this
> issue at all as it avoids rstat flushing completely.
>
> Link: https://lore.kernel.org/all/6ee2518b-81dd-4082-bdf5-322883895ffc@kernel.org [1]
> Signed-off-by: Shakeel Butt

Just curious, does Jesper's patch help with this problem?

> ---
> Changes since v1:
> - Updated the commit message.
>
>  mm/vmscan.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 008b62abf104..82318464cd5e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2282,10 +2282,11 @@ static void prepare_scan_control(pg_data_t *pgdat, struct scan_control *sc)
>  	target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
>
>  	/*
> -	 * Flush the memory cgroup stats, so that we read accurate per-memcg
> -	 * lruvec stats for heuristics.
> +	 * Flush the memory cgroup stats in rate-limited way as we don't need
> +	 * most accurate stats here. We may switch to regular stats flushing
> +	 * in the future once it is cheap enough.
>  	 */
> -	mem_cgroup_flush_stats(sc->target_mem_cgroup);
> +	mem_cgroup_flush_stats_ratelimited(sc->target_mem_cgroup);
>
>  	/*
>  	 * Determine the scan balance between anon and file LRUs.
> --
> 2.43.5
>
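
For anyone skimming the thread, the behavioural difference between the two calls in the diff can be modelled roughly as below. This is a simplified userspace sketch and not how the memcg code is written: struct flush_state, FLUSH_PERIOD_MS and the flush_*() helpers are hypothetical, and the 2000 ms threshold is only an assumption taken from the "2 sec stale (at worst)" figure in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed staleness bound, from the "2 sec stale" figure; not a kernel constant. */
#define FLUSH_PERIOD_MS 2000

/* Hypothetical bookkeeping for the model. */
struct flush_state {
	uint64_t last_flush_ms;	/* time of the most recent flush */
	unsigned long flushes;	/* number of expensive flushes performed */
};

/* Stand-in for the expensive, lock-contended rstat flush. */
static void do_flush(struct flush_state *st, uint64_t now_ms)
{
	st->last_flush_ms = now_ms;
	st->flushes++;
}

/* Synchronous variant: every caller pays for a flush. */
static void flush_sync(struct flush_state *st, uint64_t now_ms)
{
	do_flush(st, now_ms);
}

/*
 * Ratelimited variant: flush only if the stats are older than the
 * allowed staleness; otherwise reuse what is already there.
 * Returns true if a flush was actually performed.
 */
static bool flush_ratelimited(struct flush_state *st, uint64_t now_ms)
{
	assert(now_ms >= st->last_flush_ms);
	if (now_ms - st->last_flush_ms <= FLUSH_PERIOD_MS)
		return false;
	do_flush(st, now_ms);
	return true;
}
```

In this model, many reclaimers calling flush_ratelimited() within the same window trigger at most one flush, while each flush_sync() call pays the full cost, which is the contention described in the commit message.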