From: Shakeel Butt
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Muchun Song, Jesper Dangaard Brouer, Yu Zhao, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Meta kernel team, cgroups@vger.kernel.org
Subject: Re: [PATCH v2] memcg: use ratelimited stats flush in the reclaim
Date: Tue, 13 Aug 2024 15:30:40 -0700
References: <20240813215358.2259750-1-shakeel.butt@linux.dev>

On Tue, Aug 13, 2024 at 02:58:51PM GMT, Yosry Ahmed wrote:
> On Tue, Aug 13, 2024 at 2:54 PM Shakeel Butt wrote:
> >
> > The Meta prod is seeing a large number of stalls in memcg stats flush
> > from the memcg reclaim code path. At the moment, this specific callsite
> > is doing a synchronous memcg stats flush. The rstat flush is an
> > expensive and time-consuming operation, so concurrent reclaimers will
> > busywait on the lock potentially for a long time. Actually this issue is
> > not unique to Meta and has been observed by Cloudflare [1] as well. For
> > the Cloudflare case, the stalls were due to contention between kswapd
> > threads running on their 8 NUMA node machines, which does not make sense
> > as rstat flush is global and a flush from one kswapd thread should be
> > sufficient for all. Simply replace the synchronous flush with the
> > ratelimited one.
> >
> > One may raise a concern on potentially using 2 sec stale (at worst)
> > stats for heuristics like desirable inactive:active ratio and preferring
> > inactive file pages over anon pages but these specific heuristics do not
> > require very precise stats and also are ignored under severe memory
> > pressure.
> >
> > More specifically for this code path, the stats are needed for two
> > specific heuristics:
> >
> > 1. Deactivate LRUs
> > 2. Cache trim mode
> >
> > The deactivate LRUs heuristic is to maintain a desirable inactive:active
> > ratio of the LRUs. The specific stats needed are WORKINGSET_ACTIVATE*
> > and the hierarchical LRU size. The WORKINGSET_ACTIVATE* is needed to
> > check if there is a refault since the last snapshot and the LRU sizes are
> > needed for the desirable ratio between inactive and active LRUs. See the
> > table below on how the desirable ratio is calculated.
> >
> > /* total     target    max
> >  * memory    ratio     inactive
> >  * -------------------------------------
> >  *   10MB       1         5MB
> >  *  100MB       1        50MB
> >  *    1GB       3       250MB
> >  *   10GB      10       0.9GB
> >  *  100GB      31         3GB
> >  *    1TB     101        10GB
> >  *   10TB     320        32GB
> >  */
> >
> > The desirable ratio only changes at the boundaries of 1 GiB, 10 GiB,
> > 100 GiB, 1 TiB and 10 TiB. There is no need for the precise and accurate
> > LRU size information to calculate this ratio. In addition, if
> > deactivation is skipped for some LRU, the kernel will force deactivation
> > under severe memory pressure.
> >
> > For the cache trim mode, inactive file LRU size is read and the kernel
> > scales it down based on the reclaim iteration (file >> sc->priority) and
> > only checks if it is zero or not. Again precise information is not
> > needed.
> >
> > This patch has been running on Meta fleet for several months and we have
> > not observed any issues. Please note that MGLRU is not impacted by this
> > issue at all as it avoids rstat flushing completely.
> >
> > Link: https://lore.kernel.org/all/6ee2518b-81dd-4082-bdf5-322883895ffc@kernel.org [1]
> > Signed-off-by: Shakeel Butt
>
> Just curious, does Jesper's patch help with this problem?

If you are asking whether I have tested Jesper's patch in Meta's
production, then no, I have not tested it. I have also not taken a look
at the latest version from Jesper, as I was stuck on some other issues.