Date: Mon, 17 Oct 2022 20:52:38 +0200
From: Michal Koutný
To: Yosry Ahmed
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin, Andrew Morton, Linux-MM, Cgroups, Greg Thelen
Subject: Re: [RFC] memcg rstat flushing optimization
Message-ID: <20221017185238.GA7699@blackbody.suse.cz>

Hello.

On Tue, Oct 04, 2022 at 06:17:40PM -0700, Yosry Ahmed wrote:
> Sorry for the long email :)

(I'll get to the other parts sometime in the future. Sorry for my latency :)

> We have recently ran into a hard lockup on a machine with hundreds of
> CPUs and thousands of memcgs during an rstat flush.
> [...]

I'm only responding with some remarks on this particular case.

> As you can imagine, with a sufficiently large number of
> memcgs and cpus, a call to mem_cgroup_flush_stats() might be slow, or
> in an extreme case like the one we ran into, cause a hard lockup
> (despite periodically flushing every 4 seconds).

Is this your modification of the upstream FLUSH_TIME value (which is every 2 s)?
In the mail thread, you also mention >10 s for hard lockups.
That sounds scary (even with flushing once every 4 seconds), since with a
large enough update tree (and enough update activity) the periodic flush
may not keep up. It also looks like a bad feedback loop: the longer a
(periodic) flush takes, the less frequently flushes happen and the more
updates can accumulate in between. I.e. one spike in update activity can
push the system into a spiral of long flushes that won't recover unless
the activity drops considerably.

(The 2nd point should have been about some memcg_check_events()
optimization or THRESHOLDS_EVENTS_TARGET justifying a delayed flush, but
I've found none of them to be applicable. Just noting that v2 fortunately
doesn't have the threshold notifications.)

Regards,
Michal