Date: Fri, 25 Feb 2022 16:58:42 -0800
From: Andrew Morton
To: Shakeel Butt
Cc: Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Ivan Babrou, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Daniel Dao, stable@vger.kernel.org
Subject: Re: [PATCH] memcg: async flush memcg stats from perf sensitive codepaths
Message-Id: <20220225165842.561d3a475310aeab86a2d653@linux-foundation.org>
In-Reply-To: <20220226002412.113819-1-shakeelb@google.com>
References: <20220226002412.113819-1-shakeelb@google.com>

On Fri, 25 Feb 2022 16:24:12 -0800 Shakeel Butt wrote:

> Daniel Dao has reported [1] a regression on workloads that may trigger
> a lot of refaults (anon and file). The underlying issue is that flushing
> rstat is expensive. Although rstat flushes are batched with (nr_cpus *
> MEMCG_BATCH) stat updates, it seems like there are workloads which
> genuinely do more stat updates than the batch value within a short
> amount of time. Since the rstat flush can happen in performance critical
> codepaths like page faults, such workloads can suffer greatly.
>
> The easiest fix for now is for performance critical codepaths to trigger
> the rstat flush asynchronously. This patch converts the refault codepath
> to use async rstat flush. In addition, this patch has preemptively
> converted mem_cgroup_wb_stats and shrink_node to also use the async
> rstat flush as they may also see similar performance regressions.

Gee we do this trick a lot and gee I don't like it :(

a) if we're doing too much work then we're doing too much work.
   Punting that work over to a different CPU or thread doesn't alter
   that - it in fact adds more work.

b) there's an assumption here that the flusher is able to keep up
   with the producer. What happens if that isn't the case? Do we
   simply wind up the deferred items until the system goes oom?

   What happens if there's a producer running on each CPU? Can the
   flushers keep up?

   Pathologically, what happens if the producer is running
   task_is_realtime() on a single-CPU system? Or if there's a
   task_is_realtime() producer running on every CPU? The flusher never
   gets to run and we're dead?

An obvious fix is to limit the permissible amount of windup (to what?)
and at some point, do the flushing synchronously anyway.

Or we just don't do any of this at all and put up with the cost of the
current code. I mean, this "fix" is kind of fake anyway, isn't it?
Pushing the 4-10ms delay onto a different CPU will just disrupt
something else which wanted to run on that CPU. The overall effect is
to hide the impact from one particular testcase, but is the benefit
really a real one?
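
To make the "limit the windup, then flush synchronously" idea concrete,
here is a minimal user-space sketch. It is not kernel code and not taken
from the patch: struct stats, WINDUP_LIMIT, stat_update(), flush_stats()
and async_flusher_run() are all hypothetical names, and the limit of 64
is an arbitrary placeholder, since the right value is exactly the open
question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap on deferred updates; the value 64 is an assumption. */
#define WINDUP_LIMIT 64

struct stats {
	long pending;       /* updates recorded but not yet flushed */
	long total;         /* flushed, globally visible value */
	long sync_flushes;  /* flushes the producer had to perform itself */
	long async_flushes; /* flushes performed by the deferred flusher */
};

/* Fold pending updates into the total; stands in for the costly flush. */
void flush_stats(struct stats *s)
{
	s->total += s->pending;
	s->pending = 0;
}

/*
 * Producer path. Normally the update is only recorded and the flush is
 * left to the async flusher. Once the windup reaches WINDUP_LIMIT the
 * producer flushes synchronously, so the backlog stays bounded even if
 * the flusher never gets to run. Returns true if a sync flush happened.
 */
bool stat_update(struct stats *s, long delta)
{
	assert(delta >= 0);
	s->pending += delta;
	if (s->pending < WINDUP_LIMIT)
		return false;
	flush_stats(s);
	s->sync_flushes++;
	return true;
}

/* Deferred flusher; in this sketch the caller decides when it runs. */
void async_flusher_run(struct stats *s)
{
	if (s->pending == 0)
		return;
	flush_stats(s);
	s->async_flushes++;
}
```

With a starved flusher and 200 unit updates, this sketch does three
synchronous flushes and leaves 8 updates pending, so the backlog never
reaches the limit without being drained. It bounds the windup but does
nothing about the total amount of work, which is point a) above.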