From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D637FEB64DD for ; Wed, 9 Aug 2023 08:51:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5C9AF6B0071; Wed, 9 Aug 2023 04:51:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 579FF6B0075; Wed, 9 Aug 2023 04:51:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 440A78E0001; Wed, 9 Aug 2023 04:51:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 30C486B0071 for ; Wed, 9 Aug 2023 04:51:18 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id D87E1C08C6 for ; Wed, 9 Aug 2023 08:51:17 +0000 (UTC) X-FDA: 81103946994.10.08DA3F4 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf29.hostedemail.com (Postfix) with ESMTP id B90DB120014 for ; Wed, 9 Aug 2023 08:51:15 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=qnZnaGnS; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf29.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691571076; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FNFE+i+WwtgcHn5Rj+LGuGTsx7FtrFv3sviTgeepiYo=; b=pW9xGECRyYGd1EnZHEYbrY3FSQE6qOy3Xjym9i6s07ETyQQJYcnr30roUDDUJJj0/adBdb +AEEYAuZ3FIasgKWh6pT1+b4qpB1Kk8d163GAgWY57WE0jw/dXEOI9FkEzomjeqjcUDL2b ikPNlsO5zAychpWxDl+bDxu/x38mP0Y= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=qnZnaGnS; dmarc=pass (policy=quarantine) header.from=suse.com; spf=pass (imf29.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691571076; a=rsa-sha256; cv=none; b=yJSqCR7ToWuOQrq0lPJD8hh9YlujJmiCw9qlU4SnTTLy4zosaPGKe1AScNLACnBfeCucfV r/WXmA5jC44LxSQil24NR+fXR20sIiLcijbgf2MiKo4oOZlHoh6NlwAc/eBH6L20/nTQtQ FJMxNZhKRjlfuFXNvI9Y7mISjCCZ+O8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id EAD911F45A; Wed, 9 Aug 2023 08:51:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1691571073; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FNFE+i+WwtgcHn5Rj+LGuGTsx7FtrFv3sviTgeepiYo=; b=qnZnaGnS4J4MTbOYIfnDmj0JDfOtjkYuNESsya/JKY/fC0CJea40S3ZNfi5BL2ROwQ99Yo A+Kqlj/kNqEaQ99nNFSp2J0jViYuoMPjBWN7zmPu6XUws9D0aQ9un1VhrKrqFlaPndu73+ +ZNVOAqk0JnQiUgLgxo9m+62w5cHj/8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id CC99513251; Wed, 9 Aug 2023 08:51:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id M2P6LoFT02RCfQAAMHmgww (envelope-from ); Wed, 09 Aug 2023 08:51:13 +0000 Date: Wed, 9 Aug 2023 10:51:13 +0200 From: Michal Hocko To: Yosry Ahmed Cc: Johannes Weiner , Roman Gushchin , Shakeel Butt , Andrew Morton , Muchun Song , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads Message-ID: References: <20230809045810.1659356-1-yosryahmed@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230809045810.1659356-1-yosryahmed@google.com> X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: B90DB120014 X-Stat-Signature: qc43z36gzctwat8nzo7beg5aakaabzpj X-Rspam-User: X-HE-Tag: 1691571075-359538 X-HE-Meta: U2FsdGVkX19wIOxftL9ZJt2nXF8hcm6HOCIvrBt6n6i+fZKQrZEvgJzTCtQUZNnwt84uJNQ8l+39PZ8l+EKD5qyQde2Kz+aULMIhpXbzSlafJpcD4wb8WJLkjI5usdJORqBxo03DGJtzvcEqUtTMhwMf1c6zAdpmrj+JpXyFX6Ep5tZXCAiAc1/9BYhe0N7W5uvlre5CRqdq0dcEYyXr6fVWAq8CN0a5ckwVislr/f6mB0+HQ9eXbD9rcwnChQbkQCZoW+h3jApia9vAqkvBH5gP5qTnFwBk7/nuOJWH/JFDxx7YXMGOrDvR9uWLpk0vASgQZAw8hFsEV9y0SOYhz+D3k2iQIYiE7O5mgDRZPbpmm10zchCuOjqWKcyTlleZINnrd781IFmgOKA22+IJKTb4SHhLzB0QytA1hgSzscATF4s2EkuE5qIcxeTdJRz1OQVpuHjKWx2ZU55CCdM/MeRykmFo6Ag14Pix+lefKyaXgf5P7Cpti5OW+lsDxknOUtOFySfOvqhl2vlbgt1K87e3mLALZvyErZKSwy8KIwB5wXKoeBqj14S0JOMif/mtayoxQspMrjwVzXx+ATyg4fM2lfLqkfCEXP4staYif9ErdVY189a6uAO12tNacVTT/KKcG5nC1XvKOqIAElcn1p59D2wYwFIskqsNoxVCr7Yzac8z/B0E9xP8k2SSJj3jw1kEoc0Lxu9JgdB+l3u6OwIm0auftngwgDDssUNbljKaG2aFbWmEczKfenM6mw5BDD2R9B1WWG5vAFCbcyc759wB3SgWK9rgeOK0+gfDESc9HHTufE0UjCLF/5j5k3HSBgarEJw/cIN8edRqGH3oA3Bbzj1ZLHhQSwraCjZG4Oj23gattBIRcifOaZNAJ/nCFzPo5+s+j8j6JUxlICTzOUu/gDipXCCg62JsLcXwEUp6yHXJbKs2tcgJNa9joVp2An55ZHUs84z8MtoZUWr cTF+V2R0 L7iAN6rpEJ9Z9s4mPstpTL5Djmvcyrdwp2q9pmJoH+puEWZ+AzJexq7nX1Hm1b4W+gHLp/eNYE5IXwW/fvXbtBK3wisyuHuc9IHR9xMWd8A+hydciwrTln+Zpqlr/7mcgntEHT5UHC+AQMUDKyoqSVu0iWA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed 09-08-23 04:58:10, Yosry Ahmed wrote: > Over time, the memcg code added multiple optimizations to the stats > flushing path that introduce a tradeoff between accuracy and > performance. In some contexts (e.g. dirty throttling, refaults, etc), a > full rstat flush of the stats in the tree can be too expensive. Such > optimizations include [1]: > (a) Introducing a periodic background flusher to keep the size of the > update tree from growing unbounded. > (b) Allowing only one thread to flush at a time, and other concurrent > flushers just skip the flush. This avoids a thundering herd problem > when multiple reclaim/refault threads attempt to flush the stats at > once. > (c) Only executing a flush if the magnitude of the stats updates exceeds > a certain threshold. > > These optimizations were necessary to make flushing feasible in > performance-critical paths, and they come at the cost of some accuracy > that we choose to live without. On the other hand, for flushes invoked > when userspace is reading the stats, the tradeoff is less appealing > This code path is not performance-critical, and the inaccuracies can > affect userspace behavior. For example, skipping flushing when there is > another ongoing flush is essentially a coin flip. We don't know if the > ongoing flush is done with the subtree of interest or not. I am not convinced by this much TBH. What kind of precision do you really need and how much off is what we provide? More expensive read of stats from userspace is quite easy to notice and usually reported as a regression. So you should have a convincing argument that an extra time spent is really worth it. AFAIK there are many monitoring (top like) tools which simply read those files regularly just to show numbers and they certainly do not need a high level of precision. [...] > @@ -639,17 +639,24 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) > } > } > > -static void do_flush_stats(void) > +static void do_flush_stats(bool full) > { > + if (!atomic_read(&stats_flush_ongoing) && > + !atomic_xchg(&stats_flush_ongoing, 1)) > + goto flush; > + > /* > - * We always flush the entire tree, so concurrent flushers can just > - * skip. This avoids a thundering herd problem on the rstat global lock > - * from memcg flushers (e.g. reclaim, refault, etc). > + * We always flush the entire tree, so concurrent flushers can choose to > + * skip if accuracy is not critical. Otherwise, wait for the ongoing > + * flush to complete. This avoids a thundering herd problem on the rstat > + * global lock from memcg flushers (e.g. reclaim, refault, etc). > */ > - if (atomic_read(&stats_flush_ongoing) || > - atomic_xchg(&stats_flush_ongoing, 1)) > - return; > - > + while (full && atomic_read(&stats_flush_ongoing) == 1) { > + if (!cond_resched()) > + cpu_relax(); You are reinveting a mutex with spinning waiter. Why don't you simply make stats_flush_ongoing a real mutex and make use try_lock for !full flush and normal lock otherwise? > + } > + return; > +flush: > WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME); > > cgroup_rstat_flush(root_mem_cgroup->css.cgroup); [...] -- Michal Hocko SUSE Labs