Date: Thu, 6 Jul 2023 06:20:45 +0000
Message-ID: <20230706062045.xwmwns7cm4fxd7iu@google.com>
Subject: Re: Expensive memory.stat + cpu.stat reads
From: Shakeel Butt
To: Ivan Babrou
Cc: cgroups@vger.kernel.org, Linux MM, kernel-team, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, linux-kernel
Content-Type: text/plain; charset="us-ascii"
On Fri, Jun 30, 2023 at 04:22:28PM -0700, Ivan Babrou wrote:
> Hello,
>
> We're seeing CPU load issues with cgroup stats retrieval. I made a
> public gist with all the details, including the repro code (which
> unfortunately requires heavily loaded hardware) and some flamegraphs:
>
> * https://gist.github.com/bobrik/5ba58fb75a48620a1965026ad30a0a13
>
> I'll repeat the gist of that gist here. Our repro has the following
> output after a warm-up run:
>
> completed: 5.17s [manual / mem-stat + cpu-stat]
> completed: 5.59s [manual / cpu-stat + mem-stat]
> completed: 0.52s [manual / mem-stat]
> completed: 0.04s [manual / cpu-stat]
>
> The first two lines do effectively the following:
>
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/memory.stat \
>     /sys/fs/cgroup/system.slice/cpu.stat > /dev/null; done
>
> The latter two do the same thing, but via two separate loops:
>
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/cpu.stat > /dev/null; done
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/memory.stat > /dev/null; done
>
> As you might've noticed from the output, splitting the loop in two
> makes the code run 10x faster. This isn't great, because most
> monitoring software likes to get all stats for one service before
> reading the stats for the next one, which maps to the slow and
> expensive way of doing this.
>
> We're running Linux v6.1 (the output is from v6.1.25) with no patches
> that touch the cgroup or mm subsystems, so you can assume a vanilla
> kernel.
>
> From the flamegraph it just looks like rstat flushing takes longer. I
> used the following flags on an AMD EPYC 7642 system (our usual pick,
> cpu-clock, was blaming spinlock irqrestore, which was questionable):
>
> perf -e cycles -g --call-graph fp -F 999 -- /tmp/repro
>
> Naturally, two questions arise:
>
> * Is this expected (I guess not, but good to be sure)?
> * What can we do to make this better?
>
> I am happy to try out patches or to do some tracing to help understand
> this better.

Hi Ivan,

Thanks a lot, as always, for reporting this. This is not expected and
should be fixed. Is the issue easy to repro, or is a specific workload
or high load/traffic required? Can you repro this with the latest Linus
tree? Also, do you see any difference in the root cgroup's cgroup.stat
between where this issue happens and a good state?

BTW I am away for the next month with very limited connectivity, so
expect a slow response.

thanks,
Shakeel
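
The timing comparison described in the report above can be reproduced
with a small harness along these lines (a sketch only; it assumes
cgroup v2 mounted at /sys/fs/cgroup, an existing system.slice, and the
same 1000 iterations as the original repro):

    # Combined read: both files per iteration (the slow pattern).
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/memory.stat \
            /sys/fs/cgroup/system.slice/cpu.stat > /dev/null
    done

    # Split reads: one file per loop (the fast pattern).
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/cpu.stat > /dev/null
    done
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/memory.stat > /dev/null
    done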
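
The root cgroup.stat comparison Shakeel asks about can be checked with
a one-liner (a sketch, assuming cgroup v2 at /sys/fs/cgroup;
nr_dying_descendants counts cgroups that have been deleted but not yet
freed):

    # Run on a machine showing the slow reads and on one in a good
    # state, then compare the two outputs.
    cat /sys/fs/cgroup/cgroup.stat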