Date: Thu, 6 Jul 2023 06:20:45 +0000
Message-ID: <20230706062045.xwmwns7cm4fxd7iu@google.com>
Subject: Re: Expensive memory.stat + cpu.stat reads
From: Shakeel Butt
To: Ivan Babrou
Cc: cgroups@vger.kernel.org, Linux MM, kernel-team, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, linux-kernel
Content-Type: text/plain; charset="us-ascii"
On Fri, Jun 30, 2023 at 04:22:28PM -0700, Ivan Babrou wrote:
> Hello,
>
> We're seeing CPU load issues with cgroup stats retrieval. I made a
> public gist with all the details, including the repro code (which
> unfortunately requires heavily loaded hardware) and some flamegraphs:
>
> * https://gist.github.com/bobrik/5ba58fb75a48620a1965026ad30a0a13
>
> I'll repeat the gist of that gist here. Our repro has the following
> output after a warm-up run:
>
> completed: 5.17s [manual / mem-stat + cpu-stat]
> completed: 5.59s [manual / cpu-stat + mem-stat]
> completed: 0.52s [manual / mem-stat]
> completed: 0.04s [manual / cpu-stat]
>
> The first two lines do effectively the following:
>
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/memory.stat \
>     /sys/fs/cgroup/system.slice/cpu.stat > /dev/null; done
>
> The latter two do the same thing, but via two separate loops:
>
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/cpu.stat > /dev/null; done
> for _ in $(seq 1 1000); do cat /sys/fs/cgroup/system.slice/memory.stat > /dev/null; done
>
> As you might've noticed from the output, splitting the loop in two
> makes the code run 10x faster. This isn't great, because most
> monitoring software likes to get all stats for one service before
> reading the stats for the next one, which maps to the slow and
> expensive way of doing this.
>
> We're running Linux v6.1 (the output is from v6.1.25) with no patches
> that touch the cgroup or mm subsystems, so you can assume a vanilla
> kernel.
>
> From the flamegraph it just looks like rstat flushing takes longer. I
> used the following flags on an AMD EPYC 7642 system (our usual pick,
> cpu-clock, was blaming spinlock irqrestore, which was questionable):
>
> perf -e cycles -g --call-graph fp -F 999 -- /tmp/repro
>
> Naturally, two questions arise:
>
> * Is this expected (I guess not, but good to be sure)?
> * What can we do to make this better?
>
> I am happy to try out patches or to do some tracing to help understand
> this better.

Hi Ivan,

Thanks a lot, as always, for reporting this. This is not expected and
should be fixed. Is the issue easy to repro, or is a specific workload
or high load/traffic required? Can you repro this with the latest Linus
tree? Also, do you see any difference in the root cgroup's cgroup.stat
between where this issue happens and a good state?

BTW I am away for the next month with very limited connectivity, so
expect a slow response.

thanks,
Shakeel
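
The timing comparison described in the report above can be reproduced
with a small harness along these lines (a sketch only; it assumes
cgroup v2 mounted at /sys/fs/cgroup, an existing system.slice, and the
same 1000 iterations as the original repro):

    # Combined read: both files per iteration (the slow pattern).
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/memory.stat \
            /sys/fs/cgroup/system.slice/cpu.stat > /dev/null
    done

    # Split reads: one file per loop (the fast pattern).
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/cpu.stat > /dev/null
    done
    time for _ in $(seq 1 1000); do
        cat /sys/fs/cgroup/system.slice/memory.stat > /dev/null
    done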
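
The root cgroup.stat comparison Shakeel asks about can be checked with
a one-liner (a sketch, assuming cgroup v2 at /sys/fs/cgroup;
nr_dying_descendants counts cgroups that have been deleted but not yet
freed):

    # Run on a machine showing the slow reads and on one in a good
    # state, then compare the two outputs.
    cat /sys/fs/cgroup/cgroup.stat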