From: Yosry Ahmed
Date: Mon, 28 Mar 2022 02:22:06 -0700
Subject: Re: [RFC bpf-next] Hierarchical Cgroup Stats Collection Using BPF
To: Yonghong Song
Cc: Tejun Heo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Johannes Weiner, Hao Luo, Shakeel Butt, Stanislav Fomichev, David Rientjes, bpf, KP Singh, cgroups@vger.kernel.org, Linux-MM

On Tue, Mar 22, 2022 at 11:09 AM Yonghong Song wrote:
>
>
>
> On 3/16/22 9:35 AM, Yosry Ahmed wrote:
> > Hi Tejun,
> >
> > Thanks for taking the time to read my proposal! Sorry for the late
> > reply. This email skipped my inbox for some reason.
> >
> > On Sun, Mar 13, 2022 at 10:35 PM Tejun Heo wrote:
> >>
> >> Hello,
> >>
> >> On Wed, Mar 09, 2022 at 12:27:15PM -0800, Yosry Ahmed wrote:
> >> ...
> >>> These problems are already addressed by the rstat aggregation
> >>> mechanism in the kernel, which is primarily used for memcg stats. We
> >>
> >> Not that it matters all that much but I don't think the above statement is
> >> true given that sched stats are an integrated part of the rstat
> >> implementation and io was converted before memcg.
> >>
> >
> > Excuse my ignorance, I am new to kernel development. I only saw calls
> > to cgroup_rstat_updated() in memcg and io and assumed they were the
> > only users. Now I found cpu_account_cputime() :)
> >
> >>> - For every cgroup, we will either use flags to distinguish BPF stats
> >>> updates from normal stats updates, or flush both anyway (memcg stats
> >>> are periodically flushed anyway).
> >>
> >> I'd just keep them together. Usually most activities tend to happen
> >> together, so it's cheaper to aggregate all of them in one go in most cases.
> >
> > This makes sense to me, thanks.
> >
> >>
> >>> - Provide flags to enable/disable using per-cpu arrays (for stats that
> >>> are not updated frequently), and enable/disable hierarchical
> >>> aggregation (non-hierarchical stats can still benefit from
> >>> the automatic entry creation & deletion).
> >>> - Provide different hierarchical aggregation operations: SUM, MAX, MIN, etc.
> >>> - Instead of an array as the map value, use a struct, and let the user
> >>> provide an aggregator function in the form of a BPF program.
> >>
> >> I'm more partial to the last option. It does make the usage a bit more
> >> complicated but hopefully it shouldn't be too bad with good examples.
> >>
> >> I don't have strong opinions on the bpf side of things but it'd be great to
> >> be able to use rstat from bpf.
> >
> > It indeed gives more flexibility but is more complicated. Also, I am
> > not sure about the overhead to make calls to BPF programs in every
> > aggregation step. Looking forward to getting feedback on the bpf side of
> > things.
>
> Hi, Yosry, I heard this was discussed in bpf office hour which I
> didn't attend. Could you summarize the conclusion and what is the
> step forward? We also have an internal tool which collects cgroup
> stats and this might help us as well. Thanks!
>
> >
> >>
> >> Thanks.
> >>
> >> --
> >> tejun

Hi Yonghong,

Hao has already done an excellent job summarizing the outcome of the
meeting. The idea I have is basically to introduce "rstat flushing"
BPF programs. BPF programs that collect and display stats would use
helpers to call cgroup_rstat_flush() and cgroup_rstat_updated() (or
similar). rstat would then make calls to the "rstat flushing" BPF
programs during flushes, similar to the existing calls to
css_rstat_flush().

I will work on RFC patch(es) for this soon. Let me know if you have
any comments/suggestions/feedback.

Thanks!
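
P.S. To make the flow above a bit more concrete, here is a rough userspace model in plain C. It is only a sketch under assumptions, not kernel or BPF code: every toy_* name is made up for illustration, the "CPUs" are just array slots, and the flush walks a flat parent-before-child array instead of the real per-cpu updated tree. toy_rstat_updated() and toy_rstat_flush() loosely stand in for cgroup_rstat_updated() and cgroup_rstat_flush(), and toy_sum_flusher() plays the role an "rstat flushing" BPF program might play (here a simple SUM aggregator).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for the toy model */

/* Toy stand-in for a cgroup carrying one hierarchical counter. */
struct toy_cgroup {
    struct toy_cgroup *parent;   /* NULL for the root */
    long percpu_delta[NR_CPUS];  /* unflushed local updates, one slot per CPU */
    long child_pending;          /* deltas pushed up by already-flushed children */
    long total;                  /* hierarchical total as of the last flush */
    int updated[NR_CPUS];        /* nonzero if this subtree has updates on that CPU */
};

/* Stand-in for a user-supplied "rstat flushing" program. */
typedef void (*toy_flush_fn)(struct toy_cgroup *cg, int cpu);

/* Mark cg and its ancestors as updated on this CPU; stop at the first
 * ancestor that is already marked, since everything above it is too. */
void toy_rstat_updated(struct toy_cgroup *cg, int cpu)
{
    for (; cg && !cg->updated[cpu]; cg = cg->parent)
        cg->updated[cpu] = 1;
}

/* Hot path: a stats collector records a delta and notifies "rstat". */
void toy_stat_add(struct toy_cgroup *cg, int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cg->percpu_delta[cpu] += delta;
    toy_rstat_updated(cg, cpu);
}

/* SUM aggregator: fold this CPU's delta and the children's pending
 * deltas into the total, then hand the same delta to the parent. */
void toy_sum_flusher(struct toy_cgroup *cg, int cpu)
{
    long delta = cg->percpu_delta[cpu] + cg->child_pending;

    cg->percpu_delta[cpu] = 0;
    cg->child_pending = 0;
    cg->total += delta;
    if (cg->parent)
        cg->parent->child_pending += delta;
}

/* Flush everything. cgs[] must list parents before their children, so
 * walking it backwards visits children first. Only cgroups marked as
 * updated on a CPU get a flusher call for that CPU. */
void toy_rstat_flush(struct toy_cgroup **cgs, size_t n, toy_flush_fn flusher)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        for (size_t i = n; i-- > 0; ) {
            if (!cgs[i]->updated[cpu])
                continue;
            cgs[i]->updated[cpu] = 0;
            flusher(cgs[i], cpu);
        }
    }
}
```

In this model a reader would call toy_rstat_flush() right before reporting totals, so the hot path only touches one per-CPU slot and the flusher runs only for cgroups that actually saw updates. Swapping toy_sum_flusher() for a different callback is where a MAX or MIN style aggregation would plug in.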