From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22804CFC273 for ; Tue, 15 Oct 2024 08:03:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6A78F6B0085; Tue, 15 Oct 2024 04:03:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 632716B0088; Tue, 15 Oct 2024 04:03:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 399FF6B0089; Tue, 15 Oct 2024 04:03:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 1032A6B0085 for ; Tue, 15 Oct 2024 04:03:35 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id BDB8B140D79 for ; Tue, 15 Oct 2024 08:03:25 +0000 (UTC) X-FDA: 82675096728.17.7921B47 Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) by imf16.hostedemail.com (Postfix) with ESMTP id A46B418000D for ; Tue, 15 Oct 2024 08:03:25 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XAtY4Byz; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf16.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728979307; a=rsa-sha256; cv=none; b=g7vZlE+MwR5HJJDSdnuesA867tVy83z8RtTrT7WMuhTQ9spGUrSQarq6dElw93ZlQk/cBS DhEqjN6pc6hwSygColEO6+OzszhWbuUun3jnezvq8SQMipVRcj6JFvXPdpIyIX459IlRfM O6FCK0rAoSgPWOaXGk5pUv2tYPATHD4= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XAtY4Byz; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf16.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728979307; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=N3sV+MUjmkm/3ouMm/VZqChWo6gbDHYg+6Mp7oGExyw=; b=H4IgJrq4XB5QPE3rNKaxDlk9BnBm2Z1kqKII7R7+SyrNkE9QB1wMt4Z+XWiWqNPdHhV6oP Casiq9LnagvxQxZekcJjHaMqMFIokx/e7+W0rb26YI+iDaDyOEQIZfjLg85FZsq9H2ZCQq yBR90Dzk09LsU7K4iamvpfPGNTWTTBE= Received: by mail-lf1-f49.google.com with SMTP id 2adb3069b0e04-539e7e73740so2156742e87.3 for ; Tue, 15 Oct 2024 01:03:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1728979411; x=1729584211; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=N3sV+MUjmkm/3ouMm/VZqChWo6gbDHYg+6Mp7oGExyw=; b=XAtY4Byzo1CBfMcoTvkoVxF/kdChZ1lToxisDuuRy6kV6UKXaKRBJRO1Mu+aW6bZaI YEcXmbn+aLdSF5cFbq5Jx/TyTMbN0x1iPrxoXENrlC/JDaLxGV45MR9LCVO7bT52xsx0 Chjs3pq/CimwZYiWZmpMICe7OlYKcGwpxbDH8sezBv8NgmAs6CYbxowDETbspBX9ppMt LWOOHMss4nTDLDsPj5Qv6ZbUqF4ixGKrWKD3qe5T8/Fky36nO8bMOz/8BW5+voRjNmz1 JsoXjOVqBNVRHjCSXm9xRkZuNifJieNdF/3AmoLfH3P1/OfXTq+4WUHcZhxz5md5xw38 v6OQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728979411; x=1729584211; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=N3sV+MUjmkm/3ouMm/VZqChWo6gbDHYg+6Mp7oGExyw=; b=StsARSmaL5mpcqk7oxeRM8H0zSyTxMRN2kkx6Px9o6Ahw2DE2YrC1jshzRlvAUA118 ADL3RUN8kXmf5RYOVnSuR7VYm1vXEZdRhboV43RKgNQVK0NUzDHd44kXvaa/ucLX1qfo UWCN4/NV6k4f2DUukrDKoenENkY5xTX7VLYW9vcNAPpkpwPmJtmy9eBifpRnvPwgoULI XMdRunmOwrugGDyt2NBwR86vLUkGLzBjLkFxEkmbAuNJhCZR3ZXlvErL7JuH/cfCkbjC gygJ6+sbcF8Js4XpSLU+26xRNkE0jOl50N7/hvMQFpnkHo27FnWRbQtozoJ82JvTpP1A 8gUA== X-Forwarded-Encrypted: i=1; AJvYcCU3GKK/bFh7tO+P3lnO4Yaf0yO7DOfgF3WXAI9FsU/tsV6ZAjwLRF1vyAvC2B5PaVp+QJdRe1q1Pw==@kvack.org X-Gm-Message-State: AOJu0Yyj7RN8RsWJNKfKgOJcKxgDR8aENLvaKpNB6qRlxYg6JIS9zvkW Vc6BeHuAU4OirgFYiAchPZb2H1Wla8AzHWnF0iJQThn+31y5WTGIVnrqBuOYALZFdPF/GuxG4A9 IQ/T7yrZg+R4Jb85q5394ghhd3H4fWYTnMMb8 X-Google-Smtp-Source: AGHT+IFuAWCk0O3DiGt2jLIMMRZ+47uA58eN9z+L/2D1jR0fxGLSvk7WvgKy8tyjVoe9CKGvpjScJIML7omeA5rPTic= X-Received: by 2002:a05:6512:3f02:b0:536:5625:511a with SMTP id 2adb3069b0e04-539da59516cmr7172856e87.47.1728979410474; Tue, 15 Oct 2024 01:03:30 -0700 (PDT) MIME-Version: 1.0 References: <20241010003550.3695245-1-shakeel.butt@linux.dev> <20241009210848.43adb0c3@gandalf.local.home> In-Reply-To: From: Yosry Ahmed Date: Tue, 15 Oct 2024 01:02:52 -0700 Message-ID: Subject: Re: [PATCH] memcg: add tracing for memcg stat updates To: Daniel Xu Cc: Shakeel Butt , Steven Rostedt , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , JP Kobryn , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team , bpf@vger.kernel.org, Martin KaFai Lau Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: A46B418000D X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: eotfx166r638euh97ob9kuna4wt3q7np X-HE-Tag: 1728979405-989061 X-HE-Meta: U2FsdGVkX1+5IgNg16wyq1q9Z4FQeks2xg+RunLJCXLy8fW/uZ+hHNjv9aTKPrU21y48M2Q9zsqCSJWXVxViWO1/CPL5kPBkw3WOS5FWqYyzhY6n/0eWwOQiau4+31xV1XJESasmBZnJNj5tcQe4yZkZB3MJoHVSX44kUoBpbK1wJbHgfxXUMY/5r8xtK/147JKdM7Rqe+xImpYJfNvhQ2FY2b4w6yPvxjpH0NGTOSrqjUr1Z/PRgV+O0eAqXq0Y2cC7DJhEC5/zNuT2oKYEutdTrQtaPKBaFYI42Ga55NivXCNl2eS0YIBfZ5I2J/4pYRKGAZkiAwcnMANoYruaZWYvd86HhJXT9z+eJ6kvbGNDI95pK3AAZHyUFiAA6ElV89O52bbOM7v+ZeNiScPPSlohq+FHcUcwDGr/7y9tVhIT4SEiopGuMaFWidHycmGwfj3PAuWQdcAA9eGilDvu0XXMQjwgjK43mMHWdEZqMqHe8P+TQyw6mENbJrO/TRqfLXCqaVA20xHY/TguevchZU524MHYy0lyqrzpFGDd4lkTT9X30ef9fEs8V7To0O0kbJROIYb/VuaEUc9CFmzu8OxA+/hOkzUFnFF4APD0gN8cfEPDemkldQF/85PdBVNgwjs+vxAQRZ3AhlVDFPIhpTvY8YuJXieAEBuGlldIjWq0pzlB6n1uIwpf4M9nJoM3FBxpPI4ZV/vZF4eI9D48lnGndhqFWjPaYhRjX1B1KvroUa73I743yJKB4XOuyUUJRyP+e8nUUkZbl1MwYBbsb+iGy+iXshuV3qoNSxJq+ubpsHRAs31h8Sugf003CCRhwcurEkIb68SG39cjf1WDk2nLM6+joS53PpYE17NLfWEf1exZpcWBQuVHDXPnMMZKgRsvfWbgwxTAVLFl2QpK+V/gRzWfgU++0pMzDfCxM4C4ZfP88CGr+NzZqnEnj3rz/nSh9+ErqsVrVC3T9No x/BjPKUz da/URi095meC8BmNgOzS+8N0lKQkjaVVN8fMnwRsrcI+P8/3h1fStqt9k6/mC2I5C0hdXgWgyfCCExXbtk+DNvazovxz+uzAlYEkWShbdr2+mkzIAC9/D2EQmThoA9H6Wry8K4UZcbZrfsPRoFd5z9hzedfGUAwRqMqdmYGUofxftWfbb6RGxgiVSDqRc5sGMEMy7FXIzDi3OPjUh4RpJHmXkxUa9s6gh01902cNrMzcelDnXhqVQNNTA0qxvJm5L2WKC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: > > > > > > > @@ -682,7 +686,9 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, > > > > > > > return; > > > > > > > > > > > > > > __this_cpu_add(memcg->vmstats_percpu->state[i], val); > > > > > > > - memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val)); > > > > > > > + val = memcg_state_val_in_pages(idx, val); > > > > > > > + memcg_rstat_updated(memcg, val); > > > > > > > + trace_mod_memcg_state(memcg, idx, val); > > > > > > > > > > > > Is it too unreasonable to include the stat name? > > > > > > > > > > > > The index has to be correlated with the kernel config and perhaps even > > > > > > version. It's not a big deal, but if performance is not a concern when > > > > > > tracing is enabled anyway, maybe we can lookup the name here (or in > > > > > > TP_fast_assign()). > > > > > > > > > > What name? Is it looked up from idx? If so, you can do it on the reading of > > > > > > Does reading side mean the one reading /sys/kernel/tracing/trace will do > > > the translation from enums to string? > > > > > > > > the trace event where performance is not an issue. See the __print_symbolic() > > > > > and friends in samples/trace_events/trace-events-sample.h > > > > > > > > Yeah they can be found using idx. Thanks for referring us to > > > > __print_symbolic(), I suppose for this to work we need to construct an > > > > array of {idx, name}. I think we can replace the existing memory_stats > > > > and memcg1_stats/memcg1_stat_names arrays with something that we can > > > > reuse for tracing, so we wouldn't need to consume extra space. > > > > > > > > Shakeel, what do you think? > > > > > > Cc Daniel & Martin > > > > > > I was planning to use bpftrace which can use dwarf/btf to convert the > > > raw int to its enum string. Martin provided the following command to > > > extract the translation from the kernel. > > > > > > $ bpftool btf dump file /sys/kernel/btf/vmlinux | grep -A10 node_stat_item > > > [2264] ENUM 'node_stat_item' encoding=UNSIGNED size=4 vlen=46 > > > 'NR_LRU_BASE' val=0 > > > 'NR_INACTIVE_ANON' val=0 > > > 'NR_ACTIVE_ANON' val=1 > > > 'NR_INACTIVE_FILE' val=2 > > > 'NR_ACTIVE_FILE' val=3 > > > 'NR_UNEVICTABLE' val=4 > > > 'NR_SLAB_RECLAIMABLE_B' val=5 > > > 'NR_SLAB_UNRECLAIMABLE_B' val=6 > > > 'NR_ISOLATED_ANON' val=7 > > > 'NR_ISOLATED_FILE' val=8 > > > ... > > > > > > My point is userspace tools can use existing infra to extract this > > > information. > > > > > > However I am not against adding __print_symbolic() (but without any > > > duplication), so users reading /sys/kernel/tracing/trace directly can > > > see more useful information as well. Please post a follow up patch after > > > this one. > > > > I briefly looked into this and I think it would be annoying to have > > this, unfortunately. Even if we rework the existing arrays with memcg > > stat names to be in a format conforming to tracing, we would need to > > move them out to a separate header to avoid a circular dependency. > > > > Additionally, for __count_memcg_events() things will be more > > complicated because the names are not in an array in memcontrol.c, but > > we use vm_event_name() and the relevant names are part of a larger > > array, vmstat_text, which we would need to rework similarly. > > > > I think this would be easier to implement if we can somehow provide a > > callback that returns the name based on the index, rather than an > > array. But even then, we would need to specify a different callback > > for each event, so it won't be as simple as specifying the callback in > > the event class. > > > > All in all, unless we realize there is something that is fundamentally > > more difficult to do in userspace, I think it's not worth adding this > > unfortunately :/ > > Turned out to be quite straightforward to do in userspace: > https://github.com/bpftrace/bpftrace/pull/3515 . > > A nice property is the resolution occurs out of line and saves the > kernel some cycles in the fast path. This is really neat, thanks for doing this! Native support in bpftrace is definitely better than manually correlating the values, especially with large enums with 10s of values like the ones we are talking about here.