From: Yosry Ahmed
Date: Mon, 11 Sep 2023 15:16:28 -0700
Subject: Re: [RFC PATCH 1/3] cgroup: list all subsystem states in debugfs files
To: "Yakunin, Dmitry (Nebius)"
Cc: "cgroups@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "linux-mm@kvack.org", NB-Core Team, "tj@kernel.org",
    "hannes@cmpxchg.org", "mhocko@kernel.org", Konstantin Khlebnikov,
    Andrey Ryabinin
References: <20230911075437.74027-1-zeil@nebius.com> <20230911075437.74027-2-zeil@nebius.com>
In-Reply-To: <20230911075437.74027-2-zeil@nebius.com>

On Mon, Sep 11, 2023 at 12:55 AM Yakunin, Dmitry (Nebius) wrote:
>
> After removing cgroup subsystem state could leak or live in background
> forever because it is pinned by some reference. For example memory cgroup
> could be pinned by pages in cache or tmpfs.
>
> This patch adds common debugfs interface for listing basic state for each
> controller. Controller could define callback for dumping own attributes.
>
> In file /sys/kernel/debug/cgroup/ each line shows state in
> format: =... [-- =... ]
>
> Common attributes:
>
> css - css pointer
> cgroup - cgroup pointer
> id - css id
> ino - cgroup inode
> flags - css flags
> refcnt - css atomic refcount, for online shows huge bias
> path - cgroup path
>
> This patch adds memcg attributes:
>
> mem_id - 16-bit memory cgroup id
> memory - charged pages
> memsw - charged memory+swap for v1 and swap for v2
> kmem - charged kernel pages
> tcpmem - charged tcp pages
> shmem - shmem/tmpfs pages
>
> Link: https://lore.kernel.org/lkml/153414348591.737150.14229960913953276515.stgit@buzz
> Suggested-by: Konstantin Khlebnikov
> Reviewed-by: Andrey Ryabinin
> Signed-off-by: Dmitry Yakunin

FWIW, I was just recently working on a debugfs directory that exposes a
list of all zombie memcgs as well as the "memory.stat" output for all
of them.

This entails a file at /sys/kernel/debug/zombie_memcgs/all that
contains a list of zombie memcgs (with indentation to reflect the
hierarchy) and an id for each of them.

This id can be used to index per-memcg directories at
/sys/kernel/debug/zombie_memcgs//, which include debug files. The only
one we have so far is /sys/kernel/debug/zombie_memcgs//memory.stat.

If there is interest in this, I can share more information.

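For illustration only, here is a minimal userspace sketch of how a tool
might pull one attribute out of a line in the key=value format that the
quoted patch describes. The helper name css_line_get() is hypothetical
and is not part of the patch. The sketch assumes values contain no
spaces, so a cgroup path with spaces in it would need different handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key=" from a space-separated
 * key=value line into out. Returns 0 on success, -1 if the key is
 * absent or the value does not fit in out.
 *
 * Assumption: values contain no spaces.
 */
static int css_line_get(const char *line, const char *key,
			char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = line;

	assert(outlen > 0);

	while (*p) {
		/* Find the end of the current token. */
		const char *end = strchr(p, ' ');
		size_t tlen = end ? (size_t)(end - p) : strlen(p);

		/* A match is the exact key followed by '='. */
		if (tlen > klen && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			size_t vlen = tlen - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return -1;
}
```

Given a made-up line such as "id=12 ino=345 path=/test -- mem_id=7
shmem=40", calling css_line_get(line, "shmem", buf, sizeof(buf)) would
fill buf with "40". Matching on whole tokens keeps "id" from matching
inside "mem_id", and the " -- " separator is skipped like any other
token without an '='.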
> ---
>  include/linux/cgroup-defs.h |   1 +
>  kernel/cgroup/cgroup.c      | 101 ++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c             |  14 +++++
>  3 files changed, 116 insertions(+)
>
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 8a0d5466c7be..810bd300cbee 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -673,6 +673,7 @@ struct cgroup_subsys {
>  	void (*exit)(struct task_struct *task);
>  	void (*release)(struct task_struct *task);
>  	void (*bind)(struct cgroup_subsys_state *root_css);
> +	void (*css_dump)(struct cgroup_subsys_state *css, struct seq_file *m);
>
>  	bool early_init:1;
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 625d7483951c..fb9931ff7570 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -40,6 +40,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -7068,3 +7069,103 @@ static int __init cgroup_sysfs_init(void)
>  subsys_initcall(cgroup_sysfs_init);
>
>  #endif /* CONFIG_SYSFS */
> +
> +#ifdef CONFIG_DEBUG_FS
> +void *css_debugfs_seqfile_start(struct seq_file *m, loff_t *pos)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css;
> +	int id = *pos;
> +
> +	rcu_read_lock();
> +	css = idr_get_next(&ss->css_idr, &id);
> +	*pos = id;
> +	return css;
> +}
> +
> +void *css_debugfs_seqfile_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css;
> +	int id = *pos + 1;
> +
> +	css = idr_get_next(&ss->css_idr, &id);
> +	*pos = id;
> +	return css;
> +}
> +
> +void css_debugfs_seqfile_stop(struct seq_file *m, void *v)
> +{
> +	rcu_read_unlock();
> +}
> +
> +int css_debugfs_seqfile_show(struct seq_file *m, void *v)
> +{
> +	struct cgroup_subsys *ss = m->private;
> +	struct cgroup_subsys_state *css = v;
> +	/* data is NULL for root cgroup_subsys_state */
> +	struct percpu_ref_data *data = css->refcnt.data;
> +	size_t buflen;
> +	char *buf;
> +	int len;
> +
> +	seq_printf(m, "css=%pK cgroup=%pK id=%d ino=%lu flags=%#x refcnt=%lu path=",
> +		   css, css->cgroup, css->id, cgroup_ino(css->cgroup),
> +		   css->flags, data ? atomic_long_read(&data->count) : 0);
> +
> +	buflen = seq_get_buf(m, &buf);
> +	if (buf) {
> +		len = cgroup_path(css->cgroup, buf, buflen);
> +		seq_commit(m, len < buflen ? len : -1);
> +	}
> +
> +	if (ss->css_dump) {
> +		seq_puts(m, " -- ");
> +		ss->css_dump(css, m);
> +	}
> +
> +	seq_putc(m, '\n');
> +	return 0;
> +}
> +
> +static const struct seq_operations css_debug_seq_ops = {
> +	.start = css_debugfs_seqfile_start,
> +	.next = css_debugfs_seqfile_next,
> +	.stop = css_debugfs_seqfile_stop,
> +	.show = css_debugfs_seqfile_show,
> +};
> +
> +static int css_debugfs_open(struct inode *inode, struct file *file)
> +{
> +	int ret = seq_open(file, &css_debug_seq_ops);
> +	struct seq_file *m = file->private_data;
> +
> +	if (!ret)
> +		m->private = inode->i_private;
> +	return ret;
> +}
> +
> +static const struct file_operations css_debugfs_fops = {
> +	.open = css_debugfs_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = seq_release,
> +};
> +
> +static int __init css_debugfs_init(void)
> +{
> +	struct cgroup_subsys *ss;
> +	struct dentry *dir;
> +	int ssid;
> +
> +	dir = debugfs_create_dir("cgroup", NULL);
> +	if (dir) {
> +		for_each_subsys(ss, ssid)
> +			debugfs_create_file(ss->name, 0644, dir, ss,
> +					    &css_debugfs_fops);
> +	}
> +
> +	return 0;
> +}
> +late_initcall(css_debugfs_init);
> +#endif /* CONFIG_DEBUG_FS */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4b27e245a055..7b3d4a10ac63 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5654,6 +5654,20 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>  	}
>  }
>
> +static void mem_cgroup_css_dump(struct cgroup_subsys_state *css,
> +				struct seq_file *m)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> +	seq_printf(m, "mem_id=%u memory=%lu memsw=%lu kmem=%lu tcpmem=%lu shmem=%lu",
> +		   mem_cgroup_id(memcg),
> +		   page_counter_read(&memcg->memory),
> +		   page_counter_read(&memcg->memsw),
> +		   page_counter_read(&memcg->kmem),
> +		   page_counter_read(&memcg->tcpmem),
> +		   memcg_page_state(memcg, NR_SHMEM));
> +}
> +
>  #ifdef CONFIG_MMU
>  /* Handlers for move charge at task migration. */
>  static int mem_cgroup_do_precharge(unsigned long count)
> --
> 2.25.1
>
>
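
As a rough userspace analogue of the start/next pair in the quoted
patch, the sketch below walks a sparse id table the way the seq_file
callbacks walk ss->css_idr. Everything here is hypothetical (the table,
table_get_next(), iter_start(), iter_next()). It only illustrates why
the start side resumes at the saved position while the next side
searches from position + 1, and it does not model RCU or the seq_file
buffer handling.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Hypothetical sparse table standing in for the css idr; NULL = unused id. */
static const char *table[TABLE_SIZE] = {
	[1] = "root", [2] = "a", [5] = "b",
};

/* Analogue of an idr lookup: first populated entry with id >= *id. */
static const char *table_get_next(int *id)
{
	assert(id != NULL);

	for (int i = *id < 0 ? 0 : *id; i < TABLE_SIZE; i++) {
		if (table[i]) {
			*id = i;	/* report where the entry was found */
			return table[i];
		}
	}
	return NULL;
}

/* Mirrors the start callback: resume at the saved position. */
static const char *iter_start(long *pos)
{
	int id = (int)*pos;
	const char *v = table_get_next(&id);

	*pos = id;
	return v;
}

/* Mirrors the next callback: search strictly after the current position. */
static const char *iter_next(long *pos)
{
	int id = (int)*pos + 1;
	const char *v = table_get_next(&id);

	*pos = id;
	return v;
}

/* Drive start/next over the whole table and count the entries visited. */
static int count_entries(void)
{
	long pos = 0;
	int n = 0;

	for (const char *v = iter_start(&pos); v; v = iter_next(&pos))
		n++;
	return n;
}
```

Because the position is written back after every lookup, a reader that
stops and later restarts from the saved position lands on the same
entry again in the start step, and only the next step moves past it.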