From: Shakeel Butt
Date: Tue, 15 Sep 2020 06:53:47 -0700
Subject: Re: [PATCH v4] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
To: Muchun Song
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet, Michal Hocko,
    Vladimir Davydov, Andrew Morton, Roman Gushchin, Randy Dunlap, Cgroups,
    linux-doc@vger.kernel.org, LKML, Linux MM
References: <20200915055825.5279-1-songmuchun@bytedance.com>
In-Reply-To: <20200915055825.5279-1-songmuchun@bytedance.com>

On Mon, Sep 14, 2020 at 10:59 PM Muchun Song wrote:
>
> In the cgroup v1, we have a numa_stat interface. This is useful for
> providing visibility into the numa locality information within an
> memcg since the pages are allowed to be allocated from any physical
> node. One of the use cases is evaluating application performance by
> combining this information with the application's CPU allocation.
> But the cgroup v2 does not. So this patch adds the missing information.
>
> Signed-off-by: Muchun Song
> Suggested-by: Shakeel Butt

Small nits below.

Reviewed-by: Shakeel Butt

> ---
>  changelog in v4:
>  1. Fix some document problems pointed out by Randy Dunlap.
>  2. Remove memory_numa_stat_format() suggested by Shakeel Butt.
>
>  changelog in v3:
>  1. Fix compiler error on powerpc architecture reported by kernel test robot.
>  2. Fix a typo from "anno" to "anon".
>
>  changelog in v2:
>  1. Add memory.numa_stat interface in cgroup v2.
>
>  Documentation/admin-guide/cgroup-v2.rst | 72 +++++++++++++++++++++
>  mm/memcontrol.c                         | 86 +++++++++++++++++++++++++
>  2 files changed, 158 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 6be43781ec7f..bcb7b202e88d 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
>  		collapsing an existing range of pages. This counter is not
>  		present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
>
> +  memory.numa_stat
> +	A read-only flat-keyed file which exists on non-root cgroups.
> +
> +	This breaks down the cgroup's memory footprint into different
> +	types of memory, type-specific details, and other information
> +	per node on the state of the memory management system.
> +
> +	This is useful for providing visibility into the NUMA locality
> +	information within an memcg since the pages are allowed to be
> +	allocated from any physical node. One of the use cases is evaluating

use case

> +	application performance by combining this information with the
> +	application's CPU allocation.
> +
> +	All memory amounts are in bytes.
> +
> +	The output format of memory.numa_stat is::
> +
> +	  type N0= N1= ...

I would remove 'pages' here as it can be confusing. Just ...

> +
> +	The entries are ordered to be human readable, and new entries
> +	can show up in the middle. Don't rely on items remaining in a
> +	fixed position; use the keys to look up specific values!
> +
> +	  anon
> +		Amount of memory per node used in anonymous mappings such
> +		as brk(), sbrk(), and mmap(MAP_ANONYMOUS).
> +
> +	  file
> +		Amount of memory per node used to cache filesystem data,
> +		including tmpfs and shared memory.
> +
> +	  kernel_stack
> +		Amount of memory per node allocated to kernel stacks.
> +
> +	  shmem
> +		Amount of cached filesystem data per node that is swap-backed,
> +		such as tmpfs, shm segments, shared anonymous mmap()s.
> +
> +	  file_mapped
> +		Amount of cached filesystem data per node mapped with mmap().
> +
> +	  file_dirty
> +		Amount of cached filesystem data per node that was modified but
> +		not yet written back to disk.
> +
> +	  file_writeback
> +		Amount of cached filesystem data per node that was modified and
> +		is currently being written back to disk.
> +
> +	  anon_thp
> +		Amount of memory per node used in anonymous mappings backed by
> +		transparent hugepages.
> +
> +	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> +		Amount of memory, swap-backed and filesystem-backed,
> +		per node on the internal memory management lists used
> +		by the page reclaim algorithm.
> +
> +		As these represent internal list state (e.g. shmem pages are on
> +		anon memory management lists), inactive_foo + active_foo may not
> +		be equal to the value for the foo counter, since the foo counter
> +		is type-based, not list-based.
> +
> +	  slab_reclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which might be reclaimed, such as dentries and
> +		inodes.
> +
> +	  slab_unreclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which cannot be reclaimed on memory pressure.
> +
>    memory.swap.current
>  	A read-only single value file which exists on non-root
>  	cgroups.
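
The documentation above asks readers to look values up by key rather
than by position. A minimal userspace sketch of that, assuming each line
has the form "key N<nid>=<bytes> ..." as described in the hunk. The
helper name numa_stat_lookup() is hypothetical and is not part of the
patch:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: find the byte count for
 * node `nid` in one memory.numa_stat line such as
 *   "anon N0=4096 N1=8192".
 * Returns 0 and stores the value in *bytes on success, -1 if the line
 * does not start with `key` or carries no entry for that node.
 */
static int numa_stat_lookup(const char *line, const char *key, int nid,
			    unsigned long long *bytes)
{
	size_t klen = strlen(key);
	const char *p;

	/* Match on the key itself, never on the line's position in the file. */
	if (strncmp(line, key, klen) != 0 || line[klen] != ' ')
		return -1;

	p = line + klen;
	while ((p = strstr(p, " N")) != NULL) {
		char *end;
		long node;

		p += 2;				/* skip " N" */
		node = strtol(p, &end, 10);
		if (end == p || *end != '=')
			continue;		/* not an N<nid>= token */
		if (node == nid) {
			*bytes = strtoull(end + 1, NULL, 10);
			return 0;
		}
		p = end;
	}
	return -1;
}
```

A caller would read memory.numa_stat line by line (e.g. with fgets())
and try each line against the key it wants. Requiring a space right
after the key keeps "anon" from matching "anon_thp".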
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 75cd1a1e66c8..ff919ef3b57b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6425,6 +6425,86 @@ static int memory_stat_show(struct seq_file *m, void *v)
>  	return 0;
>  }
>
> +#ifdef CONFIG_NUMA
> +struct numa_stat {
> +	const char *name;
> +	unsigned int ratio;
> +	enum node_stat_item idx;
> +};
> +
> +static struct numa_stat numa_stats[] = {
> +	{ "anon", PAGE_SIZE, NR_ANON_MAPPED },
> +	{ "file", PAGE_SIZE, NR_FILE_PAGES },
> +	{ "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> +	{ "shmem", PAGE_SIZE, NR_SHMEM },
> +	{ "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> +	{ "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> +	{ "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	/*
> +	 * The ratio will be initialized in numa_stats_init(). Because
> +	 * on some architectures, the macro of HPAGE_PMD_SIZE is not
> +	 * constant(e.g. powerpc).
> +	 */
> +	{ "anon_thp", 0, NR_ANON_THPS },
> +#endif
> +	{ "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> +	{ "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> +	{ "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> +	{ "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> +	{ "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> +	{ "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> +	{ "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> +};
> +
> +static int __init numa_stats_init(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +		if (numa_stats[i].idx == NR_ANON_THPS)
> +			numa_stats[i].ratio = HPAGE_PMD_SIZE;
> +#endif
> +	}
> +
> +	return 0;
> +}
> +pure_initcall(numa_stats_init);
> +
> +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> +					   unsigned int nid,
> +					   enum node_stat_item idx)
> +{
> +	VM_BUG_ON(nid >= nr_node_ids);
> +	return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> +}
> +
> +static int memory_numa_stat_show(struct seq_file *m, void *v)
> +{
> +	int i;
> +	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +
> +	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +		int nid;
> +
> +		seq_printf(m, "%s", numa_stats[i].name);
> +		for_each_node_state(nid, N_MEMORY) {
> +			u64 size;
> +
> +			size = memcg_node_page_state(memcg, nid,
> +						     numa_stats[i].idx);
> +			VM_WARN_ON_ONCE(!numa_stats[i].ratio);
> +			size *= numa_stats[i].ratio;
> +			seq_printf(m, " N%d=%llu", nid, size);
> +		}
> +		seq_putc(m, '\n');
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
>  static int memory_oom_group_show(struct seq_file *m, void *v)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> @@ -6502,6 +6582,12 @@ static struct cftype memory_files[] = {
>  		.name = "stat",
>  		.seq_show = memory_stat_show,
>  	},
> +#ifdef CONFIG_NUMA
> +	{
> +		.name = "numa_stat",
> +		.seq_show = memory_numa_stat_show,
> +	},
> +#endif
>  	{
>  		.name = "oom.group",
>  		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
> --
> 2.20.1
>
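
For readers following the ratio column in numa_stats[]: each raw counter
is kept in its own unit (pages, KiB, huge pages, or bytes), and
memory_numa_stat_show() multiplies it by the ratio to report bytes. A
small userspace sketch of that arithmetic, not kernel code. The DEMO_*
sizes are assumptions for illustration (4 KiB pages, 2 MiB PMD huge
pages); real values depend on the architecture, which is why the patch
fills in the anon_thp ratio at init time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes for illustration only; real values are per-architecture. */
#define DEMO_PAGE_SIZE		4096u
#define DEMO_HPAGE_PMD_SIZE	(2u * 1024u * 1024u)

/* Hypothetical mirror of the patch's table, without the kernel enum. */
struct demo_numa_stat {
	const char *name;
	unsigned int ratio;	/* bytes per unit of the raw counter */
};

static const struct demo_numa_stat demo_stats[] = {
	{ "anon",		DEMO_PAGE_SIZE },	/* counted in pages */
	{ "kernel_stack",	1024 },			/* counted in KiB */
	{ "anon_thp",		DEMO_HPAGE_PMD_SIZE },	/* counted in huge pages */
	{ "slab_reclaimable",	1 },			/* already in bytes */
};

/* Convert a raw counter to bytes; returns 0 for an unknown name. */
static uint64_t demo_stat_bytes(const char *name, uint64_t raw)
{
	size_t i;

	for (i = 0; i < sizeof(demo_stats) / sizeof(demo_stats[0]); i++) {
		if (strcmp(demo_stats[i].name, name) == 0)
			return raw * demo_stats[i].ratio;
	}
	return 0;
}
```

Under these assumed sizes, 3 anon pages come out as 12288 bytes and 2
anon_thp units as 4194304 bytes. A ratio of 0 would silently zero every
value, which is what the VM_WARN_ON_ONCE() in the patch guards against.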