From: Randy Dunlap <rdunlap@infradead.org>
To: Muchun Song <songmuchun@bytedance.com>,
	tj@kernel.org, lizefan@huawei.com, hannes@cmpxchg.org,
	corbet@lwn.net, mhocko@kernel.org, vdavydov.dev@gmail.com,
	akpm@linux-foundation.org, shakeelb@google.com, guro@fb.com
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
Date: Tue, 15 Sep 2020 08:44:55 -0700
Message-ID: <a3e2a7bf-ae5a-9ca8-74f9-57af795f0380@infradead.org>
In-Reply-To: <20200915055825.5279-1-songmuchun@bytedance.com>

Hi,

On 9/14/20 10:58 PM, Muchun Song wrote:
> In the cgroup v1, we have a numa_stat interface. This is useful for
> providing visibility into the numa locality information within an
> memcg since the pages are allowed to be allocated from any physical
> node. One of the use cases is evaluating application performance by
> combining this information with the application's CPU allocation.
> But the cgroup v2 does not. So this patch adds the missing information.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Suggested-by: Shakeel Butt <shakeelb@google.com>
> ---
>  changelog in v4:
>  1. Fix some document problems pointed out by Randy Dunlap.
>  2. Remove memory_numa_stat_format() suggested by Shakeel Butt.
> 
>  changelog in v3:
>  1. Fix compiler error on powerpc architecture reported by kernel test robot.
>  2. Fix a typo from "anno" to "anon".
> 
>  changelog in v2:
>  1. Add memory.numa_stat interface in cgroup v2.
> 
>  Documentation/admin-guide/cgroup-v2.rst | 72 +++++++++++++++++++++
>  mm/memcontrol.c                         | 86 +++++++++++++++++++++++++
>  2 files changed, 158 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 6be43781ec7f..bcb7b202e88d 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
>  		collapsing an existing range of pages. This counter is not
>  		present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
>  
> +  memory.numa_stat
> +	A read-only flat-keyed file which exists on non-root cgroups.
> +
> +	This breaks down the cgroup's memory footprint into different
> +	types of memory, type-specific details, and other information
> +	per node on the state of the memory management system.
> +
> +	This is useful for providing visibility into the NUMA locality
> +	information within an memcg since the pages are allowed to be
> +	allocated from any physical node. One of the use cases is evaluating
> +	application performance by combining this information with the
> +	application's CPU allocation.
> +
> +	All memory amounts are in bytes.
> +
> +	The output format of memory.numa_stat is::
> +
> +	  type N0=<bytes in node 0 pages> N1=<bytes in node 1 pages> ...

I'm OK with Shakeel's suggested change here.
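
For illustration only, here is a minimal userspace sketch of looking up
a value by key and node in one line of the format quoted above, rather
than by position. It assumes the "type N0=<bytes> N1=<bytes> ..." layout
exactly as documented in this version of the patch; the helper name
numa_stat_lookup() is hypothetical and not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up the byte count for one node in a single memory.numa_stat line
 * such as "anon N0=4096 N1=8192". Returns 0 on success, -1 if the line
 * does not start with `key` or has no entry for `node`.
 * (Hypothetical helper, assuming the format documented in the patch.)
 */
static int numa_stat_lookup(const char *line, const char *key, int node,
			    unsigned long long *bytes)
{
	size_t klen = strlen(key);
	const char *p;

	/* Require an exact key match, so "anon" does not match "anon_thp". */
	if (strncmp(line, key, klen) != 0 || line[klen] != ' ')
		return -1;

	p = line + klen;
	while ((p = strchr(p, 'N')) != NULL) {
		char *end;
		long n = strtol(p + 1, &end, 10);

		/* Accept only "N<digits>=" with the requested node number. */
		if (end != p + 1 && *end == '=' && n == node) {
			*bytes = strtoull(end + 1, NULL, 10);
			return 0;
		}
		p++;
	}
	return -1;
}
```

A caller would read the file line by line and try each line against the
key it cares about, which keeps working if new entries show up in the
middle.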

> +	The entries are ordered to be human readable, and new entries
> +	can show up in the middle. Don't rely on items remaining in a
> +	fixed position; use the keys to look up specific values!
> +
> +	  anon
> +		Amount of memory per node used in anonymous mappings such
> +		as brk(), sbrk(), and mmap(MAP_ANONYMOUS).
> +
> +	  file
> +		Amount of memory per node used to cache filesystem data,
> +		including tmpfs and shared memory.
> +
> +	  kernel_stack
> +		Amount of memory per node allocated to kernel stacks.
> +
> +	  shmem
> +		Amount of cached filesystem data per node that is swap-backed,
> +		such as tmpfs, shm segments, shared anonymous mmap()s.
> +
> +	  file_mapped
> +		Amount of cached filesystem data per node mapped with mmap().
> +
> +	  file_dirty
> +		Amount of cached filesystem data per node that was modified but
> +		not yet written back to disk.
> +
> +	  file_writeback
> +		Amount of cached filesystem data per node that was modified and
> +		is currently being written back to disk.
> +
> +	  anon_thp
> +		Amount of memory per node used in anonymous mappings backed by
> +		transparent hugepages.
> +
> +	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> +		Amount of memory, swap-backed and filesystem-backed,
> +		per node on the internal memory management lists used
> +		by the page reclaim algorithm.
> +
> +		As these represent internal list state (e.g. shmem pages are on
> +		anon memory management lists), inactive_foo + active_foo may not
> +		be equal to the value for the foo counter, since the foo counter
> +		is type-based, not list-based.
> +
> +	  slab_reclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which might be reclaimed, such as dentries and
> +		inodes.
> +
> +	  slab_unreclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which cannot be reclaimed on memory pressure.
> +
>    memory.swap.current
>  	A read-only single value file which exists on non-root
>  	cgroups.
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 75cd1a1e66c8..ff919ef3b57b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6425,6 +6425,86 @@ static int memory_stat_show(struct seq_file *m, void *v)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NUMA
> +struct numa_stat {
> +	const char *name;
> +	unsigned int ratio;
> +	enum node_stat_item idx;
> +};
> +
> +static struct numa_stat numa_stats[] = {
> +	{ "anon", PAGE_SIZE, NR_ANON_MAPPED },
> +	{ "file", PAGE_SIZE, NR_FILE_PAGES },
> +	{ "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> +	{ "shmem", PAGE_SIZE, NR_SHMEM },
> +	{ "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> +	{ "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> +	{ "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	/*
> +	 * The ratio will be initialized in numa_stats_init(). Because
> +	 * on some architectures, the macro of HPAGE_PMD_SIZE is not
> +	 * constant(e.g. powerpc).
> +	 */
> +	{ "anon_thp", 0, NR_ANON_THPS },
> +#endif
> +	{ "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> +	{ "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> +	{ "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> +	{ "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> +	{ "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> +	{ "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> +	{ "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> +};
> +
> +static int __init numa_stats_init(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +		if (numa_stats[i].idx == NR_ANON_THPS)
> +			numa_stats[i].ratio = HPAGE_PMD_SIZE;
> +#endif
> +	}

Although the loop may be needed unconditionally sometime in the future due to
other changes, why couldn't the #ifdef wrap the whole loop for now, like this?


> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +		if (numa_stats[i].idx == NR_ANON_THPS)
> +			numa_stats[i].ratio = HPAGE_PMD_SIZE;
> +	}
> +#endif
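
For reference, the pattern under discussion (a static table whose one
non-constant ratio is patched in at init time) can be sketched in
userspace as below. This is only a sketch: every demo_* name is
hypothetical, the 4096-byte page and 2 MiB huge page sizes are assumed
values, and it assumes that ratio means "bytes per counter unit", as the
table entries in the patch suggest.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for enum node_stat_item values. */
enum demo_item { DEMO_ANON, DEMO_KERNEL_STACK_KB, DEMO_ANON_THPS };

struct demo_stat {
	const char *name;
	unsigned long ratio;	/* bytes per counter unit */
	enum demo_item idx;
};

/* Stand-in for a size that is not a compile-time constant everywhere. */
static unsigned long demo_hpage_size(void)
{
	return 2UL << 20;	/* assumed 2 MiB */
}

static struct demo_stat demo_stats[] = {
	{ "anon", 4096, DEMO_ANON },			/* assumed page size */
	{ "kernel_stack", 1024, DEMO_KERNEL_STACK_KB },
	{ "anon_thp", 0, DEMO_ANON_THPS },	/* set in demo_stats_init() */
};

/* Fill in the one ratio that cannot be a static initializer. */
static void demo_stats_init(void)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_stats); i++) {
		if (demo_stats[i].idx == DEMO_ANON_THPS)
			demo_stats[i].ratio = demo_hpage_size();
	}
}

/* Convert a raw counter for table entry i into bytes. */
static unsigned long long demo_stat_bytes(size_t i, unsigned long count)
{
	return (unsigned long long)count * demo_stats[i].ratio;
}
```

With the loop wrapped in the #ifdef as a whole, a build without THP
support would compile the init function down to just "return 0".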


> +
> +	return 0;
> +}
> +pure_initcall(numa_stats_init);


thanks.
-- 
~Randy


