linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Chen Yu <yu.c.chen@intel.com>
Cc: peterz@infradead.org, akpm@linux-foundation.org,
	mkoutny@suse.com,  mingo@redhat.com, tj@kernel.org,
	hannes@cmpxchg.org, corbet@lwn.net,  mgorman@suse.de,
	mhocko@kernel.org, muchun.song@linux.dev,
	 roman.gushchin@linux.dev, tim.c.chen@intel.com,
	aubrey.li@intel.com, libo.chen@oracle.com,
	 kprateek.nayak@amd.com, vineethr@linux.ibm.com,
	venkat88@linux.ibm.com, ayushjai@amd.com,
	 cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	yu.chen.surf@foxmail.com
Subject: Re: [PATCH v5 2/2] sched/numa: add statistics of numa balance task
Date: Fri, 23 May 2025 16:42:50 -0700	[thread overview]
Message-ID: <cx4s4pnw5ymr4bxxmvrkhc457krq46eh6zamlr4ikp7tn3jsno@xzchjlnnawe5> (raw)
In-Reply-To: <7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com>

On Fri, May 23, 2025 at 08:51:15PM +0800, Chen Yu wrote:
> On systems with NUMA balancing enabled, it has been found
> that tracking task activities resulting from NUMA balancing
> is beneficial. NUMA balancing employs two mechanisms for task
> migration: one is to migrate a task to an idle CPU within its
> preferred node, and the other is to swap tasks located on
> different nodes when they are on each other's preferred nodes.
> 
> The kernel already provides NUMA page migration statistics in
> /sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However,
> it lacks statistics regarding task migration and swapping.
> Therefore, relevant counts for task migration and swapping should
> be added.
> 
> The following two new fields:
> 
> numa_task_migrated
> numa_task_swapped
> 
> will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
> and /proc/vmstat

Hmm these are scheduler events, how are these relevant to memory cgroup
or vmstat? Any reason to not expose these in cpu.stat?

> 
> Introducing both per-task and per-memory cgroup (memcg) NUMA
> balancing statistics facilitates a rapid evaluation of the
> performance and resource utilization of the target workload.
> For instance, users can first identify the container with high
> NUMA balancing activity and then further pinpoint a specific
> task within that group, and subsequently adjust the memory policy
> for that task. In short, although it is possible to iterate through
> /proc/$pid/sched to locate the problematic task, the introduction
> of aggregated NUMA balancing activity for tasks within each memcg
> can assist users in identifying the task more efficiently through
> a divide-and-conquer approach.
> 
> As Libo Chen pointed out, the memcg event relies on the text
> names in vmstat_text, and /proc/vmstat generates corresponding items
> based on vmstat_text. Thus, the relevant task migration and swapping
> events introduced in vmstat_text also need to be populated by
> count_vm_numa_event(), otherwise these values are zero in
> /proc/vmstat.
> 
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> v4->v5:
> no change.
> v3->v4:
> Populate the /prov/vmstat otherwise the items are all zero.
> (Libo)
> v2->v3:
> Remove unnecessary p->mm check because kernel threads are
> not supported by Numa Balancing. (Libo Chen)
> v1->v2:
> Update the Documentation/admin-guide/cgroup-v2.rst. (Michal)
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
>  include/linux/sched.h                   | 4 ++++
>  include/linux/vm_event_item.h           | 2 ++
>  kernel/sched/core.c                     | 9 +++++++--
>  kernel/sched/debug.c                    | 4 ++++
>  mm/memcontrol.c                         | 2 ++
>  mm/vmstat.c                             | 2 ++
>  7 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 1a16ce68a4d7..d346f3235945 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1670,6 +1670,12 @@ The following nested keys are defined.
>  	  numa_hint_faults (npn)
>  		Number of NUMA hinting faults.
>  
> +	  numa_task_migrated (npn)
> +		Number of task migration by NUMA balancing.
> +
> +	  numa_task_swapped (npn)
> +		Number of task swap by NUMA balancing.
> +
>  	  pgdemote_kswapd
>  		Number of pages demoted by kswapd.
>  
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f96ac1982893..1c50e30b5c01 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -549,6 +549,10 @@ struct sched_statistics {
>  	u64				nr_failed_migrations_running;
>  	u64				nr_failed_migrations_hot;
>  	u64				nr_forced_migrations;
> +#ifdef CONFIG_NUMA_BALANCING
> +	u64				numa_task_migrated;
> +	u64				numa_task_swapped;
> +#endif
>  
>  	u64				nr_wakeups;
>  	u64				nr_wakeups_sync;
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 9e15a088ba38..91a3ce9a2687 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -66,6 +66,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		NUMA_HINT_FAULTS,
>  		NUMA_HINT_FAULTS_LOCAL,
>  		NUMA_PAGE_MIGRATE,
> +		NUMA_TASK_MIGRATE,
> +		NUMA_TASK_SWAP,
>  #endif
>  #ifdef CONFIG_MIGRATION
>  		PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c81cf642dba0..62b033199e9c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3352,6 +3352,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  #ifdef CONFIG_NUMA_BALANCING
>  static void __migrate_swap_task(struct task_struct *p, int cpu)
>  {
> +	__schedstat_inc(p->stats.numa_task_swapped);
> +	count_vm_numa_event(NUMA_TASK_SWAP);
> +	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +
>  	if (task_on_rq_queued(p)) {
>  		struct rq *src_rq, *dst_rq;
>  		struct rq_flags srf, drf;
> @@ -7953,8 +7957,9 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
>  	if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
>  		return -EINVAL;
>  
> -	/* TODO: This is not properly updating schedstats */
> -
> +	__schedstat_inc(p->stats.numa_task_migrated);
> +	count_vm_numa_event(NUMA_TASK_MIGRATE);
> +	count_memcg_event_mm(p->mm, NUMA_TASK_MIGRATE);
>  	trace_sched_move_numa(p, curr_cpu, target_cpu);
>  	return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
>  }
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 56ae54e0ce6a..f971c2af7912 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1206,6 +1206,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
>  		P_SCHEDSTAT(nr_failed_migrations_running);
>  		P_SCHEDSTAT(nr_failed_migrations_hot);
>  		P_SCHEDSTAT(nr_forced_migrations);
> +#ifdef CONFIG_NUMA_BALANCING
> +		P_SCHEDSTAT(numa_task_migrated);
> +		P_SCHEDSTAT(numa_task_swapped);
> +#endif
>  		P_SCHEDSTAT(nr_wakeups);
>  		P_SCHEDSTAT(nr_wakeups_sync);
>  		P_SCHEDSTAT(nr_wakeups_migrate);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c96c1f2b9cf5..cdaab8a957f3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -463,6 +463,8 @@ static const unsigned int memcg_vm_event_stat[] = {
>  	NUMA_PAGE_MIGRATE,
>  	NUMA_PTE_UPDATES,
>  	NUMA_HINT_FAULTS,
> +	NUMA_TASK_MIGRATE,
> +	NUMA_TASK_SWAP,
>  #endif
>  };
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4c268ce39ff2..ed08bb384ae4 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1347,6 +1347,8 @@ const char * const vmstat_text[] = {
>  	"numa_hint_faults",
>  	"numa_hint_faults_local",
>  	"numa_pages_migrated",
> +	"numa_task_migrated",
> +	"numa_task_swapped",
>  #endif
>  #ifdef CONFIG_MIGRATION
>  	"pgmigrate_success",
> -- 
> 2.25.1
> 


  reply	other threads:[~2025-05-23 23:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-23 12:48 [PATCH v5 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
2025-05-23 12:51 ` [PATCH v5 1/2] sched/numa: fix task swap by skipping kernel threads Chen Yu
2025-05-23 23:22   ` Shakeel Butt
2025-05-23 12:51 ` [PATCH v5 2/2] sched/numa: add statistics of numa balance task Chen Yu
2025-05-23 23:42   ` Shakeel Butt [this message]
2025-05-24  9:07     ` Chen, Yu C
2025-05-24 17:32       ` Shakeel Butt
2025-05-25 12:35         ` Chen, Yu C
2025-05-27 17:48           ` Shakeel Butt
2025-05-29  5:04             ` Chen, Yu C
2025-05-26 13:35     ` Michal Koutný
2025-05-27  9:20       ` Chen, Yu C
2025-05-27 18:15         ` Shakeel Butt
2025-06-02 16:53           ` Michal Koutný
2025-06-03 14:46             ` Chen, Yu C
2025-06-17  9:30               ` Michal Koutný
2025-06-19 13:03                 ` Chen, Yu C
2025-06-19 14:06                   ` Michal Koutný
2025-05-23 22:06 ` [PATCH v5 0/2] sched/numa: add statistics of numa balance task migration Andrew Morton
2025-05-23 23:52   ` Shakeel Butt
2025-05-28  0:21     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cx4s4pnw5ymr4bxxmvrkhc457krq46eh6zamlr4ikp7tn3jsno@xzchjlnnawe5 \
    --to=shakeel.butt@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=aubrey.li@intel.com \
    --cc=ayushjai@amd.com \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=kprateek.nayak@amd.com \
    --cc=libo.chen@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=peterz@infradead.org \
    --cc=roman.gushchin@linux.dev \
    --cc=tim.c.chen@intel.com \
    --cc=tj@kernel.org \
    --cc=venkat88@linux.ibm.com \
    --cc=vineethr@linux.ibm.com \
    --cc=yu.c.chen@intel.com \
    --cc=yu.chen.surf@foxmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox