From: Chen Yu <yu.c.chen@intel.com>
To: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Huang, Ying" <ying.huang@linux.alibaba.com>,
Tim Chen <tim.c.chen@intel.com>, Aubrey Li <aubrey.li@intel.com>,
Michael Wang <yun.wang@linux.alibaba.com>,
Kaiyang Zhao <kaiyang2@cs.cmu.edu>,
David Rientjes <rientjes@google.com>,
Raghavendra K T <raghavendra.kt@amd.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Chen Yu <yu.c.chen@intel.com>
Subject: [RFC PATCH 1/3] sched/numa: Introduce numa balance task migration and swap in schedstats
Date: Tue, 25 Feb 2025 22:00:01 +0800 [thread overview]
Message-ID: <1847c5ef828ad4835a35e3a54b88d2e13bce0eea.1740483690.git.yu.c.chen@intel.com> (raw)
In-Reply-To: <cover.1740483690.git.yu.c.chen@intel.com>
There is a requirement to track task activities during NUMA
balancing. NUMA balancing has two mechanisms for task migration:
one is to migrate the task to an idle CPU in its preferred node,
and the other is to swap tasks on different nodes if they are
on each other's preferred node. The kernel already has NUMA page
migration statistics. Add the task migration and swap count
described above in the per-task/cgroup scope. The data will be
displayed at
/sys/fs/cgroup/mytest/memory.stat and
/proc/{PID}/sched.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
include/linux/sched.h | 4 ++++
include/linux/vm_event_item.h | 2 ++
kernel/sched/core.c | 10 ++++++++--
kernel/sched/debug.c | 4 ++++
mm/memcontrol.c | 2 ++
mm/vmstat.c | 2 ++
6 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d..01faa608ed7c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -527,6 +527,10 @@ struct sched_statistics {
u64 nr_failed_migrations_running;
u64 nr_failed_migrations_hot;
u64 nr_forced_migrations;
+#ifdef CONFIG_NUMA_BALANCING
+ u64 nr_numa_migrations;
+ u64 nr_numa_swap;
+#endif
u64 nr_wakeups;
u64 nr_wakeups_sync;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index f70d0958095c..aef817474781 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -64,6 +64,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
NUMA_HINT_FAULTS,
NUMA_HINT_FAULTS_LOCAL,
NUMA_PAGE_MIGRATE,
+ NUMA_TASK_MIGRATE,
+ NUMA_TASK_SWAP,
#endif
#ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 165c90ba64ea..44efc725054a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3348,6 +3348,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
#ifdef CONFIG_NUMA_BALANCING
static void __migrate_swap_task(struct task_struct *p, int cpu)
{
+ __schedstat_inc(p->stats.nr_numa_swap);
+
+ if (p->mm)
+ count_memcg_events_mm(p->mm, NUMA_TASK_SWAP, 1);
+
if (task_on_rq_queued(p)) {
struct rq *src_rq, *dst_rq;
struct rq_flags srf, drf;
@@ -7901,8 +7906,9 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
return -EINVAL;
- /* TODO: This is not properly updating schedstats */
-
+ __schedstat_inc(p->stats.nr_numa_migrations);
+ if (p->mm)
+ count_memcg_events_mm(p->mm, NUMA_TASK_MIGRATE, 1);
trace_sched_move_numa(p, curr_cpu, target_cpu);
return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
}
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index ef047add7f9e..ed801cc00bf1 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1204,6 +1204,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
P_SCHEDSTAT(nr_failed_migrations_running);
P_SCHEDSTAT(nr_failed_migrations_hot);
P_SCHEDSTAT(nr_forced_migrations);
+#ifdef CONFIG_NUMA_BALANCING
+ P_SCHEDSTAT(nr_numa_migrations);
+ P_SCHEDSTAT(nr_numa_swap);
+#endif
P_SCHEDSTAT(nr_wakeups);
P_SCHEDSTAT(nr_wakeups_sync);
P_SCHEDSTAT(nr_wakeups_migrate);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..496b5edc3db6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -460,6 +460,8 @@ static const unsigned int memcg_vm_event_stat[] = {
NUMA_PAGE_MIGRATE,
NUMA_PTE_UPDATES,
NUMA_HINT_FAULTS,
+ NUMA_TASK_MIGRATE,
+ NUMA_TASK_SWAP,
#endif
};
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 16bfe1c694dd..d6651778e4bf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1339,6 +1339,8 @@ const char * const vmstat_text[] = {
"numa_hint_faults",
"numa_hint_faults_local",
"numa_pages_migrated",
+ "numa_task_migrated",
+ "numa_task_swaped",
#endif
#ifdef CONFIG_MIGRATION
"pgmigrate_success",
--
2.25.1
next prev parent reply other threads:[~2025-02-25 14:05 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 13:59 [RFC PATCH 0/3] sched/numa: Introduce per cgroup numa balance control Chen Yu
2025-02-25 14:00 ` Chen Yu [this message]
2025-02-25 14:00 ` [RFC PATCH 2/3] " Chen Yu
2025-03-07 22:54 ` Tim Chen
2025-03-10 15:36 ` Chen Yu
2025-02-25 14:00 ` [RFC PATCH 3/3] sched/numa: Allow intervale memory allocation for numa balance Chen Yu
2025-03-05 14:38 ` [RFC PATCH 0/3] sched/numa: Introduce per cgroup numa balance control Kaiyang Zhao
2025-03-10 15:12 ` Chen Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1847c5ef828ad4835a35e3a54b88d2e13bce0eea.1740483690.git.yu.c.chen@intel.com \
--to=yu.c.chen@intel.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aubrey.li@intel.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=kaiyang2@cs.cmu.edu \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=muchun.song@linux.dev \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@amd.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tim.c.chen@intel.com \
--cc=vincent.guittot@linaro.org \
--cc=ying.huang@linux.alibaba.com \
--cc=yun.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox