Date: Fri, 23 May 2025 16:42:50 -0700
From: Shakeel Butt
To: Chen Yu
Cc: peterz@infradead.org, akpm@linux-foundation.org, mkoutny@suse.com,
    mingo@redhat.com, tj@kernel.org, hannes@cmpxchg.org, corbet@lwn.net,
    mgorman@suse.de, mhocko@kernel.org, muchun.song@linux.dev,
    roman.gushchin@linux.dev, tim.c.chen@intel.com, aubrey.li@intel.com,
    libo.chen@oracle.com, kprateek.nayak@amd.com, vineethr@linux.ibm.com,
    venkat88@linux.ibm.com, ayushjai@amd.com, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, yu.chen.surf@foxmail.com
Subject: Re: [PATCH v5 2/2] sched/numa: add statistics of numa balance task
In-Reply-To: <7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com>

On Fri, May 23, 2025 at 08:51:15PM +0800, Chen Yu wrote:
> On systems with NUMA balancing enabled, it has been found
> that tracking task activities resulting from NUMA balancing
> is beneficial. NUMA balancing employs two mechanisms for task
> migration: one is to migrate a task to an idle CPU within its
> preferred node, and the other is to swap tasks located on
> different nodes when they are on each other's preferred nodes.
>
> The kernel already provides NUMA page migration statistics in
> /sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However,
> it lacks statistics regarding task migration and swapping.
> Therefore, relevant counts for task migration and swapping should
> be added.
>
> The following two new fields:
>
>   numa_task_migrated
>   numa_task_swapped
>
> will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
> and /proc/vmstat

Hmm, these are scheduler events. How are they relevant to memory cgroup or
vmstat? Any reason not to expose these in cpu.stat?

>
> Introducing both per-task and per-memory cgroup (memcg) NUMA
> balancing statistics facilitates a rapid evaluation of the
> performance and resource utilization of the target workload.
> For instance, users can first identify the container with high
> NUMA balancing activity and then further pinpoint a specific
> task within that group, and subsequently adjust the memory policy
> for that task. In short, although it is possible to iterate through
> /proc/$pid/sched to locate the problematic task, the introduction
> of aggregated NUMA balancing activity for tasks within each memcg
> can assist users in identifying the task more efficiently through
> a divide-and-conquer approach.
>
> As Libo Chen pointed out, the memcg event relies on the text
> names in vmstat_text, and /proc/vmstat generates corresponding items
> based on vmstat_text. Thus, the relevant task migration and swapping
> events introduced in vmstat_text also need to be populated by
> count_vm_numa_event(), otherwise these values are zero in
> /proc/vmstat.
>
> Tested-by: K Prateek Nayak
> Tested-by: Madadi Vineeth Reddy
> Acked-by: Peter Zijlstra (Intel)
> Tested-by: Venkat Rao Bagalkote
> Signed-off-by: Chen Yu
> ---
> v4->v5:
>    no change.
> v3->v4:
>    Populate the /proc/vmstat otherwise the items are all zero.
>    (Libo)
> v2->v3:
>    Remove unnecessary p->mm check because kernel threads are
>    not supported by Numa Balancing.
>    (Libo Chen)
> v1->v2:
>    Update the Documentation/admin-guide/cgroup-v2.rst. (Michal)
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
>  include/linux/sched.h                   | 4 ++++
>  include/linux/vm_event_item.h           | 2 ++
>  kernel/sched/core.c                     | 9 +++++++--
>  kernel/sched/debug.c                    | 4 ++++
>  mm/memcontrol.c                         | 2 ++
>  mm/vmstat.c                             | 2 ++
>  7 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 1a16ce68a4d7..d346f3235945 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1670,6 +1670,12 @@ The following nested keys are defined.
>  	  numa_hint_faults (npn)
>  		Number of NUMA hinting faults.
>
> +	  numa_task_migrated (npn)
> +		Number of task migration by NUMA balancing.
> +
> +	  numa_task_swapped (npn)
> +		Number of task swap by NUMA balancing.
> +
>  	  pgdemote_kswapd
>  		Number of pages demoted by kswapd.
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f96ac1982893..1c50e30b5c01 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -549,6 +549,10 @@ struct sched_statistics {
>  	u64				nr_failed_migrations_running;
>  	u64				nr_failed_migrations_hot;
>  	u64				nr_forced_migrations;
> +#ifdef CONFIG_NUMA_BALANCING
> +	u64				numa_task_migrated;
> +	u64				numa_task_swapped;
> +#endif
>
>  	u64				nr_wakeups;
>  	u64				nr_wakeups_sync;
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 9e15a088ba38..91a3ce9a2687 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -66,6 +66,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		NUMA_HINT_FAULTS,
>  		NUMA_HINT_FAULTS_LOCAL,
>  		NUMA_PAGE_MIGRATE,
> +		NUMA_TASK_MIGRATE,
> +		NUMA_TASK_SWAP,
>  #endif
>  #ifdef CONFIG_MIGRATION
>  		PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c81cf642dba0..62b033199e9c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3352,6 +3352,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  #ifdef CONFIG_NUMA_BALANCING
>  static void __migrate_swap_task(struct task_struct *p, int cpu)
>  {
> +	__schedstat_inc(p->stats.numa_task_swapped);
> +	count_vm_numa_event(NUMA_TASK_SWAP);
> +	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +
>  	if (task_on_rq_queued(p)) {
>  		struct rq *src_rq, *dst_rq;
>  		struct rq_flags srf, drf;
> @@ -7953,8 +7957,9 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
>  	if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
>  		return -EINVAL;
>
> -	/* TODO: This is not properly updating schedstats */
> -
> +	__schedstat_inc(p->stats.numa_task_migrated);
> +	count_vm_numa_event(NUMA_TASK_MIGRATE);
> +	count_memcg_event_mm(p->mm, NUMA_TASK_MIGRATE);
>  	trace_sched_move_numa(p, curr_cpu, target_cpu);
>  	return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
>  }
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 56ae54e0ce6a..f971c2af7912 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1206,6 +1206,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
>  		P_SCHEDSTAT(nr_failed_migrations_running);
>  		P_SCHEDSTAT(nr_failed_migrations_hot);
>  		P_SCHEDSTAT(nr_forced_migrations);
> +#ifdef CONFIG_NUMA_BALANCING
> +		P_SCHEDSTAT(numa_task_migrated);
> +		P_SCHEDSTAT(numa_task_swapped);
> +#endif
>  		P_SCHEDSTAT(nr_wakeups);
>  		P_SCHEDSTAT(nr_wakeups_sync);
>  		P_SCHEDSTAT(nr_wakeups_migrate);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c96c1f2b9cf5..cdaab8a957f3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -463,6 +463,8 @@ static const unsigned int memcg_vm_event_stat[] = {
>  	NUMA_PAGE_MIGRATE,
>  	NUMA_PTE_UPDATES,
>  	NUMA_HINT_FAULTS,
> +	NUMA_TASK_MIGRATE,
> +	NUMA_TASK_SWAP,
>  #endif
>  };
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4c268ce39ff2..ed08bb384ae4 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1347,6 +1347,8 @@ const char * const vmstat_text[] = {
>  	"numa_hint_faults",
>  	"numa_hint_faults_local",
>  	"numa_pages_migrated",
> +	"numa_task_migrated",
> +	"numa_task_swapped",
>  #endif
>  #ifdef CONFIG_MIGRATION
>  	"pgmigrate_success",
> --
> 2.25.1
>
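
For reference, the two counters named in the patch could be read from
userspace with something like the sketch below. It assumes the flat
"key value" one-entry-per-line layout that memory.stat and /proc/vmstat
use; stat_lookup() is a hypothetical helper name chosen for illustration,
not an existing kernel or libc interface.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `key` in flat-keyed text ("name value\n" per line).
 * Hypothetical helper: returns 0 and stores the value in *out on
 * success, -1 if the key is absent or its value is not a number.
 */
int stat_lookup(const char *text, const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key: it must be followed by one space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			const char *val = line + klen + 1;
			unsigned long long v;

			/* Reject empty or non-numeric values up front. */
			if (!isdigit((unsigned char)*val))
				return -1;

			errno = 0;
			v = strtoull(val, NULL, 10);
			if (errno)	/* e.g. ERANGE on overflow */
				return -1;
			*out = v;
			return 0;
		}
		/* Advance to the next line, or stop at end of text. */
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

A caller would read the stat file into a NUL-terminated buffer and pass it
in with "numa_task_migrated" or "numa_task_swapped" as the key. Because the
parser only depends on the flat-keyed layout, it does not care which file
ends up carrying the counters.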