From: Chen Yu
To: Peter Zijlstra, Andrew Morton
Cc: Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet, Mel Gorman,
	Michal Hocko, Michal Koutny, Muchun Song, Roman Gushchin, Shakeel Butt,
	"Chen, Tim C", Aubrey Li, Libo Chen, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Chen Yu, K Prateek Nayak,
	Madadi Vineeth Reddy
Subject: [PATCH v3] sched/numa: add statistics of numa balance task migration
Date: Wed, 30 Apr 2025 18:36:23 +0800
Message-Id: <20250430103623.3349842-1-yu.c.chen@intel.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

On systems with NUMA balancing enabled, it is useful to track the task
activity caused by NUMA balancing. NUMA balancing has two mechanisms
for task migration: one is to migrate the task to an idle CPU in its
preferred node, the other is to swap tasks on different nodes if they
are on each other's preferred node.

The kernel already has NUMA page migration statistics in
/sys/fs/cgroup/{GROUP}/memory.stat and /proc/{PID}/sched,
but does not have statistics for task migration/swap.
Add the task migration and swap counts accordingly.

The following two new fields:

numa_task_migrated
numa_task_swapped

will be displayed in both
/sys/fs/cgroup/{GROUP}/memory.stat and /proc/{PID}/sched

Introducing both per-task and per-memcg NUMA balancing statistics helps
to quickly evaluate the performance and resource usage of the target
workload. For example, the user can first identify the container which
has high NUMA balancing activity, then narrow down to a specific task
within that group, and tune the memory policy of that task.
In summary, it is possible to iterate over /proc/{PID}/sched to find
the offending task, but the per-memcg aggregate of its tasks' NUMA
balancing activity further helps users identify the task in a
divide-and-conquer way.

Tested-by: K Prateek Nayak
Tested-by: Madadi Vineeth Reddy
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Chen Yu
---
v2->v3:
Remove unnecessary p->mm check because kernel threads are
not supported by NUMA balancing. (Libo Chen)
v1->v2:
Update the Documentation/admin-guide/cgroup-v2.rst. (Michal)
---
 Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
 include/linux/sched.h                   | 4 ++++
 include/linux/vm_event_item.h           | 2 ++
 kernel/sched/core.c                     | 7 +++++--
 kernel/sched/debug.c                    | 4 ++++
 mm/memcontrol.c                         | 2 ++
 mm/vmstat.c                             | 2 ++
 7 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..d346f3235945 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1670,6 +1670,12 @@ The following nested keys are defined.
 	  numa_hint_faults (npn)
 		Number of NUMA hinting faults.
 
+	  numa_task_migrated (npn)
+		Number of task migration by NUMA balancing.
+
+	  numa_task_swapped (npn)
+		Number of task swap by NUMA balancing.
+
 	  pgdemote_kswapd
 		Number of pages demoted by kswapd.
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..1c50e30b5c01 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -549,6 +549,10 @@ struct sched_statistics {
 	u64				nr_failed_migrations_running;
 	u64				nr_failed_migrations_hot;
 	u64				nr_forced_migrations;
+#ifdef CONFIG_NUMA_BALANCING
+	u64				numa_task_migrated;
+	u64				numa_task_swapped;
+#endif
 
 	u64				nr_wakeups;
 	u64				nr_wakeups_sync;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 9e15a088ba38..91a3ce9a2687 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -66,6 +66,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		NUMA_HINT_FAULTS,
 		NUMA_HINT_FAULTS_LOCAL,
 		NUMA_PAGE_MIGRATE,
+		NUMA_TASK_MIGRATE,
+		NUMA_TASK_SWAP,
 #endif
 #ifdef CONFIG_MIGRATION
 		PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba0..25a92f2abda4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3352,6 +3352,9 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 #ifdef CONFIG_NUMA_BALANCING
 static void __migrate_swap_task(struct task_struct *p, int cpu)
 {
+	__schedstat_inc(p->stats.numa_task_swapped);
+	count_memcg_events_mm(p->mm, NUMA_TASK_SWAP, 1);
+
 	if (task_on_rq_queued(p)) {
 		struct rq *src_rq, *dst_rq;
 		struct rq_flags srf, drf;
@@ -7953,8 +7956,8 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
 	if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
 		return -EINVAL;
 
-	/* TODO: This is not properly updating schedstats */
-
+	__schedstat_inc(p->stats.numa_task_migrated);
+	count_memcg_events_mm(p->mm, NUMA_TASK_MIGRATE, 1);
 	trace_sched_move_numa(p, curr_cpu, target_cpu);
 	return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
 }
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 56ae54e0ce6a..f971c2af7912 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1206,6 +1206,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 		P_SCHEDSTAT(nr_failed_migrations_running);
 		P_SCHEDSTAT(nr_failed_migrations_hot);
 		P_SCHEDSTAT(nr_forced_migrations);
+#ifdef CONFIG_NUMA_BALANCING
+		P_SCHEDSTAT(numa_task_migrated);
+		P_SCHEDSTAT(numa_task_swapped);
+#endif
 		P_SCHEDSTAT(nr_wakeups);
 		P_SCHEDSTAT(nr_wakeups_sync);
 		P_SCHEDSTAT(nr_wakeups_migrate);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c96c1f2b9cf5..cdaab8a957f3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -463,6 +463,8 @@ static const unsigned int memcg_vm_event_stat[] = {
 	NUMA_PAGE_MIGRATE,
 	NUMA_PTE_UPDATES,
 	NUMA_HINT_FAULTS,
+	NUMA_TASK_MIGRATE,
+	NUMA_TASK_SWAP,
 #endif
 };
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4c268ce39ff2..ed08bb384ae4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1347,6 +1347,8 @@ const char * const vmstat_text[] = {
 	"numa_hint_faults",
 	"numa_hint_faults_local",
 	"numa_pages_migrated",
+	"numa_task_migrated",
+	"numa_task_swapped",
 #endif
 #ifdef CONFIG_MIGRATION
 	"pgmigrate_success",
-- 
2.25.1
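
For illustration only (not part of the patch): a minimal userspace sketch of how the two new counters might be read back when following the divide-and-conquer workflow described in the commit message. The helper names parse_counter() and read_counter() are hypothetical. The sketch assumes memory.stat prints "name value" per line and /proc/{PID}/sched prints "name : value", and that either file fits in a 16 KiB buffer; none of this is guaranteed by the patch itself.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find a line in "text" that starts with "key" and
 * parse the unsigned number that follows. Accepts both "key value"
 * (assumed memory.stat layout) and "key : value" (assumed
 * /proc/{PID}/sched layout). Returns 0 on success, -1 if not found.
 */
int parse_counter(const char *text, const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		const char *p = line;
		const char *next = strchr(line, '\n');

		while (*p == ' ' || *p == '\t')
			p++;
		if (!strncmp(p, key, klen)) {
			p += klen;
			/* The key must end here, so longer names with the same prefix are skipped. */
			if (*p == ' ' || *p == '\t' || *p == ':') {
				while (*p == ' ' || *p == '\t' || *p == ':')
					p++;
				if (isdigit((unsigned char)*p)) {
					*out = strtoull(p, NULL, 10);
					return 0;
				}
			}
		}
		line = next ? next + 1 : NULL;
	}
	return -1;
}

/*
 * Hypothetical helper: read up to 16 KiB of "path" and look up "key".
 * Returns -1 if the file cannot be opened or the key is absent, for
 * example on a kernel built without CONFIG_NUMA_BALANCING.
 */
int read_counter(const char *path, const char *key, unsigned long long *out)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_counter(buf, key, out);
}
```

A monitoring tool could call read_counter() on each cgroup's memory.stat first, and only for the cgroups with a high numa_task_migrated or numa_task_swapped count walk the member tasks' /proc/{PID}/sched files with the same helper.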