From: Chen Yu
To: peterz@infradead.org, akpm@linux-foundation.org, mkoutny@suse.com,
	shakeel.butt@linux.dev
Cc: mingo@redhat.com, tj@kernel.org, hannes@cmpxchg.org, corbet@lwn.net,
	mgorman@suse.de, mhocko@kernel.org, muchun.song@linux.dev,
	roman.gushchin@linux.dev, tim.c.chen@intel.com, aubrey.li@intel.com,
	libo.chen@oracle.com, kprateek.nayak@amd.com, vineethr@linux.ibm.com,
	venkat88@linux.ibm.com, ayushjai@amd.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, yu.chen.surf@foxmail.com,
	Ayush Jain, Chen Yu
Subject: [PATCH v6 1/2] sched/numa: fix task swap by skipping kernel threads
Date: Thu, 29 May 2025 12:54:04 +0800
Message-Id: <43d68b356b25d124f0d222ebedf3859e86eefb9f.1748493462.git.yu.c.chen@intel.com>

From: Libo Chen

Task swapping is triggered when there are no idle CPUs in task A's
preferred node. In this case, the NUMA load balancer chooses a task B
on A's preferred node and swaps B with A. This helps improve NUMA
locality without introducing load imbalance between nodes. In the
current implementation, B's NUMA node preference is not mandatory.
That is to say, a kernel thread might be incorrectly chosen as B.
However, kernel threads and user-space threads that do not have an mm
are not supposed to be covered by NUMA balancing, because NUMA
balancing only considers user pages via VMAs.

According to Peter's suggestion for fixing this issue, we use
PF_KTHREAD to skip kernel threads. cur->mm is also checked because
user_mode_thread() might create a user thread without an mm. As per
Prateek's analysis, after adding the PF_KTHREAD check, there is no
need to further check the PF_IDLE flag:
"
- play_idle_precise() already ensures PF_KTHREAD is set before adding
  PF_IDLE

- cpu_startup_entry() is only called from the startup thread which
  should be marked with PF_KTHREAD (based on my understanding looking
  at commit cff9b2332ab7 ("kernel/sched: Modify initial boot task idle
  setup"))
"

In summary, the check in task_numa_compare() now aligns with
task_tick_numa().

Suggested-by: Michal Koutny
Tested-by: Ayush Jain
Signed-off-by: Libo Chen
Tested-by: Venkat Rao Bagalkote
Reviewed-by: Shakeel Butt
Signed-off-by: Chen Yu
---
v5->v6: Add Reviewed-by from Shakeel.
v4->v5: Add PF_KTHREAD check, and remove PF_IDLE check (Prateek).
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 125912c0e9dd..68aa5941c8ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2273,7 +2273,8 @@ static bool task_numa_compare(struct task_numa_env *env,
 
 	rcu_read_lock();
 	cur = rcu_dereference(dst_rq->curr);
-	if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+	if (cur && ((cur->flags & (PF_EXITING | PF_KTHREAD)) ||
+	    !cur->mm))
 		cur = NULL;
 
 	/*
-- 
2.25.1
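
For readers who want to exercise the new condition outside the kernel, here is a minimal user-space sketch of the predicate that the hunk in task_numa_compare() introduces. It is not part of the patch. The struct demo_task type, the demo_swap_candidate_allowed() helper, and the DEMO_PF_* flag values are hypothetical stand-ins chosen for illustration; the kernel's real definitions live in struct task_struct and include/linux/sched.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; stand-ins for the kernel's PF_EXITING and PF_KTHREAD. */
#define DEMO_PF_EXITING 0x00000004u
#define DEMO_PF_KTHREAD 0x00200000u

/* Hypothetical stand-in for the two task_struct fields the check reads. */
struct demo_task {
	unsigned int flags;
	const void *mm;	/* NULL when the task has no user address space */
};

/*
 * Returns true if cur may be considered as the swap partner (task B).
 * Mirrors the patched condition: a missing task, an exiting task, a
 * kernel thread, or a task without an mm is treated as "no candidate",
 * which corresponds to the kernel code setting cur = NULL.
 */
static bool demo_swap_candidate_allowed(const struct demo_task *cur)
{
	if (!cur)
		return false;
	if (cur->flags & (DEMO_PF_EXITING | DEMO_PF_KTHREAD))
		return false;
	if (!cur->mm)
		return false;
	return true;
}
```

Under these assumptions, an ordinary user task with a non-NULL mm passes the check, while a kernel thread, an exiting task, or a user-mode thread created without an mm is rejected.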