Date: Tue, 17 Sep 2024 13:07:25 +0200
From: Michal Hocko
To: Frederic Weisbecker
Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner, Vlastimil Babka, linux-mm@kvack.org, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu@vger.kernel.org
Subject: Re: [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node
References: <20240916224925.20540-1-frederic@kernel.org> <20240916224925.20540-13-frederic@kernel.org>

On Tue 17-09-24 12:34:51, Frederic Weisbecker wrote:
> Le Tue, Sep 17, 2024 at 08:26:49AM +0200, Michal Hocko a écrit :
> > On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
> > > Kthreads attached to a preferred NUMA node for their task structure
> > > allocation can also be assumed to run preferably within that same node.
> > >
> > > A more precise affinity is usually notified by calling
> > > kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
> > >
> > > For the others, a default affinity to the node is desired and sometimes
> > > implemented with more or less success when it comes to deal with hotplug
> > > events and nohz_full / CPU Isolation interactions:
> > >
> > > - kcompactd is affine to its node and handles hotplug but not CPU Isolation
> > > - kswapd is affine to its node and ignores hotplug and CPU Isolation
> > > - A bunch of drivers create their kthreads on a specific node and
> > >   don't take care about affining further.
> > >
> > > Handle that default node affinity preference at the generic level
> > > instead, provided a kthread is created on an actual node and doesn't
> > > apply any specific affinity such as a given CPU or a custom cpumask to
> > > bind to before its first wake-up.
> >
> > Makes sense.
> >
> > > This generic handling is aware of CPU hotplug events and CPU isolation
> > > such that:
> > >
> > > * When a housekeeping CPU goes up and is part of the node of a given
> > >   kthread, it is added to its applied affinity set (and
> > >   possibly the default last resort online housekeeping set is removed
> > >   from the set).
> > >
> > > * When a housekeeping CPU goes down while it was part of the node of a
> > >   kthread, it is removed from the kthread's applied
> > >   affinity.
> > >   The last resort is to affine the kthread to all online
> > >   housekeeping CPUs.
> >
> > But I am not really sure about this part. Sure it makes sense to set the
> > affinity to exclude isolated CPUs but why do we care about hotplug
> > events at all. Let's say we offline all cpus from a given node (or
> > that all but isolated cpus are offline - is this even
> > realistic/reasonable usecase?). Wouldn't scheduler ignore the kthread's
> > affinity in such a case? In other words how is that different from
> > tasksetting a userspace task to a cpu that goes offline? We still do
> > allow such a task to run, right? We just do not care about affinity
> > anymore.
>
> Suppose we have this artificial online set:
>
> NODE 0 -> CPU 0
> NODE 1 -> CPU 1
> NODE 2 -> CPU 2
>
> And we have nohz_full=1,2
>
> So there is kswapd/2 that is affine to NODE 2 and thus CPU 2 for now.
>
> Now CPU 2 goes offline. The scheduler migrates off all
> tasks. select_fallback_rq() for kswapd/2 doesn't find a suitable CPU
> to run to so it affines kswapd/2 to all remaining online CPUs (CPU 0, CPU 1)
> (see the "No more Mr. Nice Guy" comment).
>
> But CPU 1 is nohz_full, so kswapd/2 could run on that isolated CPU. Unless we
> handle things before, like this patchset does.

But that is equally broken as before, no? CPU2 is isolated as well, so it
doesn't really make much of a difference.

> And note that adding isolcpus=domain,1,2 or setting 1,2 as isolated
> cpuset partition (like most isolated workloads should do) is not helping
> here. And I'm not sure this last resort scheduler code is the right place
> to handle isolated cpumasks.

Well, we would have the same situation with userspace tasks, no? Say I
have taskset -p 2 (because I want binding to node2) and that CPU2 goes
offline. The task needs to be moved somewhere, and it would be the last
resort logic that does that, unless I am missing anything. Why should
kernel threads be any different?

> So it looks necessary, unless I am missing something else?
I am not objecting to the patch per se. I am just not sure this is really
needed. It is great to have kernel threads bound to non-isolated cpus by
default if they have node preferences. But as soon as somebody starts
offlining cpus excessively and makes the initial cpumask empty, then
select_fallback_rq sounds like the right thing to do.

Not my call though. I was just curious why this is needed, and it seems to
me you are looking for some sort of correctness for broken setups.
-- 
Michal Hocko
SUSE Labs