Date: Tue, 17 Sep 2024 12:34:51 +0200
From: Frederic Weisbecker
To: Michal Hocko
Cc: LKML, Andrew Morton, Kees Cook, Peter Zijlstra, Thomas Gleixner, Vlastimil Babka, linux-mm@kvack.org, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Zqiang, rcu@vger.kernel.org
Subject: Re: [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node
References: <20240916224925.20540-1-frederic@kernel.org> <20240916224925.20540-13-frederic@kernel.org>

On Tue, Sep 17, 2024 at 08:26:49AM +0200, Michal Hocko wrote:
> On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
> > Kthreads attached to a preferred NUMA node for their task structure
> > allocation can also be assumed to run preferably within that same node.
> >
> > A more precise affinity is usually notified by calling
> > kthread_create_on_cpu() or kthread_bind[_mask]() before the first wakeup.
> >
> > For the others, a default affinity to the node is desired and sometimes
> > implemented with more or less success when it comes to dealing with hotplug
> > events and nohz_full / CPU Isolation interactions:
> >
> > - kcompactd is affine to its node and handles hotplug but not CPU Isolation
> > - kswapd is affine to its node and ignores hotplug and CPU Isolation
> > - A bunch of drivers create their kthreads on a specific node and
> >   don't take care about affining further.
> >
> > Handle that default node affinity preference at the generic level
> > instead, provided a kthread is created on an actual node and doesn't
> > apply any specific affinity such as a given CPU or a custom cpumask to
> > bind to before its first wake-up.
>
> Makes sense.
>
> > This generic handling is aware of CPU hotplug events and CPU isolation
> > such that:
> >
> > * When a housekeeping CPU goes up and is part of the node of a given
> >   kthread, it is added to its applied affinity set (and
> >   possibly the default last resort online housekeeping set is removed
> >   from the set).
> >
> > * When a housekeeping CPU goes down while it was part of the node of a
> >   kthread, it is removed from the kthread's applied
> >   affinity. The last resort is to affine the kthread to all online
> >   housekeeping CPUs.
>
> But I am not really sure about this part. Sure it makes sense to set the
> affinity to exclude isolated CPUs but why do we care about hotplug
> events at all? Let's say we offline all cpus from a given node (or
> that all but isolated cpus are offline - is this even a
> realistic/reasonable usecase?). Wouldn't the scheduler ignore the kthread's
> affinity in such a case? In other words how is that different from
> tasksetting a userspace task to a cpu that goes offline? We still do
> allow such a task to run, right? We just do not care about affinity
> anymore.

Suppose we have this artificial online set:

NODE 0 -> CPU 0
NODE 1 -> CPU 1
NODE 2 -> CPU 2

And we have nohz_full=1,2

So there is kswapd/2 that is affine to NODE 2 and thus CPU 2 for now.

Now CPU 2 goes offline. The scheduler migrates off all tasks.
select_fallback_rq() for kswapd/2 doesn't find a suitable CPU to run on,
so it affines kswapd/2 to all remaining online CPUs (CPU 0, CPU 1) (see
the "No more Mr. Nice Guy" comment).

But CPU 1 is nohz_full, so kswapd/2 could run on that isolated CPU. Unless
we handle things beforehand, like this patchset does.

And note that adding isolcpus=domain,1,2 or setting 1,2 as an isolated
cpuset partition (like most isolated workloads should do) does not help
here. And I'm not sure this last resort scheduler code is the right place
to handle isolated cpumasks.

So it looks necessary, unless I am missing something else?

And that is just for reaffining on CPU down. CPU up needs the mirrored
treatment, and it must also handle new CPUs freshly added to a node.

Thanks.

> --
> Michal Hocko
> SUSE Labs