Date: Thu, 8 Aug 2024 22:49:56 -0700
From: Saurabh Singh Sengar
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ssengar@microsoft.com, wei.liu@kernel.org
Subject: Re: [PATCH] mm/vmstat: Defer the refresh_zone_stat_thresholds after all CPUs bringup
Message-ID: <20240809054956.GA12044@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <1720169301-21002-1-git-send-email-ssengar@linux.microsoft.com> <20240808222006.72071b7b945b290e5270eb30@linux-foundation.org>
In-Reply-To: <20240808222006.72071b7b945b290e5270eb30@linux-foundation.org>

On Thu, Aug 08, 2024 at 10:20:06PM -0700, Andrew Morton wrote:
> On Fri, 5 Jul 2024 01:48:21 -0700 Saurabh Sengar wrote:
>
> > refresh_zone_stat_thresholds function has two loops which is expensive for
> > higher number of CPUs and NUMA nodes.
> >
> > Below is the rough estimation of total iterations done by these loops
> > based on number of NUMA and CPUs.
> >
> > Total number of iterations: nCPU * 2 * Numa * mCPU
> > Where:
> > nCPU = total number of CPUs
> > Numa = total number of NUMA nodes
> > mCPU = mean value of total CPUs (e.g., 512 for 1024 total CPUs)
> >
> > For the system under test with 16 NUMA nodes and 1024 CPUs, this
> > results in a substantial increase in the number of loop iterations
> > during boot-up when NUMA is enabled:
> >
> > No NUMA = 1024*2*1*512 = 1,048,576 : Here refresh_zone_stat_thresholds
> > takes around 224 ms total for all the CPUs in the system under test.
> > 16 NUMA = 1024*2*16*512 = 16,777,216 : Here refresh_zone_stat_thresholds
> > takes around 4.5 seconds total for all the CPUs in the system under test.
> >
> > Calling this for each CPU is expensive when there are large number
> > of CPUs along with multiple NUMAs. Fix this by deferring
> > refresh_zone_stat_thresholds to be called later at once when all the
> > secondary CPUs are up. Also, register the DYN hooks to keep the
> > existing hotplug functionality intact.
> >
> > ...
> >
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -31,6 +31,7 @@
> >  
> >  #include "internal.h"
> >  
> > +static int vmstat_late_init_done;
> >  #ifdef CONFIG_NUMA
> >  int sysctl_vm_numa_stat = ENABLE_NUMA_STAT;
> >  
> > @@ -2107,7 +2108,8 @@ static void __init init_cpu_node_state(void)
> >  
> >  static int vmstat_cpu_online(unsigned int cpu)
> >  {
> > -	refresh_zone_stat_thresholds();
> > +	if (vmstat_late_init_done)
> > +		refresh_zone_stat_thresholds();
> >  
> >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
> >  		node_set_state(cpu_to_node(cpu), N_CPU);
> > @@ -2139,6 +2141,14 @@ static int vmstat_cpu_dead(unsigned int cpu)
> >  	return 0;
> >  }
> >  
> > +static int __init vmstat_late_init(void)
> > +{
> > +	refresh_zone_stat_thresholds();
> > +	vmstat_late_init_done = 1;
> > +
> > +	return 0;
> > +}
> > +late_initcall(vmstat_late_init);
> OK, so what's happening here.  Once all CPUs are online and running
> around doing heaven knows what, one of the CPUs sets up everyone's
> thresholds.  So for a period, all the other CPUs are running with
> inappropriate threshold values.
>
> So what are all the other CPUs doing at this point in time, and why is
> it safe to leave their thresholds in an inappropriate state while they
> are doing it?

From what I understand, these threshold values are primarily used by
userspace tools, and this data becomes useful only after late_initcall.

If there’s a more effective approach to handle this, please let me know,
and I can investigate further.

- Saurabh