From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 783DAC38142 for ; Wed, 1 Feb 2023 04:53:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BD4E56B0071; Tue, 31 Jan 2023 23:53:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B84BE6B0072; Tue, 31 Jan 2023 23:53:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A73AA6B0074; Tue, 31 Jan 2023 23:53:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 9798E6B0071 for ; Tue, 31 Jan 2023 23:53:21 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 50704C0321 for ; Wed, 1 Feb 2023 04:53:21 +0000 (UTC) X-FDA: 80417504202.02.93FDD66 Received: from r3-21.sinamail.sina.com.cn (r3-21.sinamail.sina.com.cn [202.108.3.21]) by imf13.hostedemail.com (Postfix) with ESMTP id F2A5F20011 for ; Wed, 1 Feb 2023 04:53:17 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf13.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.21 as permitted sender) smtp.mailfrom=hdanton@sina.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675227199; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bHgy+hEVMfRBPkIURqqNvv4lkTvazja9eFb6Gyfgl3U=; b=EbSizUgz5y269uciZSBIgiQRuEh9uQzcg4sdCb4Ol0k+BvCN7JU+NesjbzSyFYTD+RLB6X yyc+u7Vlkiz0VH13IL1VAITJcy/cAxZEy4jf+9lDNvrDkoJbmZo73noH7BFIUHKcqshy7Q pWqRhdrEx/kT9rJszU/ZKkuqURern3E= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf13.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.21 as permitted sender) smtp.mailfrom=hdanton@sina.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675227199; a=rsa-sha256; cv=none; b=e70y2/Q50Ltt7kMpUuZnoiX0yJFEjlz8UE0rR0HFXBOvnJFSg23SfquzYbz/KJhu+E/GSW xm2X2l5LH+pfyO9bEevsy835L+Qy/TqAjDm8tberLDWAluPmIekwjlqrDE90hqB0CVvfF8 usKsgTMXHu5j6V0Mr8sUu868XuWNtv0= Received: from unknown (HELO localhost.localdomain)([114.249.61.130]) by sina.com (172.16.97.32) with ESMTP id 63D9EF440001C67F; Wed, 1 Feb 2023 12:49:09 +0800 (CST) X-Sender: hdanton@sina.com X-Auth-ID: hdanton@sina.com X-SMAIL-MID: 03511629998 From: Hillf Danton To: Thomas Gleixner Cc: Yu Liao , fweisbec@gmail.com, mingo@kernel.org, liwei391@huawei.com, adobriyan@gmail.com, mirsad.todorovac@alu.unizg.hr, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Peter Zijlstra Subject: Re: [PATCH RFC] tick/nohz: fix data races in get_cpu_idle_time_us() Date: Wed, 1 Feb 2023 12:53:02 +0800 Message-Id: <20230201045302.316-1-hdanton@sina.com> In-Reply-To: <87357q228f.ffs@tglx> References: <20230128020051.2328465-1-liaoyu15@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: F2A5F20011 X-Rspam-User: X-Stat-Signature: ei5tjdqp79mseprkaao38xgp49tugjja X-HE-Tag: 1675227197-491304 X-HE-Meta: U2FsdGVkX1/70EnfuipkKV56Ozv9bc56an4GAHcCh/svsi5VzxOMMsuH/UpHzLYkv3OEcfU4T7GxYaFxUcfp5cDva0FzfquC9EornvoDxqjQFI2JphRHaTyhn69Tv4o37MgM5al48/+AoVV8QTxGmGxVCSCocRN85KkDQI60d5aH7zO4R7b1Aybc0pyrBtdRjR+hYGR9IfBAmEqMbhmiMwyizTCdEtznk9xoIky9ByqU4ZaP1MLxDqa8UPZ9hu0evQGtzivW6aIv0jA3K4gm5WrnZubce12OBbbL/MJeuO5sFIgW31LA9WZIC4Klej/5BIfusAGvpfqHfHVlLX32makoVL0Z0BkPSfv8pUwmiuc+vdrS3yolzpx27GKFsRsVppzR3Xj29tNAJ3IYvrybMPo5v2GsHUwfd4qEVWMb6v2akPWFOFKnxubuHzm39sTV5TtjwTGHMp2ZC5EAvAN/4FSgBS7iQdaukxNWWGqpv5LPIpIswx1DlZNW/wszDIvPRYwrfWZrer2PN3kOeRKvvl5yQAFu7PEGZJIPRw30szRLopQsn3iG9Vq9/hjsq69V982Zk/HkbP1HPyw8blhn/P4trECDlyTH9x75wACCgWnuidGQ5xZ8KR0p05sJMopmxIsdI3E99BP0sWOlPhEHRxCpSgA+hA81n3Y/8fhn9AUJ6T64wWQPMcr38vBmnFfxjB381EXeUr9k7eA//zJTcZGGptbjdBttIEStqVu9UMLWalmzkJmSLfynhK06DGlI1LNMB2ZLv5NpDh0NV/yWKfsKkVfomTSlsnX4/VVDEmZ2MIDaPUpTQLf/VEfmSBeY/QWqQOj8zKmY+c0Lifsvh8NJFumkgYFL0mJdiyZfjjBFO5+iYQVC0O+2+V6zhL8uyYhAm82Kf0acZ+8L0um17n30GBrP+bEL1jENm9lgH8a+VsgtjefznPMfJ8gFj8ken0qMbGilbjtbgMo4ndj K1vI8rm3 IjK3TYEqLL8l371+u3iRq9EiiEzqJnnMOpXmmmiNREKEvDcIvJyt9mNgEm2NZvCYrwkqYo4u270MLtgiuN2bAKhrs1sbRKYLK4nZVbWwHci9FUC/kiO/t0IAxxgsfaqtiIl3im1h3s2yNnNcQD7dWfNPDMw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, 31 Jan 2023 15:44:00 +0100 Thomas Gleixner > > Seriously this procfs accuracy is the least of the problems and if this > would be the only issue then we could trivially fix it by declaring that > the procfs output might go backwards. It's an estimate after all. If > there would be a real reason to ensure monotonicity there then we could > easily do that in the readout code. > > But the real issue is that both get_cpu_idle_time_us() and > get_cpu_iowait_time_us() can invoke update_ts_time_stats() which is way > worse than the above procfs idle time going backwards. > > If update_ts_time_stats() is invoked concurrently for the same CPU then > ts->idle_sleeptime and ts->iowait_sleeptime are turning into random > numbers. > > This has been broken 12 years ago in commit 595aac488b54 ("sched: > Introduce a function to update the idle statistics"). [...] > > P.S.: I hate the spinlock in the idle code path, but I don't have a > better idea. Provided the percpu rule is enforced, the random numbers mentioned above could be erased without another spinlock added. Hillf +++ b/kernel/time/tick-sched.c @@ -640,13 +640,26 @@ static void tick_nohz_update_jiffies(kti /* * Updates the per-CPU time idle statistics counters */ -static void -update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time) +static u64 update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, + int io, u64 *last_update_time) { ktime_t delta; + if (last_update_time) + *last_update_time = ktime_to_us(now); + if (ts->idle_active) { delta = ktime_sub(now, ts->idle_entrytime); + + /* update is only expected on the local CPU */ + if (cpu != smp_processor_id()) { + if (io) + delta = ktime_add(ts->iowait_sleeptime, delta); + else + delta = ktime_add(ts->idle_sleeptime, delta); + return ktime_to_us(delta); + } + if (nr_iowait_cpu(cpu) > 0) ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta); else @@ -654,14 +667,12 @@ update_ts_time_stats(int cpu, struct tic ts->idle_entrytime = now; } - if (last_update_time) - *last_update_time = ktime_to_us(now); - + return 0; } static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now) { - update_ts_time_stats(smp_processor_id(), ts, now, NULL); + update_ts_time_stats(smp_processor_id(), ts, now, 0, NULL); ts->idle_active = 0; sched_clock_idle_wakeup_event(); @@ -698,7 +709,9 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l now = ktime_get(); if (last_update_time) { - update_ts_time_stats(cpu, ts, now, last_update_time); + u64 ret = update_ts_time_stats(cpu, ts, now, 0, last_update_time); + if (ret) + return ret; idle = ts->idle_sleeptime; } else { if (ts->idle_active && !nr_iowait_cpu(cpu)) { @@ -739,7 +752,9 @@ u64 get_cpu_iowait_time_us(int cpu, u64 now = ktime_get(); if (last_update_time) { - update_ts_time_stats(cpu, ts, now, last_update_time); + u64 ret = update_ts_time_stats(cpu, ts, now, 1, last_update_time); + if (ret) + return ret; iowait = ts->iowait_sleeptime; } else { if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {