Date: Wed, 1 Feb 2023 13:02:41 +0100
From: Frederic Weisbecker
To: Hillf Danton
Cc: Thomas Gleixner, Yu Liao, fweisbec@gmail.com, mingo@kernel.org, liwei391@huawei.com, adobriyan@gmail.com, mirsad.todorovac@alu.unizg.hr, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Peter Zijlstra
Subject: Re: [PATCH RFC] tick/nohz: fix data races in get_cpu_idle_time_us()
References: <20230128020051.2328465-1-liaoyu15@huawei.com> <20230201045302.316-1-hdanton@sina.com>
In-Reply-To: <20230201045302.316-1-hdanton@sina.com>

On Wed, Feb 01, 2023 at 12:53:02PM
+0800, Hillf Danton wrote:
> On Tue, 31 Jan 2023 15:44:00 +0100 Thomas Gleixner
> >
> > Seriously this procfs accuracy is the least of the problems and if this
> > would be the only issue then we could trivially fix it by declaring that
> > the procfs output might go backwards. It's an estimate after all. If
> > there would be a real reason to ensure monotonicity there then we could
> > easily do that in the readout code.
> >
> > But the real issue is that both get_cpu_idle_time_us() and
> > get_cpu_iowait_time_us() can invoke update_ts_time_stats() which is way
> > worse than the above procfs idle time going backwards.
> >
> > If update_ts_time_stats() is invoked concurrently for the same CPU then
> > ts->idle_sleeptime and ts->iowait_sleeptime are turning into random
> > numbers.
> >
> > This has been broken 12 years ago in commit 595aac488b54 ("sched:
> > Introduce a function to update the idle statistics").
> > [...]
> >
> > P.S.: I hate the spinlock in the idle code path, but I don't have a
> > better idea.
>
> Provided the percpu rule is enforced, the random numbers mentioned above
> could be erased without another spinlock added.
>
> Hillf
> +++ b/kernel/time/tick-sched.c
> @@ -640,13 +640,26 @@ static void tick_nohz_update_jiffies(kti
>  /*
>   * Updates the per-CPU time idle statistics counters
>   */
> -static void
> -update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> +static u64 update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now,
> +				 int io, u64 *last_update_time)
>  {
>  	ktime_t delta;
>
> +	if (last_update_time)
> +		*last_update_time = ktime_to_us(now);
> +
>  	if (ts->idle_active) {
>  		delta = ktime_sub(now, ts->idle_entrytime);
> +
> +		/* update is only expected on the local CPU */
> +		if (cpu != smp_processor_id()) {

Why not just update it only on idle exit then?

> +			if (io)

I fear it's not up to the caller to decide whether the idle time is IO or not.

> +				delta = ktime_add(ts->iowait_sleeptime, delta);
> +			else
> +				delta = ktime_add(ts->idle_sleeptime, delta);
> +			return ktime_to_us(delta);
> +		}
> +
>  		if (nr_iowait_cpu(cpu) > 0)
>  			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
>  		else

But you kept the old update above. So if this is not the local CPU, what do
you do?

You'd need to return (without updating iowait_sleeptime):

      ts->idle_sleeptime + ktime_sub(ktime_get(), ts->idle_entrytime)

Right? But then you may race with the local updater and risk returning the
delta added twice. So you need at least a seqcount.

But in the end, nr_iowait_cpu() is broken because that counter can be
decremented remotely, and so the whole thing is beyond repair:

CPU 0                      CPU 1                            CPU 2
-----                      -----                            ------
//io_schedule() TASK A
current->in_iowait = 1
rq(0)->nr_iowait++
//switch to idle
                           // READ /proc/stat
                           // See nr_iowait_cpu(0) == 1
                           return ts->iowait_sleeptime +
                                  ktime_sub(ktime_get(), ts->idle_entrytime)

                                                            //try_to_wake_up(TASK A)
                                                            rq(0)->nr_iowait--
//idle exit
// See nr_iowait_cpu(0) == 0
ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

Thanks.
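
To illustrate the seqcount point above, here is a rough userspace sketch, not
kernel code. It assumes C11 atomics as a stand-in for the kernel's seqcount
primitives, and every name in it (struct idle_stats, idle_enter(), idle_exit(),
read_idle_time()) is hypothetical. Only the local CPU writes; a remote reader
retries if it overlapped with an update, so it never sees the delta added
twice. It does nothing about the nr_iowait_cpu() problem described above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU idle accounting state (times in us). */
struct idle_stats {
	atomic_uint seq;			/* even: stable, odd: update in progress */
	_Atomic uint64_t idle_entrytime;	/* timestamp of the last idle entry */
	_Atomic uint64_t idle_sleeptime;	/* idle time accumulated so far */
	atomic_bool idle_active;		/* true while the CPU is idle */
};

void idle_stats_init(struct idle_stats *st)
{
	atomic_init(&st->seq, 0);
	atomic_init(&st->idle_entrytime, 0);
	atomic_init(&st->idle_sleeptime, 0);
	atomic_init(&st->idle_active, false);
}

/* Writer side: must only be called by the owning ("local") CPU. */
static void write_begin(struct idle_stats *st)
{
	unsigned int s = atomic_load_explicit(&st->seq, memory_order_relaxed);

	atomic_store_explicit(&st->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void write_end(struct idle_stats *st)
{
	unsigned int s = atomic_load_explicit(&st->seq, memory_order_relaxed);

	atomic_store_explicit(&st->seq, s + 1, memory_order_release);
}

void idle_enter(struct idle_stats *st, uint64_t now)
{
	write_begin(st);
	atomic_store_explicit(&st->idle_entrytime, now, memory_order_relaxed);
	atomic_store_explicit(&st->idle_active, true, memory_order_relaxed);
	write_end(st);
}

/* Fold the elapsed idle period into the total, on idle exit only. */
void idle_exit(struct idle_stats *st, uint64_t now)
{
	uint64_t entry = atomic_load_explicit(&st->idle_entrytime, memory_order_relaxed);
	uint64_t sleep = atomic_load_explicit(&st->idle_sleeptime, memory_order_relaxed);

	write_begin(st);
	atomic_store_explicit(&st->idle_sleeptime, sleep + (now - entry), memory_order_relaxed);
	atomic_store_explicit(&st->idle_active, false, memory_order_relaxed);
	write_end(st);
}

/* Reader side: may run on any CPU and never writes the statistics. */
uint64_t read_idle_time(struct idle_stats *st, uint64_t now)
{
	unsigned int s;
	uint64_t total;

	do {
		/* Wait for an even sequence number, i.e. no update in flight. */
		while ((s = atomic_load_explicit(&st->seq, memory_order_acquire)) & 1)
			;
		total = atomic_load_explicit(&st->idle_sleeptime, memory_order_relaxed);
		if (atomic_load_explicit(&st->idle_active, memory_order_relaxed)) {
			uint64_t entry = atomic_load_explicit(&st->idle_entrytime,
							      memory_order_relaxed);
			if (now > entry)
				total += now - entry;
		}
		atomic_thread_fence(memory_order_acquire);
		/* Retry if the writer changed the sequence number meanwhile. */
	} while (atomic_load_explicit(&st->seq, memory_order_relaxed) != s);

	return total;
}
```

The reader computes the pending delta on its own and leaves idle_sleeptime
untouched, which is what keeps concurrent readers from corrupting the counters.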