From: Zhaoyang Huang
Date: Wed, 10 Nov 2021 09:37:00 +0800
Subject: Re: [Resend PATCH] psi : calc cfs task memstall time more precisely
To: Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Ke Wang, xuewen.yan@unisoc.com
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Vladimir Davydov, Zhaoyang Huang, "open list:MEMORY MANAGEMENT", LKML
References: <1634278612-17055-1-git-send-email-huangzhaoyang@gmail.com>

On Tue, Nov 9, 2021 at 10:56 PM Peter Zijlstra wrote:
>
> On Tue, Nov 02, 2021 at 03:47:33PM -0400, Johannes Weiner wrote:
> > CC peterz as well for rt and timekeeping magic
> >
> > On Fri, Oct 15, 2021 at 02:16:52PM +0800, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang
> > >
> > > In an EAS enabled system, there are two scenarios discordant to current design,
> > >
> > > 1. workload used to be heavy uneven among cores for sake of scheduler policy.
> > > RT task usually preempts CFS task in little core.
> > > 2. CFS task's memstall time is counted as simple as exit - entry so far, which
> > > ignore the preempted time by RT, DL and Irqs.
>
> It ignores preemption full-stop. I don't see why RT/IRQ should be
> special cased here.

As Johannes commented, what we are trying to account for is mainly the
time a CFS task spends preempted by RT/IRQ, NOT the RT/IRQ time itself.
Could you please take a look at Dietmar's recent reply, which may
provide more information?

>
> > > With these two constraints, the percpu nonidle time would be mainly consumed by
> > > none CFS tasks and couldn't be averaged. Eliminating them by calc the time growth
> > > via the proportion of cfs_rq's utilization on the whole rq.
> >
> > > +static unsigned long psi_memtime_fixup(u32 growth)
> > > +{
> > > +	struct rq *rq = task_rq(current);
> > > +	unsigned long growth_fixed = (unsigned long)growth;
> > > +
> > > +	if (!(current->policy == SCHED_NORMAL || current->policy == SCHED_BATCH))
> > > +		return growth_fixed;
> > > +
> > > +	if (current->in_memstall)
> > > +		growth_fixed = div64_ul((1024 - rq->avg_rt.util_avg - rq->avg_dl.util_avg
> > > +					- rq->avg_irq.util_avg + 1) * growth, 1024);
> > > +
> > > +	return growth_fixed;
> > > +}
> > > +
> > >  static void init_triggers(struct psi_group *group, u64 now)
> > >  {
> > >  	struct psi_trigger *t;
> > > @@ -658,6 +675,7 @@ static void record_times(struct psi_group_cpu *groupc, u64 now)
> > >  	}
> > >
> > >  	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
> > > +		delta = psi_memtime_fixup(delta);
> >
> > Ok, so we want to deduct IRQ and RT preemption time from the memstall
> > period of an active reclaimer, since it's technically not stalled on
> > memory during this time but on CPU.
> >
> > However, we do NOT want to deduct IRQ and RT time from memstalls that
> > are sleeping on refaults swapins, since they are not affected by what
> > is going on on the CPU.
>
> I think that focus on RT/IRQ is mis-guided here, and the implementation
> is horrendous.
>
> So the fundamental question seems to be; and I think Johannes is the one
> to answer that: What time-base do these metrics want to use?
>
> Do some of these states want to account in task-time instead of
> wall-time perhaps? I can't quite remember, but vague memories are
> telling me most of the PSI accounting was about blocked tasks, not
> running tasks, which makes all this rather more complicated.

Memstall time is counted as exit - enter, which includes both the
blocked and the running state. However, we think the blocked time
introduced by RT/IRQ/DL preemption is irrelevant to memstall (and should
be eliminated), while the time lost to preemption between CFS tasks
could be relevant.
Thanks to the load tracking mechanism, the implementation can be kept
simple by calculating the proportion of CFS_UTIL in the whole core's
capacity.

>
> Randomly scaling time as proposed seems almost certainly wrong. What
> would that make the stats mean?

It is NOT random scaling; the scaling is applied in each record_times()
call for CFS tasks.
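
To make the arithmetic concrete, below is a minimal userspace sketch of the
scaling done by psi_memtime_fixup() in the quoted patch. It is only an
illustration under stated assumptions, not the kernel code: the function name
scale_memstall_delta(), the CAPACITY_SCALE macro and the plain integer
parameters (standing in for rq->avg_rt.util_avg, rq->avg_dl.util_avg and
rq->avg_irq.util_avg) are hypothetical, and the clamp against a utilization
sum above 1024 is an addition of this sketch that the patch does not have.

```c
#include <stdint.h>

/* Same fixed-point scale as the literal 1024 used in the quoted patch. */
#define CAPACITY_SCALE 1024u

/*
 * Hypothetical userspace stand-in for psi_memtime_fixup(): scale a memstall
 * time delta by the share of CPU capacity that is not consumed by RT, DL
 * and IRQ activity.
 */
static uint64_t scale_memstall_delta(uint32_t delta, uint32_t rt_util,
                                     uint32_t dl_util, uint32_t irq_util)
{
    uint64_t used = (uint64_t)rt_util + dl_util + irq_util;
    uint64_t cfs_share;

    /* Assumption of this sketch (not in the patch): avoid underflow. */
    if (used > CAPACITY_SCALE)
        used = CAPACITY_SCALE;

    /* The "+ 1" mirrors the expression in the quoted patch. */
    cfs_share = CAPACITY_SCALE - used + 1;

    return cfs_share * delta / CAPACITY_SCALE;
}
```

For example, with rt_util = 512 and no DL or IRQ utilization, a delta of 1024
is scaled to 513. With all three utilizations at zero, the "+ 1" makes the
result 1025, slightly larger than the input.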