From: Suren Baghdasaryan
Date: Tue, 11 Oct 2022 10:11:58 -0700
Subject: Re: PSI idle-shutoff
To: Hillf Danton
Cc: Pavan Kondeti, Johannes Weiner, linux-mm@kvack.org, linux-kernel@vger.kernel.org, quic_charante@quicinc.com
References: <20221011113818.340-1-hdanton@sina.com>
In-Reply-To: <20221011113818.340-1-hdanton@sina.com>

On Tue, Oct 11, 2022 at 4:38 AM Hillf Danton wrote:
>
> On 10 Oct 2022 14:16:26 -0700 Suren Baghdasaryan
> > On Mon, Oct 10, 2022 at 3:57 AM Hillf Danton wrote:
> > > On 13 Sep 2022 19:38:17 +0530 Pavan Kondeti
> > > > Hi
> > > >
> > > > The fact that psi_avgs_work()->collect_percpu_times()->get_recent_times()
> > > > run from a kworker thread, PSI_NONIDLE condition would be observed as
> > > > there is a RUNNING task. So we would always end up re-arming the work.
> > > >
> > > > If the work is re-armed from the psi_avgs_work() it self, the backing off
> > > > logic in psi_task_change() (will be moved to psi_task_switch soon) can't
> > > > help. The work is already scheduled.
> > > > so we don't do anything there.
> > > >
> > > > Probably I am missing some thing here. Can you please clarify how we
> > > > shut off re-arming the psi avg work?
> > >
> > > Instead of open coding schedule_delayed_work() in bid to check if timer
> > > hits the idle task (see delayed_work_timer_fn()), the idle task is tracked
> > > in psi_task_switch() and checked by kworker to see if it preempted the idle
> > > task.
> > >
> > > Only for thoughts now.
> > >
> > > Hillf
> > >
> > > +++ b/kernel/sched/psi.c
> > > @@ -412,6 +412,8 @@ static u64 update_averages(struct psi_gr
> > >  	return avg_next_update;
> > >  }
> > >
> > > +static DEFINE_PER_CPU(int, prev_task_is_idle);
> > > +
> > >  static void psi_avgs_work(struct work_struct *work)
> > >  {
> > >  	struct delayed_work *dwork;
> > > @@ -439,7 +441,7 @@ static void psi_avgs_work(struct work_st
> > >  	if (now >= group->avg_next_update)
> > >  		group->avg_next_update = update_averages(group, now);
> > >
> > > -	if (nonidle) {
> > > +	if (nonidle && 0 == per_cpu(prev_task_is_idle, raw_smp_processor_id())) {
> >
> > This condition would be incorrect if nonidle was set by a cpu other
> > than raw_smp_processor_id() and
> > prev_task_is_idle[raw_smp_processor_id()] == 0.
>
> Thanks for taking a look.

Thanks for the suggestion!

>
> > IOW, if some activity happens on a non-current cpu, we would fail to
> > reschedule psi_avgs_work for it.
>
> Given activities on remote CPUs, can you specify what prevents psi_avgs_work
> from being scheduled on remote CPUs if for example the local CPU has been
> idle for a second?

I'm not a scheduler expert but I can imagine some work that finished
running on a big core A and generated some activity since the last
time psi_avgs_work executed. With no other activity the next
psi_avgs_work could be scheduled on a small core B to conserve power.
There might be other cases involving cpuset limitation changes or cpu
offlining but I didn't think too hard about these.
The bottom line, I don't think we should be designing mechanisms which
rely on assumptions about how tasks will be scheduled. Even if these
assumptions are correct today they might change in the future and
things will break in unexpected places.

>
> > This can be fixed in collect_percpu_times() by
> > considering prev_task_is_idle for all other CPUs as well. However
> > Chengming's approach seems simpler to me TBH and does not require an
> > additional per-cpu variable.
>
> Good ideas are always welcome.

No question about that. Thanks!
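
To make the big core A / small core B case above concrete, here is a
rough userspace model. It is only a sketch, not kernel code: struct
cpu_model, any_nonidle(), rearm_local_only() and
rearm_discount_worker() are hypothetical names made up for this
illustration, and the second check is just one possible reading of
"considering prev_task_is_idle for all other CPUs as well".

```c
#include <stdbool.h>

/* Hypothetical userspace model of the per-CPU state; not kernel code. */
struct cpu_model {
	bool nonidle;           /* activity seen since the last aggregation */
	bool prev_task_is_idle; /* the worker on this CPU preempted the idle task */
};

/* nonidle as the aggregation sees it: set if any CPU reported activity. */
static bool any_nonidle(const struct cpu_model *cpus, int n)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].nonidle)
			return true;
	return false;
}

/* Re-arm decision modelled on the quoted patch: only the local CPU is checked. */
static bool rearm_local_only(const struct cpu_model *cpus, int n, int this_cpu)
{
	return any_nonidle(cpus, n) && !cpus[this_cpu].prev_task_is_idle;
}

/*
 * Assumed alternative: discount a CPU's activity only when that CPU is
 * the one running the worker and the worker preempted idle there, so
 * activity reported by any other CPU still causes a re-arm.
 */
static bool rearm_discount_worker(const struct cpu_model *cpus, int n, int this_cpu)
{
	for (int i = 0; i < n; i++) {
		if (!cpus[i].nonidle)
			continue;
		if (i == this_cpu && cpus[i].prev_task_is_idle)
			continue; /* only the worker itself ran here */
		return true;
	}
	return false;
}
```

If CPU 0 (core A) reported activity and the worker runs on CPU 1
(core B) after preempting idle there, rearm_local_only() returns false
in this model, while rearm_discount_worker() returns true.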