Date: Mon, 12 Sep 2022 15:38:23 +0100
From: Aaron Tomlin
To: Marcelo Tosatti
Cc: Frederic Weisbecker, cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
	peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
	oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v7 2/3] tick/sched: Ensure quiet_vmstat() is called when
	the idle tick was stopped too
Message-ID: <20220912143822.irn6xhs2etmumqlt@ava.usersys.com>
References: <20220817191346.287594886@redhat.com>
	<20220817191524.201253713@redhat.com>
	<20220909121224.GA220905@lothringen>

On Fri 2022-09-09 16:35 -0300, Marcelo Tosatti wrote:
> For the scenario where we re-enter idle without calling quiet_vmstat:
>
>
> CPU-0                   CPU-1
>
> 0) vmstat_shepherd notices its necessary to queue vmstat work
> to remote CPU, queues deferrable timer into timer wheel, and calls
> trigger_dyntick_cpu (target_cpu == cpu-1).
>
>                         1) Stop the tick (get_next_timer_interrupt will not take deferrable
>                         timers into account), calls quiet_vmstat, which keeps the vmstat work
>                         (vmstat_update function) queued.
>                         2) Idle
>                         3) Idle exit
>                         4) Run thread on CPU, some activity marks vmstat dirty
>                         5) Idle
>                         6) Goto 3
>
> At 5, since the tick is already stopped, the deferrable
> timer for the delayed work item will not execute,
> and vmstat_shepherd will consider
>
> static void vmstat_shepherd(struct work_struct *w)
> {
>         int cpu;
>
>         cpus_read_lock();
>         /* Check processors whose vmstat worker threads have been disabled */
>         for_each_online_cpu(cpu) {
>                 struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>
>                 if (!delayed_work_pending(dw) && need_update(cpu))
>                         queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
>
>                 cond_resched();
>         }
>         cpus_read_unlock();
>
>         schedule_delayed_work(&shepherd,
>                 round_jiffies_relative(sysctl_stat_interval));
> }
>
> As far as i can tell...

Hi Marcelo,

Yes, I agree with the scenario above.

> > > Consider the following theoretical scenario:
> > >
> > > 1. CPU Y migrated running task A to CPU X that was
> > >    in an idle state i.e. waiting for an IRQ - not
> > >    polling; marked the current task on CPU X to
> > >    need/or require a reschedule i.e., set
> > >    TIF_NEED_RESCHED and invoked a reschedule IPI to
> > >    CPU X (see sched_move_task())
> >
> > CPU Y is nohz_full right?
> >
> > >
> > > 2. CPU X acknowledged the reschedule IPI from CPU Y;
> > >    generic idle loop code noticed the
> > >    TIF_NEED_RESCHED flag against the idle task and
> > >    attempts to exit of the loop and calls the main
> > >    scheduler function i.e. __schedule().
> > >
> > >    Since the idle tick was previously stopped no
> > >    scheduling-clock tick would occur.
> > >    So, no deferred timers would be handled
> > >
> > > 3. Post transition to kernel execution Task A
> > >    running on CPU Y, indirectly released a few pages
> > >    (e.g. see __free_one_page()); CPU Y's
> > >    'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
> > >    specific 'vm_stat[]' update was deferred as per the
> > >    CPU-specific stat threshold
> > >
> > > 4. Task A does invoke exit(2) and the kernel does
> > >    remove the task from the run-queue; the idle task
> > >    was selected to execute next since there are no
> > >    other runnable tasks assigned to the given CPU
> > >    (see pick_next_task() and pick_next_task_idle())
> >
> > This happens on CPU X, right?
> >
> > >
> > > 5. On return to the idle loop since the idle tick
> > >    was already stopped and can remain so (see [1]
> > >    below) e.g. no pending soft IRQs, no attempt is
> > >    made to zero and fold CPU Y's vmstat counters
> > >    since reprogramming of the scheduling-clock tick
> > >    is not required/or needed (see [2])
> >
> > And now back to CPU Y, confused...
>
> Aaron, can you explain the diagram above?

Hi Frederic,

Sorry about that. How about the following:

 - Note: CPU X is part of 'tick_nohz_full_mask'

 1. CPU Y migrated running task A to CPU X, which was in an idle
    state, i.e. waiting for an IRQ; marked the current task on CPU X
    as needing a reschedule, i.e. set TIF_NEED_RESCHED, and sent a
    reschedule IPI to CPU X (see sched_move_task())

 2. CPU X acknowledged the reschedule IPI. Generic idle loop code
    noticed the TIF_NEED_RESCHED flag against the idle task, exited
    the loop and called the main scheduler function, i.e.
    __schedule().

    Since the idle tick was previously stopped, no scheduling-clock
    tick would occur. So, no deferred timers would be handled

 3. After the transition to kernel execution, task A, running on
    CPU X, indirectly released a few pages (e.g. see
    __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]' was
    updated and the zone-specific 'vm_stat[]' update was deferred as
    per the CPU-specific stat threshold

 4. Task A invoked exit(2) and the kernel removed the task from the
    run-queue; the idle task was selected to execute next since
    there were no other runnable tasks assigned to the given CPU
    (see pick_next_task() and pick_next_task_idle())

 5. On return to the idle loop, since the idle tick was already
    stopped and can remain so (see [1] below), e.g. no pending soft
    IRQs, no attempt is made to zero and fold CPU X's vmstat
    counters, since reprogramming of the scheduling-clock tick is
    not required (see [2])


Kind regards,

-- 
Aaron Tomlin