From: Aaron Tomlin <atomlin@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v7 2/3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
Date: Mon, 12 Sep 2022 15:38:23 +0100
Message-ID: <20220912143822.irn6xhs2etmumqlt@ava.usersys.com>
In-Reply-To: <YxuViCnKcIYVE02B@fuller.cnet>
On Fri 2022-09-09 16:35 -0300, Marcelo Tosatti wrote:
> For the scenario where we re-enter idle without calling quiet_vmstat:
>
>
> CPU-0                         CPU-1
>
> 0) vmstat_shepherd notices it's necessary to queue vmstat work
>    to remote CPU, queues deferrable timer into timer wheel, and calls
>    trigger_dyntick_cpu (target_cpu == cpu-1).
>
>                               1) Stop the tick (get_next_timer_interrupt
>                                  will not take deferrable timers into
>                                  account), calls quiet_vmstat, which keeps
>                                  the vmstat work (vmstat_update function)
>                                  queued.
>                               2) Idle
>                               3) Idle exit
>                               4) Run thread on CPU, some activity marks
>                                  vmstat dirty
>                               5) Idle
>                               6) Goto 3
>
> At 5, since the tick is already stopped, the deferrable
> timer for the delayed work item will not execute,
> and vmstat_shepherd will consider
>
> static void vmstat_shepherd(struct work_struct *w)
> {
> 	int cpu;
>
> 	cpus_read_lock();
> 	/* Check processors whose vmstat worker threads have been disabled */
> 	for_each_online_cpu(cpu) {
> 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>
> 		if (!delayed_work_pending(dw) && need_update(cpu))
> 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
>
> 		cond_resched();
> 	}
> 	cpus_read_unlock();
>
> 	schedule_delayed_work(&shepherd,
> 		round_jiffies_relative(sysctl_stat_interval));
> }
>
> As far as i can tell...
Hi Marcelo,
Yes, I agree with the scenario above.
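
To make the scenario concrete, here is a small userspace toy model of it.
This is only a sketch under stated assumptions: every name below
(toy_cpu_state, toy_shepherd_would_queue() and so on) is hypothetical and
not a kernel symbol, and the rule "a deferrable timer does not fire while
the tick is stopped" is taken from the description above rather than from
the timer code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; none of these are kernel symbols. */
struct toy_cpu_state {
	bool work_pending;	/* models delayed_work_pending(dw) */
	bool dirty;		/* models need_update(cpu) */
	bool tick_stopped;	/* models the idle tick being stopped */
};

/* Mirrors the condition tested in the vmstat_shepherd() loop. */
static bool toy_shepherd_would_queue(const struct toy_cpu_state *s)
{
	return !s->work_pending && s->dirty;
}

/*
 * Assumption taken from the scenario: the deferrable timer backing the
 * delayed work only expires while the tick is running.
 */
static bool toy_deferrable_timer_fires(const struct toy_cpu_state *s)
{
	return s->work_pending && !s->tick_stopped;
}

/* One shepherd period; returns true if the dirty counters were flushed. */
static bool toy_shepherd_period(struct toy_cpu_state *s)
{
	if (toy_shepherd_would_queue(s))
		s->work_pending = true;

	if (toy_deferrable_timer_fires(s)) {
		s->work_pending = false;
		s->dirty = false;
		return true;
	}
	return false;
}
```

With work_pending, dirty and tick_stopped all set (the state at step 5),
toy_shepherd_period() never flushes: the shepherd skips the CPU because the
work is still pending, and the pending work cannot run because its timer is
deferrable.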
> > > Consider the following theoretical scenario:
> > >
> > > 1. CPU Y migrated running task A to CPU X that was
> > > in an idle state i.e. waiting for an IRQ - not
> > > polling; marked the current task on CPU X to
> > > need/or require a reschedule i.e., set
> > > TIF_NEED_RESCHED and invoked a reschedule IPI to
> > > CPU X (see sched_move_task())
> >
> > CPU Y is nohz_full right?
> >
> > >
> > > 2. CPU X acknowledged the reschedule IPI from CPU Y;
> > > generic idle loop code noticed the
> > > TIF_NEED_RESCHED flag against the idle task and
> > > attempts to exit of the loop and calls the main
> > > scheduler function i.e. __schedule().
> > >
> > > Since the idle tick was previously stopped no
> > > scheduling-clock tick would occur.
> > > So, no deferred timers would be handled
> > >
> > > 3. Post transition to kernel execution Task A
> > > running on CPU Y, indirectly released a few pages
> > > (e.g. see __free_one_page()); CPU Y's
> > > 'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
> > > specific 'vm_stat[]' update was deferred as per the
> > > CPU-specific stat threshold
> > >
> > > 4. Task A does invoke exit(2) and the kernel does
> > > remove the task from the run-queue; the idle task
> > > was selected to execute next since there are no
> > > other runnable tasks assigned to the given CPU
> > > (see pick_next_task() and pick_next_task_idle())
> >
> > This happens on CPU X, right?
> >
> > >
> > > 5. On return to the idle loop since the idle tick
> > > was already stopped and can remain so (see [1]
> > > below) e.g. no pending soft IRQs, no attempt is
> > > made to zero and fold CPU Y's vmstat counters
> > > since reprogramming of the scheduling-clock tick
> > > is not required/or needed (see [2])
> >
> > And now back to CPU Y, confused...
>
> Aaron, can you explain the diagram above?
Hi Frederic,
Sorry about that. How about the following:
  - Note: CPU X is part of 'tick_nohz_full_mask'.

    1. CPU Y migrated running task A to CPU X, which was in an
       idle state, i.e. waiting for an IRQ; marked the current
       task on CPU X as needing a reschedule, i.e. set
       TIF_NEED_RESCHED; and sent a reschedule IPI to CPU X
       (see sched_move_task()).

    2. CPU X acknowledged the reschedule IPI. The generic idle
       loop code noticed the TIF_NEED_RESCHED flag on the idle
       task, attempted to exit the loop and called the main
       scheduler function, i.e. __schedule().

       Since the idle tick was previously stopped, no
       scheduling-clock tick would occur, so no deferred
       timers would be handled.

    3. After the transition to kernel execution, task A, running
       on CPU X, indirectly released a few pages (e.g. see
       __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]'
       was updated and the zone-specific 'vm_stat[]' update was
       deferred as per the CPU-specific stat threshold.

    4. Task A invoked exit(2) and the kernel removed the task
       from the run-queue; the idle task was selected to execute
       next since there were no other runnable tasks assigned to
       CPU X (see pick_next_task() and pick_next_task_idle()).

    5. On return to the idle loop, since the idle tick was
       already stopped and can remain so (see [1] below), e.g.
       no pending soft IRQs, no attempt is made to zero and fold
       CPU X's vmstat counters, because reprogramming of the
       scheduling-clock tick is not required (see [2]).
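
Steps 3 to 5 can be sketched as a userspace toy model, in case that helps.
This is an illustration under assumptions, not the kernel implementation:
toy_cpu, toy_mod_state(), toy_quiet() and toy_idle_enter() are hypothetical
names. The fold_when_already_stopped flag only selects between "fold on
idle entry solely when the tick is being stopped now" and "fold on idle
entry even when the tick was stopped earlier".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; none of these are kernel symbols. */
struct toy_cpu {
	long diff;		/* models vm_stat_diff[NR_FREE_PAGES] */
	long threshold;		/* models the CPU-specific stat threshold */
	bool tick_stopped;	/* models the idle tick being stopped */
};

/* Step 3: accumulate locally, update the global only past the threshold. */
static void toy_mod_state(struct toy_cpu *c, long *global, long delta)
{
	c->diff += delta;
	if (c->diff > c->threshold || c->diff < -c->threshold) {
		*global += c->diff;
		c->diff = 0;
	}
}

/* Models zeroing and folding the CPU's differential into the global. */
static void toy_quiet(struct toy_cpu *c, long *global)
{
	*global += c->diff;
	c->diff = 0;
}

/*
 * Step 5: idle entry. With fold_when_already_stopped == false the fold
 * happens only on the transition to a stopped tick, so a re-entry with
 * the tick already stopped leaves the differential in place.
 */
static void toy_idle_enter(struct toy_cpu *c, long *global,
			   bool fold_when_already_stopped)
{
	if (!c->tick_stopped) {
		c->tick_stopped = true;
		toy_quiet(c, global);
	} else if (fold_when_already_stopped) {
		toy_quiet(c, global);
	}
}
```

For example, with a threshold of 8: enter idle once (the tick stops), call
toy_mod_state() with a delta of 3, then re-enter idle with the flag false.
The differential stays at 3 and the global stays at 0. Re-entering with the
flag true folds it.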
Kind regards,
--
Aaron Tomlin