From: Aaron Tomlin <atomlin@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
	peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
	oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v7 2/3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
Date: Mon, 12 Sep 2022 15:38:23 +0100
Message-ID: <20220912143822.irn6xhs2etmumqlt@ava.usersys.com>
In-Reply-To: <YxuViCnKcIYVE02B@fuller.cnet>

On Fri 2022-09-09 16:35 -0300, Marcelo Tosatti wrote:
> For the scenario where we re-enter idle without calling quiet_vmstat:
> 
> 
> CPU-0			CPU-1
> 
> 0) vmstat_shepherd notices it's necessary to queue vmstat work
> to remote CPU, queues deferrable timer into timer wheel, and calls
> trigger_dyntick_cpu (target_cpu == cpu-1).
> 
> 			1) Stop the tick (get_next_timer_interrupt will not take deferrable
> 			   timers into account), calls quiet_vmstat, which keeps the vmstat work
> 			   (vmstat_update function) queued.
> 			2) Idle
> 			3) Idle exit
> 			4) Run thread on CPU, some activity marks vmstat dirty
> 			5) Idle
> 			6) Goto 3
> 
> At 5, since the tick is already stopped, the deferrable 
> timer for the delayed work item will not execute,
> and vmstat_shepherd will consider 
> 
> static void vmstat_shepherd(struct work_struct *w)
> {
>         int cpu;
> 
>         cpus_read_lock();
>         /* Check processors whose vmstat worker threads have been disabled */
>         for_each_online_cpu(cpu) {
>                 struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
> 
>                 if (!delayed_work_pending(dw) && need_update(cpu))
>                         queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
> 
>                 cond_resched();
>         }
>         cpus_read_unlock();
> 
>         schedule_delayed_work(&shepherd,
>                 round_jiffies_relative(sysctl_stat_interval));
> }
> 
> As far as i can tell...

Hi Marcelo,

Yes, I agree with the scenario above.

> > > Consider the following theoretical scenario:
> > > 
> > >         1.      CPU Y migrated running task A to CPU X that was
> > >                 in an idle state i.e. waiting for an IRQ - not
> > >                 polling; marked the current task on CPU X to
> > >                 need/or require a reschedule i.e., set
> > >                 TIF_NEED_RESCHED and invoked a reschedule IPI to
> > >                 CPU X (see sched_move_task())
> > 
> > CPU Y is nohz_full right?
> > 
> > > 
> > >         2.      CPU X acknowledged the reschedule IPI from CPU Y;
> > >                 generic idle loop code noticed the
> > >                 TIF_NEED_RESCHED flag against the idle task and
> > >                 attempts to exit of the loop and calls the main
> > >                 scheduler function i.e. __schedule().
> > > 
> > >                 Since the idle tick was previously stopped no
> > >                 scheduling-clock tick would occur.
> > >                 So, no deferred timers would be handled
> > > 
> > >         3.      Post transition to kernel execution Task A
> > >                 running on CPU Y, indirectly released a few pages
> > >                 (e.g. see __free_one_page()); CPU Y's
> > >                 'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
> > >                 specific 'vm_stat[]' update was deferred as per the
> > >                 CPU-specific stat threshold
> > > 
> > >         4.      Task A does invoke exit(2) and the kernel does
> > >                 remove the task from the run-queue; the idle task
> > >                 was selected to execute next since there are no
> > >                 other runnable tasks assigned to the given CPU
> > >                 (see pick_next_task() and pick_next_task_idle())
> > 
> > This happens on CPU X, right?
> > 
> > > 
> > >         5.      On return to the idle loop since the idle tick
> > >                 was already stopped and can remain so (see [1]
> > >                 below) e.g. no pending soft IRQs, no attempt is
> > >                 made to zero and fold CPU Y's vmstat counters
> > >                 since reprogramming of the scheduling-clock tick
> > >                 is not required/or needed (see [2])
> > 
> > And now back to CPU Y, confused...
> 
> Aaron, can you explain the diagram above? 

Hi Frederic,

Sorry about that. How about the following:

 - Note: CPU X is part of 'tick_nohz_full_mask'

    1.      CPU Y migrated running task A to CPU X, which
	    was in an idle state, i.e. waiting for an IRQ;
	    marked the current task on CPU X as needing a
	    reschedule, i.e. set TIF_NEED_RESCHED; and sent
	    a reschedule IPI to CPU X
	    (see sched_move_task())

    2.      CPU X acknowledged the reschedule IPI. Generic
	    idle loop code noticed the TIF_NEED_RESCHED flag
	    against the idle task, exited the loop and
	    called the main scheduler function, i.e.
	    __schedule().

	    Since the idle tick was previously stopped, no
	    scheduling-clock tick would occur, so no
	    deferrable timers would be handled

    3.      After the transition to kernel execution, task A,
	    running on CPU X, indirectly released a few pages
	    (e.g. see __free_one_page()); CPU X's
	    'vm_stat_diff[NR_FREE_PAGES]' was updated and the
	    zone-specific 'vm_stat[]' update was deferred as
	    per the CPU-specific stat threshold

    4.      Task A invoked exit(2) and the kernel removed
	    the task from the run-queue; the idle task was
	    selected to execute next since there were no
	    other runnable tasks assigned to the given CPU
	    (see pick_next_task() and pick_next_task_idle())

    5.      On return to the idle loop, since the idle tick
	    was already stopped and can remain so (see [1]
	    below), e.g. no pending soft IRQs, no attempt is
	    made to zero and fold CPU X's vmstat counters,
	    since reprogramming of the scheduling-clock tick
	    is not required (see [2])
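
To make the sequence above concrete, here is a rough user-space sketch
of it. This is a toy model only, not kernel code: every toy_* name, the
threshold value of 32 and the count of 3 freed pages are assumptions
made up for illustration, and the model simply assumes that the tick
stays stopped on CPU X while task A runs. It only shows how a small
per-CPU delta can be left unfolded when idle is re-entered with the
tick already stopped.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy model: names mirror kernel concepts but are NOT the kernel API. */
#define TOY_STAT_THRESHOLD 32	/* assumed per-CPU stat threshold */

struct toy_cpu {
	bool tick_stopped;	/* is the idle tick currently stopped? */
	int vm_stat_diff;	/* per-CPU delta not yet folded */
};

static long toy_vm_stat;	/* stands in for the zone-wide counter */

/* Fold the per-CPU delta into the global counter and zero it. */
static void toy_quiet_vmstat(struct toy_cpu *cpu)
{
	toy_vm_stat += cpu->vm_stat_diff;
	cpu->vm_stat_diff = 0;
}

/* Per-CPU update: the fold is deferred until the delta exceeds the threshold. */
static void toy_mod_state(struct toy_cpu *cpu, int delta)
{
	cpu->vm_stat_diff += delta;
	if (abs(cpu->vm_stat_diff) > TOY_STAT_THRESHOLD)
		toy_quiet_vmstat(cpu);
}

/*
 * Idle entry. Without the extra fold, the delta is only folded on the
 * transition from "tick running" to "tick stopped".
 */
static void toy_idle_enter(struct toy_cpu *cpu, bool fold_when_already_stopped)
{
	if (!cpu->tick_stopped) {
		cpu->tick_stopped = true;
		toy_quiet_vmstat(cpu);
	} else if (fold_when_already_stopped) {
		toy_quiet_vmstat(cpu);
	}
}

/* Replay steps 1-5 and return the delta left behind on CPU X. */
static int toy_replay(bool fold_when_already_stopped)
{
	struct toy_cpu x = { .tick_stopped = false, .vm_stat_diff = 0 };

	toy_vm_stat = 0;
	/* Before step 1: CPU X goes idle and stops its tick. */
	toy_idle_enter(&x, fold_when_already_stopped);
	/* Steps 1-2: the IPI wakes CPU X; the tick is assumed to stay stopped. */
	/* Step 3: task A frees a few pages, below the threshold. */
	toy_mod_state(&x, 3);
	/* Step 4: task A exits, the idle task is picked. */
	/* Step 5: idle is re-entered with the tick already stopped. */
	toy_idle_enter(&x, fold_when_already_stopped);
	return x.vm_stat_diff;
}
```

In this model, toy_replay(false) leaves a delta of 3 on CPU X, whereas
toy_replay(true) folds it into toy_vm_stat and returns 0.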



Kind regards,

-- 
Aaron Tomlin


