linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: + mm-align-vmstat_works-timer.patch added to -mm tree
       [not found] <200904011945.n31JjWqG028114@imap1.linux-foundation.org>
@ 2009-04-06  3:30 ` KOSAKI Motohiro
  2009-04-06  3:57   ` KOSAKI Motohiro
  2009-04-07  4:04   ` Anton Blanchard
  0 siblings, 2 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2009-04-06  3:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kosaki.motohiro, anton, linux-mm, Andrew Morton

(swich to lkml and linux-mm)

Hi Anton,

Do you have any mesurement data?

Honestly, I made the same patch few week ago.
but I found two problems.

1)
work queue tracer (in -tip) reported it isn't proper rounded.

The fact is, schedule_delayed_work(work, round_jiffies_relative()) is
a bit ill.

it mean
  - round_jiffies_relative() calculate rounded-time - jiffies
  - schedule_delayed_work() calculate argument + jiffies

it assume no jiffies change at above two place. IOW it assume
non preempt kernel.


2)
> -	schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
> +	schedule_delayed_work_on(cpu, vmstat_work,
> +				 __round_jiffies_relative(HZ, cpu));

isn't same meaning.

vmstat_work mean to move per-cpu stastics to global stastics.
Then, (HZ + cpu) mean to avoid to touch the same global variable at the same time.

Oh well, this patch have performance regression risk on _very_ big server.
(perhaps, only sgi?)

but I agree vmstat_work is one of most work queue heavy user.
For power consumption view, it isn't proper behavior.

I still think improving another way.

> 
> The patch titled
>      mm: align vmstat_work's timer
> has been added to the -mm tree.  Its filename is
>      mm-align-vmstat_works-timer.patch
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
> out what to do about this
> 
> The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
> 
> ------------------------------------------------------
> Subject: mm: align vmstat_work's timer
> From: Anton Blanchard <anton@samba.org>
> 
> Even though vmstat_work is marked deferrable, there are still benefits to
> aligning it.  For certain applications we want to keep OS jitter as low as
> possible and aligning timers and work so they occur together can reduce
> their overall impact.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/vmstat.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff -puN mm/vmstat.c~mm-align-vmstat_works-timer mm/vmstat.c
> --- a/mm/vmstat.c~mm-align-vmstat_works-timer
> +++ a/mm/vmstat.c
> @@ -984,7 +984,7 @@ static void vmstat_update(struct work_st
>  {
>  	refresh_cpu_vm_stats(smp_processor_id());
>  	schedule_delayed_work(&__get_cpu_var(vmstat_work),
> -		sysctl_stat_interval);
> +		round_jiffies_relative(sysctl_stat_interval));
>  }
>  
>  static void __cpuinit start_cpu_timer(int cpu)
> @@ -992,7 +992,8 @@ static void __cpuinit start_cpu_timer(in
>  	struct delayed_work *vmstat_work = &per_cpu(vmstat_work, cpu);
>  
>  	INIT_DELAYED_WORK_DEFERRABLE(vmstat_work, vmstat_update);
> -	schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
> +	schedule_delayed_work_on(cpu, vmstat_work,
> +				 __round_jiffies_relative(HZ, cpu));
>  }
>  
>  /*
> _
> 
> Patches currently in -mm which might be from anton@samba.org are
> 
> origin.patch
> mm-align-vmstat_works-timer.patch
> random-align-rekey_works-timer.patch
> sunrpc-align-cache_clean-works-timer.patch
> 
> --
> To unsubscribe from this list: send the line "unsubscribe mm-commits" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-align-vmstat_works-timer.patch added to -mm tree
  2009-04-06  3:30 ` + mm-align-vmstat_works-timer.patch added to -mm tree KOSAKI Motohiro
@ 2009-04-06  3:57   ` KOSAKI Motohiro
  2009-04-07  4:04   ` Anton Blanchard
  1 sibling, 0 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2009-04-06  3:57 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-kernel, anton, linux-mm, Andrew Morton

> (swich to lkml and linux-mm)
> 
> Hi Anton,
> 
> Do you have any mesurement data?
> 
> Honestly, I made the same patch few week ago.
> but I found two problems.
> 
> 1)
> work queue tracer (in -tip) reported it isn't proper rounded.

Ah, sorry ignore this sentence.
I used my local patch queue's feature for mesurement, not -tip.


> 
> The fact is, schedule_delayed_work(work, round_jiffies_relative()) is
> a bit ill.
> 
> it mean
>   - round_jiffies_relative() calculate rounded-time - jiffies
>   - schedule_delayed_work() calculate argument + jiffies
> 
> it assume no jiffies change at above two place. IOW it assume
> non preempt kernel.
> 
> 
> 2)
> > -	schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
> > +	schedule_delayed_work_on(cpu, vmstat_work,
> > +				 __round_jiffies_relative(HZ, cpu));
> 
> isn't same meaning.
> 
> vmstat_work mean to move per-cpu stastics to global stastics.
> Then, (HZ + cpu) mean to avoid to touch the same global variable at the same time.
> 
> Oh well, this patch have performance regression risk on _very_ big server.
> (perhaps, only sgi?)
> 
> but I agree vmstat_work is one of most work queue heavy user.
> For power consumption view, it isn't proper behavior.
> 
> I still think improving another way.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-align-vmstat_works-timer.patch added to -mm tree
  2009-04-06  3:30 ` + mm-align-vmstat_works-timer.patch added to -mm tree KOSAKI Motohiro
  2009-04-06  3:57   ` KOSAKI Motohiro
@ 2009-04-07  4:04   ` Anton Blanchard
  2009-04-09  0:59     ` KOSAKI Motohiro
  1 sibling, 1 reply; 4+ messages in thread
From: Anton Blanchard @ 2009-04-07  4:04 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: linux-kernel, linux-mm, Andrew Morton, mingo, tglx


Hi,

> Do you have any mesurement data?

I was using a simple set of kprobes to look at when timers and
workqueues fire.

> The fact is, schedule_delayed_work(work, round_jiffies_relative()) is
> a bit ill.
> 
> it mean
>   - round_jiffies_relative() calculate rounded-time - jiffies
>   - schedule_delayed_work() calculate argument + jiffies
> 
> it assume no jiffies change at above two place. IOW it assume
> non preempt kernel.

I'm not sure we are any worse off here. Before the patch we could end up
with all threads converging on the same jiffy, and once that happens
they will continue to fire over the top of each other (at least until a
difference in the time it takes vmstat_work to complete causes them to
diverge again).

With the patch we always apply a per cpu offset, so should keep them
separated even if jiffies sometimes changes between
round_jiffies_relative() and schedule_delayed_work().

> 2)
> > -	schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
> > +	schedule_delayed_work_on(cpu, vmstat_work,
> > +				 __round_jiffies_relative(HZ, cpu));
> 
> isn't same meaning.
> 
> vmstat_work mean to move per-cpu stastics to global stastics.
> Then, (HZ + cpu) mean to avoid to touch the same global variable at the same time.

round_jiffies_common still provides per cpu skew doesn't it?

        /*
         * We don't want all cpus firing their timers at once hitting the
         * same lock or cachelines, so we skew each extra cpu with an extra
         * 3 jiffies. This 3 jiffies came originally from the mm/ code which
         * already did this.
         * The skew is done by adding 3*cpunr, then round, then subtract this
         * extra offset again.
         */

In fact we are also skewing timer interrupts across half a timer tick in
tick_setup_sched_timer:

	/* Get the next period (per cpu) */
	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
	offset = ktime_to_ns(tick_period) >> 1;
	do_div(offset, num_possible_cpus());
	offset *= smp_processor_id();
	hrtimer_add_expires_ns(&ts->sched_timer, offset);

I still need to see if I can measure a reduction in jitter by removing
this half jiffy skew and aligning all timer interrupts. Assuming we skew
per cpu work and timers, it seems like we shouldn't need to skew timer
interrupts too.

> but I agree vmstat_work is one of most work queue heavy user.
> For power consumption view, it isn't proper behavior.
> 
> I still think improving another way.

I definitely agree it would be nice to fix vmstat_work :)

Anton

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: + mm-align-vmstat_works-timer.patch added to -mm tree
  2009-04-07  4:04   ` Anton Blanchard
@ 2009-04-09  0:59     ` KOSAKI Motohiro
  0 siblings, 0 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2009-04-09  0:59 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: kosaki.motohiro, linux-kernel, linux-mm, Andrew Morton, mingo, tglx

Hi

> 
> Hi,
> 
> > Do you have any mesurement data?
> 
> I was using a simple set of kprobes to look at when timers and
> workqueues fire.

ok. thanks.


> > The fact is, schedule_delayed_work(work, round_jiffies_relative()) is
> > a bit ill.
> > 
> > it mean
> >   - round_jiffies_relative() calculate rounded-time - jiffies
> >   - schedule_delayed_work() calculate argument + jiffies
> > 
> > it assume no jiffies change at above two place. IOW it assume
> > non preempt kernel.
> 
> I'm not sure we are any worse off here. Before the patch we could end up
> with all threads converging on the same jiffy, and once that happens
> they will continue to fire over the top of each other (at least until a
> difference in the time it takes vmstat_work to complete causes them to
> diverge again).
> 
> With the patch we always apply a per cpu offset, so should keep them
> separated even if jiffies sometimes changes between
> round_jiffies_relative() and schedule_delayed_work().

Well, ok I agree your patch don't have back step.

I mean I agree preempt kernel vs round_jiffies_relative() problem is
unrelated to your patch.


> > 2)
> > > -	schedule_delayed_work_on(cpu, vmstat_work, HZ + cpu);
> > > +	schedule_delayed_work_on(cpu, vmstat_work,
> > > +				 __round_jiffies_relative(HZ, cpu));
> > 
> > isn't same meaning.
> > 
> > vmstat_work mean to move per-cpu stastics to global stastics.
> > Then, (HZ + cpu) mean to avoid to touch the same global variable at the same time.
> 
> round_jiffies_common still provides per cpu skew doesn't it?
> 
>         /*
>          * We don't want all cpus firing their timers at once hitting the
>          * same lock or cachelines, so we skew each extra cpu with an extra
>          * 3 jiffies. This 3 jiffies came originally from the mm/ code which
>          * already did this.
>          * The skew is done by adding 3*cpunr, then round, then subtract this
>          * extra offset again.
>          */
> 
> In fact we are also skewing timer interrupts across half a timer tick in
> tick_setup_sched_timer:
> 
> 	/* Get the next period (per cpu) */
> 	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
> 	offset = ktime_to_ns(tick_period) >> 1;
> 	do_div(offset, num_possible_cpus());
> 	offset *= smp_processor_id();
> 	hrtimer_add_expires_ns(&ts->sched_timer, offset);
> 
> I still need to see if I can measure a reduction in jitter by removing
> this half jiffy skew and aligning all timer interrupts. Assuming we skew
> per cpu work and timers, it seems like we shouldn't need to skew timer
> interrupts too.

Ah, you are perfectly right.
I missed it.


> > but I agree vmstat_work is one of most work queue heavy user.
> > For power consumption view, it isn't proper behavior.
> > 
> > I still think improving another way.
> 
> I definitely agree it would be nice to fix vmstat_work :)

Thank you for kindful explanation :)



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-04-09  0:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200904011945.n31JjWqG028114@imap1.linux-foundation.org>
2009-04-06  3:30 ` + mm-align-vmstat_works-timer.patch added to -mm tree KOSAKI Motohiro
2009-04-06  3:57   ` KOSAKI Motohiro
2009-04-07  4:04   ` Anton Blanchard
2009-04-09  0:59     ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox