* Re: [PATCH 15/17] [RFC] sched/eevdf: Sleeper bonus
[not found] ` <20230328110354.641979416@infradead.org>
@ 2023-03-30 7:53 ` Hillf Danton
0 siblings, 0 replies; 10+ messages in thread
From: Hillf Danton @ 2023-03-30 7:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, vincent.guittot, linux-kernel, dietmar.eggemann, rostedt,
bsegall, mgorman, bristot, corbet, qyousef, chris.hyser,
patrick.bellasi, pjt, joshdon, timj, kprateek.nayak, yu.c.chen,
youssefesmat, linux-mm, joel, efault
On 28 Mar 2023 11:26:37 +0200 Peter Zijlstra (Intel) <peterz@infradead.org>
> @@ -4878,22 +4878,55 @@ place_entity(struct cfs_rq *cfs_rq, stru
> if (WARN_ON_ONCE(!load))
> load = 1;
> lag = div_s64(lag, load);
> +
> + vruntime -= lag;
> + }
> +
> + /*
> + * Base the deadline on the 'normal' EEVDF placement policy in an
> + * attempt to not let the bonus crud below wreck things completely.
> + */
> + se->deadline = vruntime;
> +
> + /*
> + * The whole 'sleeper' bonus hack... :-/ This is strictly unfair.
> + *
> + * By giving a sleeping task a little boost, it becomes possible for a
> + * 50% task to compete equally with a 100% task. That is, strictly fair
> + * that setup would result in a 67% / 33% split. Sleeper bonus will
> + * change that to 50% / 50%.
> + *
> + * This thing hurts my brain, because tasks leaving with negative lag
> + * will move 'time' backward, so comparing against a historical
> + * se->vruntime is dodgy as heck.
> + */
> + if (sched_feat(PLACE_BONUS) &&
> + (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)) {
> + /*
> + * If se->vruntime is ahead of vruntime, something dodgy
> + * happened and we cannot give bonus due to not having valid
> + * history.
> + */
> + if ((s64)(se->vruntime - vruntime) < 0) {
> + vruntime -= se->slice/2;
> + vruntime = max_vruntime(se->vruntime, vruntime);
> + }
> }
>
> - se->vruntime = vruntime - lag;
> + se->vruntime = vruntime;
>
> /*
> * When joining the competition; the existing tasks will be,
> * on average, halfway through their slice, as such start tasks
> * off with half a slice to ease into the competition.
> */
> - if (sched_feat(PLACE_DEADLINE_INITIAL) && initial)
> + if (sched_feat(PLACE_DEADLINE_INITIAL) && (flags & ENQUEUE_INITIAL))
> vslice /= 2;
>
> /*
> * EEVDF: vd_i = ve_i + r_i/w_i
> */
> - se->deadline = se->vruntime + vslice;
> + se->deadline += vslice;
> }
Because lag makes no sense for tasks that have slept for more than a second,
it is simpler to let them preempt the current task at the next tick, in line
with the best a freshly forked task can do.
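A rough sketch of that idea (not part of Peter's series; the helper name, the
one-second cutoff and the use of se->exec_start as the last-ran timestamp are
assumptions for illustration):

	static bool se_slept_long(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		/* exec_start still holds the time the entity last ran */
		u64 now = rq_clock_task(rq_of(cfs_rq));

		return (s64)(now - se->exec_start) > NSEC_PER_SEC;
	}

	/* in place_entity(), before the stored lag is applied */
	if (se_slept_long(cfs_rq, se))
		lag = 0;	/* stale history: place at the queue average */

With lag zeroed the entity lands at the queue average and is eligible, so
pick_eevdf() at the next tick can preempt curr in its favour, which is roughly
what a freshly forked task gets.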
* Re: [PATCH 08/17] sched/fair: Implement an EEVDF like policy
[not found] ` <20230328110354.141543852@infradead.org>
@ 2023-03-30 11:02 ` Hillf Danton
0 siblings, 0 replies; 10+ messages in thread
From: Hillf Danton @ 2023-03-30 11:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, vincent.guittot, linux-kernel, rostedt, bsegall, mgorman,
bristot, corbet, qyousef, pavel, qperret, tim.c.chen, joshdon,
timj, kprateek.nayak, yu.c.chen, youssefesmat, linux-mm, joel,
efault
On 28 Mar 2023 11:26:30 +0200 Peter Zijlstra <peterz@infradead.org>
>
> check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> {
> - unsigned long ideal_runtime, delta_exec;
> + unsigned long delta_exec;
> struct sched_entity *se;
> s64 delta;
>
> - /*
> - * When many tasks blow up the sched_period; it is possible that
> - * sched_slice() reports unusually large results (when many tasks are
> - * very light for example). Therefore impose a maximum.
> - */
> - ideal_runtime = min_t(u64, sched_slice(cfs_rq, curr), sysctl_sched_latency);
> + if (sched_feat(EEVDF)) {
> + if (pick_eevdf(cfs_rq) != curr)
> + goto preempt;
> +
> + return;
> + }
Given the deadline, tick preemption could be replaced with a timer expiring at
the earliest deadline (EDL).
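A sketch of what that could look like, assuming CONFIG_SCHED_HRTICK (the helper
name and the vruntime-to-walltime conversion are assumptions for illustration;
hrtick_start() is the existing hrtimer hook used by the HRTICK feature):

	static void hrtick_start_deadline(struct rq *rq, struct sched_entity *curr)
	{
		s64 vdelta = curr->deadline - curr->vruntime;
		u64 wdelta;

		if (vdelta <= 0) {
			resched_curr(rq);
			return;
		}

		/*
		 * vruntime advances as delta_exec * NICE_0_LOAD / weight,
		 * so invert that to estimate the wall time remaining until
		 * the virtual deadline is reached, then arm the hrtick.
		 */
		wdelta = div_u64((u64)vdelta * curr->load.weight, NICE_0_LOAD);
		hrtick_start(rq, wdelta);
	}

That would trade per-tick checks for one hrtimer reprogram per slice, at the
cost of an extra timer interrupt whenever the deadline does not line up with
a tick.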
* Re: [PATCH 14/17] sched/eevdf: Better handle mixed slice length
[not found] ` <CAKfTPtAkFBw5zt0+WK7dWBUE9OrbOOExG8ueUE6ogdCEQZhpXQ@mail.gmail.com>
@ 2023-04-01 23:23 ` Hillf Danton
2023-04-02 2:40 ` Mike Galbraith
0 siblings, 1 reply; 10+ messages in thread
From: Hillf Danton @ 2023-04-01 23:23 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, linux-kernel, Peter Zijlstra, rostedt, bsegall, mgorman,
bristot, corbet, qyousef, joshdon, timj, kprateek.nayak,
yu.c.chen, youssefesmat, linux-mm, joel, efault
On 31 Mar 2023 17:26:51 +0200 Vincent Guittot <vincent.guittot@linaro.org>
> On Tue, 28 Mar 2023 at 13:06, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > @@ -4832,6 +4834,18 @@ place_entity(struct cfs_rq *cfs_rq, stru
> > lag = se->vlag;
> >
> > /*
> > + * For latency sensitive tasks; those that have a shorter than
> > + * average slice and do not fully consume the slice, transition
> > + * to EEVDF placement strategy #2.
> > + */
> > + if (sched_feat(PLACE_FUDGE) &&
> > + cfs_rq->avg_slice > se->slice * cfs_rq->avg_load) {
> > + lag += vslice;
> > + if (lag > 0)
> > + lag = 0;
>
> By using different lag policies for tasks, doesn't this create
> unfairness between tasks ?
>
> I wanted to stress this situation with a simple use case but it seems
> that even without changing the slice, there is a fairness problem:
>
> Task A always runs
> Task B loops on: running 1ms then sleeping 1ms
> default nice and latency nice prio for both
> each task should get around 50% of the time.
>
> The fairness is ok with tip/sched/core
> but with eevdf, Task B only gets around 30%
Convincing evidence of a glitch in wakeup preemption.
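For reference, a minimal sketch of Task B in Vincent's reproducer (Task A is
just a busy loop; pinning both tasks to one CPU, e.g. with taskset -c 0, is an
assumption here, as is the file name):

	/* taskb.c - run ~1ms, then sleep ~1ms, forever */
	#include <time.h>
	#include <unistd.h>

	static void burn_1ms(void)
	{
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		do {
			clock_gettime(CLOCK_MONOTONIC, &t1);
		} while ((t1.tv_sec - t0.tv_sec) * 1000000000L +
			 (t1.tv_nsec - t0.tv_nsec) < 1000000L);
	}

	int main(void)
	{
		for (;;) {
			burn_1ms();
			usleep(1000);	/* ~1ms sleep */
		}
	}

Comparing the two tasks' CPU shares in top (or with perf sched record) should
show whether B gets close to 50% or only ~30%.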
* Re: [PATCH 14/17] sched/eevdf: Better handle mixed slice length
2023-04-01 23:23 ` [PATCH 14/17] sched/eevdf: Better handle mixed slice length Hillf Danton
@ 2023-04-02 2:40 ` Mike Galbraith
2023-04-02 6:28 ` Hillf Danton
0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2023-04-02 2:40 UTC (permalink / raw)
To: Hillf Danton, Vincent Guittot
Cc: mingo, linux-kernel, Peter Zijlstra, rostedt, bsegall, mgorman,
bristot, corbet, qyousef, joshdon, timj, kprateek.nayak,
yu.c.chen, youssefesmat, linux-mm, joel
On Sun, 2023-04-02 at 07:23 +0800, Hillf Danton wrote:
> On 31 Mar 2023 17:26:51 +0200 Vincent Guittot <vincent.guittot@linaro.org>
> >
> > I wanted to stress this situation with a simple use case but it seems
> > that even without changing the slice, there is a fairness problem:
> >
> > Task A always runs
> > Task B loops on: running 1ms then sleeping 1ms
> > default nice and latency nice prio for both
> > each task should get around 50% of the time.
> >
> > The fairness is ok with tip/sched/core
> > but with eevdf, Task B only gets around 30%
>
> Convincing evidence of a glitch in wakeup preemption.
If you turn on PLACE_BONUS, it'll mimic FAIR_SLEEPERS.. but if you then
do some testing, you'll probably turn it right back off.
The 50/50 split in current code isn't really any more fair, as soon as
you leave the tiny bubble of fairness, it's not the least bit fair.
Nor is that tiny bubble all rainbows and unicorns, it brought with it
benchmark wins and losses, like everything that changes more than
comments, its price being service latency variance.
The short-term split doesn't really mean all that much; some things
will like the current fair-bubble better, some will like the eevdf
virtual deadline math and its less spiky service. We'll see.
I'm kinda hoping eevdf works out, FAIR_SLEEPERS is quite annoying to
squabble with.
-Mike
* Re: [PATCH 14/17] sched/eevdf: Better handle mixed slice length
2023-04-02 2:40 ` Mike Galbraith
@ 2023-04-02 6:28 ` Hillf Danton
0 siblings, 0 replies; 10+ messages in thread
From: Hillf Danton @ 2023-04-02 6:28 UTC (permalink / raw)
To: Mike Galbraith
Cc: mingo, linux-kernel, Vincent Guittot, Peter Zijlstra, rostedt,
bsegall, mgorman, bristot, corbet, qyousef, joshdon, timj,
kprateek.nayak, yu.c.chen, youssefesmat, linux-mm, joel
On 02 Apr 2023 04:40:20 +0200 Mike Galbraith <efault@gmx.de>
> On Sun, 2023-04-02 at 07:23 +0800, Hillf Danton wrote:
> > On 31 Mar 2023 17:26:51 +0200 Vincent Guittot <vincent.guittot@linaro.org>
> > >
> > > I wanted to stress this situation with a simple use case but it seems
> > > that even without changing the slice, there is a fairness problem:
> > >
> > > Task A always runs
> > > Task B loops on: running 1ms then sleeping 1ms
> > > default nice and latency nice prio for both
> > > each task should get around 50% of the time.
> > >
> > > The fairness is ok with tip/sched/core
> > > but with eevdf, Task B only gets around 30%
> >
> > Convincing evidence of a glitch in wakeup preemption.
>
> If you turn on PLACE_BONUS, it'll mimic FAIR_SLEEPERS.. but if you then
> do some testing, you'll probably turn it right back off.
>
> The 50/50 split in current code isn't really any more fair, as soon as
> you leave the tiny bubble of fairness, it's not the least bit fair.
> Nor is that tiny bubble all rainbows and unicorns, it brought with it
> benchmark wins and losses, like everything that changes more than
> comments, its price being service latency variance.
>
> The short-term split doesn't really mean all that much; some things
> will like the current fair-bubble better, some will like the eevdf
> virtual deadline math and its less spiky service. We'll see.
>
> I'm kinda hoping eevdf works out, FAIR_SLEEPERS is quite annoying to
> squabble with.
Yeah, no matter what role FAIR_SLEEPERS could play next week, this lays a brick
for Vlastimil Babka to take it over, leaving Peter happy to sit back with
Netflix on.
* Re: [PATCH 00/17] sched: EEVDF using latency-nice
[not found] ` <20230410031350.GA49280@maniforge>
@ 2023-04-10 8:23 ` Hillf Danton
2023-04-11 10:15 ` Mike Galbraith
0 siblings, 1 reply; 10+ messages in thread
From: Hillf Danton @ 2023-04-10 8:23 UTC (permalink / raw)
To: David Vernet
Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, linux-mm,
mgorman, bristot, corbet, kprateek.nayak, youssefesmat, joel,
efault
On 9 Apr 2023 22:13:50 -0500 David Vernet <void@manifault.com>
>
> Hi Peter,
>
> I used the EEVDF scheduler to run workloads on one of Meta's largest
> services (our main HHVM web server), and I wanted to share my
> observations with you.
Thanks for your testing.
>
> 3. Low latency + long slice are not mutually exclusive for us
>
> An interesting quality of web workloads running JIT engines is that they
> require both low-latency, and long slices on the CPU. The reason we need
> the tasks to be low latency is they're on the critical path for
> servicing web requests (for most of their runtime, at least), and the
> reasons we need them to have long slices are enumerated above -- they
> thrash the icache / DSB / iTLB, more aggressive context switching causes
> us to thrash on paging from disk, and in general, these tasks are on the
> critical path for servicing web requests and we want to encourage them
> to run to completion.
>
> This causes EEVDF to perform poorly for workloads with these
> characteristics. If we decrease latency nice for our web workers then
Take a look at the diff below.
> they'll have lower latency, but only because their slices are smaller.
> This in turn causes the increase in context switches, which causes the
> thrashing described above.
>
> Worth noting -- I did try and increase the default base slice length by
> setting sysctl_sched_base_slice to 35ms, and these were the results:
>
> With EEVDF slice 35ms and latency_nice 0
> ----------------------------------------
> - .5 - 2.25% drop in throughput
> - 2.5 - 4.5% increase in p95 latencies
> - 2.5 - 5.25% increase in p99 latencies
> - Context switch per minute increase: 9.5 - 12.4%
> - Involuntary context switch increase: ~320 - 330%
> - Major fault delta: -3.6% to 37.6%
> - IPC decrease .5 - .9%
>
> With EEVDF slice 35ms and latency_nice -8 for web workers
> ---------------------------------------------------------
> - .5 - 2.5% drop in throughput
> - 1.7 - 4.75% increase in p95 latencies
> - 2.5 - 5% increase in p99 latencies
> - Context switch per minute increase: 10.5 - 15%
> - Involuntary context switch increase: ~327 - 350%
> - Major fault delta: -1% to 45%
> - IPC decrease .4 - 1.1%
>
> I was expecting the increase in context switches and involuntary context
> switches to be lower than what they ended up being with the increased
> default slice length.
> consistent story with the numbers from above. The improvement in IPC is
> expected, though also less improved than I was anticipating (presumably
> due to the still-high context switch rate). There were also fewer major
> faults per minute compared to runs with a shorter default slice.
>
> Note that even if increasing the slice length did cause fewer context
> switches and major faults, I still expect that it would hurt throughput
> and latency for HHVM given that when latency-nicer tasks are eventually
> given the CPU, the web workers will have to wait around for longer than
> we'd like for those tasks to burn through their longer slices.
>
> In summary, I must admit that this patch set makes me a bit nervous.
> Speaking for Meta at least, the patch set in its current form exceeds
> the performance regressions (generally < .5% at the very most) that
> we're able to tolerate in production. More broadly, it will certainly
> cause us to have to carefully consider how it affects our model for
> server capacity.
>
> Thanks,
> David
>
In order to merely narrow down the poor performance reported, make a tradeoff
between runtime and latency simply by restoring a sysctl_sched_min_granularity
style floor at tick preemption, given the known order on the runqueue.
--- x/kernel/sched/fair.c
+++ y/kernel/sched/fair.c
@@ -5172,6 +5172,12 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
+ u64 min_gran = 1000000ULL; /* 1ms, standing in for sysctl_sched_min_granularity */
+ u64 delta_exec;
+
+ delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+ if (delta_exec < min_gran)
+ return;
if (pick_eevdf(cfs_rq) != curr) {
resched_curr(rq_of(cfs_rq));
/*
* Re: [PATCH 00/17] sched: EEVDF using latency-nice
2023-04-10 8:23 ` [PATCH 00/17] sched: EEVDF using latency-nice Hillf Danton
@ 2023-04-11 10:15 ` Mike Galbraith
2023-04-11 13:33 ` Hillf Danton
0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2023-04-11 10:15 UTC (permalink / raw)
To: Hillf Danton, David Vernet
Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, linux-mm,
mgorman, bristot, corbet, kprateek.nayak, youssefesmat, joel
On Mon, 2023-04-10 at 16:23 +0800, Hillf Danton wrote:
>
> In order to merely narrow down the poor performance reported, make a tradeoff
> between runtime and latency simply by restoring a sysctl_sched_min_granularity
> style floor at tick preemption, given the known order on the runqueue.
Tick preemption isn't the primary contributor to the scheduling delta,
it's wakeup preemption. If you look at the perf summaries of 5 minute
recordings on my little 8 rq box below, you'll see that the delta is
more than twice what a 250Hz tick could inflict. You could also just
turn off WAKEUP_PREEMPTION and watch the delta instantly peg negative.
Anyway...
Given we know preemption is markedly up, and as always a source of pain
(as well as gain), perhaps we can try to tamp it down a little without
inserting old constraints into the shiny new scheduler.
The dirt simple tweak below puts a dent in the sting by merely sticking
with whatever decision EEVDF last made until it itself invalidates that
decision. It still selects via the same math, just does so the tiniest
bit less frenetically.
---
kernel/sched/fair.c | 3 +++
kernel/sched/features.h | 6 ++++++
2 files changed, 9 insertions(+)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -950,6 +950,9 @@ static struct sched_entity *pick_eevdf(s
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL;
+ if (sched_feat(GENTLE_EEVDF) && curr)
+ return curr;
+
while (node) {
struct sched_entity *se = __node_2_se(node);
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -14,6 +14,12 @@ SCHED_FEAT(MINIMAL_VA, false)
SCHED_FEAT(VALIDATE_QUEUE, false)
/*
+ * Don't be quite so damn twitchy, once you select a champion let the
+ * poor bastard carry the baton until no longer eligible to do so.
+ */
+SCHED_FEAT(GENTLE_EEVDF, true)
+
+/*
* Prefer to schedule the task we woke last (assuming it failed
* wakeup-preemption), since its likely going to consume data we
* touched, increases cache locality.
perf.data.cfs
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(8) |1665786.092 ms | 529819 | avg: 1.046 ms | max: 33.639 ms | sum:554226.960 ms |
dav1d-worker:(8) | 187982.593 ms | 448022 | avg: 0.881 ms | max: 35.806 ms | sum:394546.442 ms |
X:2503 | 102533.714 ms | 89729 | avg: 0.071 ms | max: 9.448 ms | sum: 6372.383 ms |
VizCompositorTh:5235 | 38717.241 ms | 76743 | avg: 0.632 ms | max: 24.308 ms | sum:48502.097 ms |
llvmpipe-0:(2) | 32520.412 ms | 42390 | avg: 1.041 ms | max: 19.804 ms | sum:44116.653 ms |
llvmpipe-1:(2) | 32374.548 ms | 35557 | avg: 1.247 ms | max: 17.439 ms | sum:44347.573 ms |
llvmpipe-2:(2) | 31579.168 ms | 34292 | avg: 1.312 ms | max: 16.775 ms | sum:45005.225 ms |
llvmpipe-3:(2) | 30478.664 ms | 33659 | avg: 1.375 ms | max: 16.863 ms | sum:46268.417 ms |
llvmpipe-7:(2) | 29778.002 ms | 30684 | avg: 1.543 ms | max: 17.384 ms | sum:47338.420 ms |
llvmpipe-4:(2) | 29741.774 ms | 32832 | avg: 1.433 ms | max: 18.571 ms | sum:47062.280 ms |
llvmpipe-5:(2) | 29462.794 ms | 32641 | avg: 1.455 ms | max: 19.802 ms | sum:47497.195 ms |
llvmpipe-6:(2) | 28367.114 ms | 32132 | avg: 1.514 ms | max: 16.562 ms | sum:48646.738 ms |
ThreadPoolForeg:(16) | 22238.667 ms | 66355 | avg: 0.353 ms | max: 46.477 ms | sum:23409.474 ms |
VideoFrameCompo:5243 | 17071.755 ms | 75223 | avg: 0.288 ms | max: 33.358 ms | sum:21650.918 ms |
chrome:(8) | 6478.351 ms | 47110 | avg: 0.486 ms | max: 28.018 ms | sum:22910.980 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2317066.420 ms | 2221889 | | 46.477 ms | 1629736.515 ms |
----------------------------------------------------------------------------------------------------------
perf.data.eevdf
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(8) |1673379.930 ms | 743590 | avg: 0.745 ms | max: 28.003 ms | sum:554041.093 ms |
dav1d-worker:(8) | 197647.514 ms | 1139053 | avg: 0.434 ms | max: 22.357 ms | sum:494377.980 ms |
X:2495 | 100741.946 ms | 114808 | avg: 0.191 ms | max: 8.583 ms | sum:21945.360 ms |
VizCompositorTh:6571 | 37705.863 ms | 74900 | avg: 0.479 ms | max: 16.464 ms | sum:35843.010 ms |
llvmpipe-6:(2) | 30757.126 ms | 38941 | avg: 1.448 ms | max: 18.529 ms | sum:56371.507 ms |
llvmpipe-3:(2) | 30658.127 ms | 40296 | avg: 1.405 ms | max: 24.791 ms | sum:56601.212 ms |
llvmpipe-4:(2) | 30456.388 ms | 40011 | avg: 1.419 ms | max: 23.840 ms | sum:56793.272 ms |
llvmpipe-2:(2) | 30395.971 ms | 40828 | avg: 1.394 ms | max: 19.195 ms | sum:56897.961 ms |
llvmpipe-5:(2) | 30346.432 ms | 39393 | avg: 1.445 ms | max: 21.747 ms | sum:56917.495 ms |
llvmpipe-1:(2) | 30275.694 ms | 41349 | avg: 1.378 ms | max: 20.765 ms | sum:56989.923 ms |
llvmpipe-7:(2) | 29768.515 ms | 37626 | avg: 1.532 ms | max: 20.649 ms | sum:57639.337 ms |
llvmpipe-0:(2) | 28931.905 ms | 42568 | avg: 1.378 ms | max: 20.942 ms | sum:58667.379 ms |
ThreadPoolForeg:(60) | 22598.216 ms | 131514 | avg: 0.342 ms | max: 36.105 ms | sum:44927.149 ms |
VideoFrameCompo:6587 | 16966.649 ms | 90751 | avg: 0.357 ms | max: 18.199 ms | sum:32379.045 ms |
chrome:(25) | 8862.695 ms | 75923 | avg: 0.308 ms | max: 30.821 ms | sum:23347.992 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2331946.838 ms | 3471615 | | 36.105 ms | 1808071.407 ms |
----------------------------------------------------------------------------------------------------------
perf.data.eevdf+tweak
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(8) |1687121.317 ms | 695518 | avg: 0.760 ms | max: 24.098 ms | sum:528302.626 ms |
dav1d-worker:(8) | 183514.008 ms | 922884 | avg: 0.489 ms | max: 32.093 ms | sum:451319.787 ms |
X:2489 | 99164.486 ms | 101585 | avg: 0.239 ms | max: 8.896 ms | sum:24295.253 ms |
VizCompositorTh:17881 | 37911.007 ms | 71122 | avg: 0.499 ms | max: 16.743 ms | sum:35460.994 ms |
llvmpipe-1:(2) | 29946.625 ms | 40320 | avg: 1.394 ms | max: 23.036 ms | sum:56222.367 ms |
llvmpipe-2:(2) | 29910.414 ms | 39677 | avg: 1.412 ms | max: 24.187 ms | sum:56011.791 ms |
llvmpipe-6:(2) | 29742.389 ms | 37822 | avg: 1.484 ms | max: 18.228 ms | sum:56109.947 ms |
llvmpipe-3:(2) | 29644.994 ms | 39155 | avg: 1.435 ms | max: 21.191 ms | sum:56202.636 ms |
llvmpipe-5:(2) | 29520.006 ms | 38037 | avg: 1.482 ms | max: 21.698 ms | sum:56373.679 ms |
llvmpipe-4:(2) | 29460.485 ms | 38562 | avg: 1.462 ms | max: 26.308 ms | sum:56389.022 ms |
llvmpipe-7:(2) | 29449.959 ms | 36308 | avg: 1.557 ms | max: 21.617 ms | sum:56547.129 ms |
llvmpipe-0:(2) | 29041.903 ms | 41207 | avg: 1.389 ms | max: 26.322 ms | sum:57239.666 ms |
ThreadPoolForeg:(16) | 22490.094 ms | 112591 | avg: 0.377 ms | max: 27.027 ms | sum:42414.618 ms |
VideoFrameCompo:17888 | 17385.895 ms | 86651 | avg: 0.367 ms | max: 19.350 ms | sum:31767.043 ms |
chrome:(8) | 6826.127 ms | 61487 | avg: 0.306 ms | max: 20.000 ms | sum:18835.879 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2326181.115 ms | 3081183 | | 32.093 ms | 1737425.434 ms |
----------------------------------------------------------------------------------------------------------
* Re: [PATCH 00/17] sched: EEVDF using latency-nice
2023-04-11 10:15 ` Mike Galbraith
@ 2023-04-11 13:33 ` Hillf Danton
2023-04-11 14:56 ` Mike Galbraith
[not found] ` <20230412025042.1413-1-hdanton@sina.com>
0 siblings, 2 replies; 10+ messages in thread
From: Hillf Danton @ 2023-04-11 13:33 UTC (permalink / raw)
To: Mike Galbraith
Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, linux-mm,
mgorman, bristot, corbet, kprateek.nayak, youssefesmat, joel
On Tue, 11 Apr 2023 12:15:41 +0200 Mike Galbraith <efault@gmx.de>
> On Mon, 2023-04-10 at 16:23 +0800, Hillf Danton wrote:
> >
> > In order to merely narrow down the poor performance reported, make a tradeoff
> > between runtime and latency simply by restoring a sysctl_sched_min_granularity
> > style floor at tick preemption, given the known order on the runqueue.
>
> Tick preemption isn't the primary contributor to the scheduling delta,
> it's wakeup preemption. If you look at the perf summaries of 5 minute
> recordings on my little 8 rq box below, you'll see that the delta is
> more than twice what a 250Hz tick could inflict. You could also just
> turn off WAKEUP_PREEMPTION and watch the delta instantly peg negative.
>
> Anyway...
>
> Given we know preemption is markedly up, and as always a source of pain
> (as well as gain), perhaps we can try to tamp it down a little without
> inserting old constraints into the shiny new scheduler.
>
> The dirt simple tweak below puts a dent in the sting by merely sticking
> with whatever decision EEVDF last made until it itself invalidates that
> decision. It still selects via the same math, just does so the tiniest
> bit less frenetically.
>
> ---
> kernel/sched/fair.c | 3 +++
> kernel/sched/features.h | 6 ++++++
> 2 files changed, 9 insertions(+)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -950,6 +950,9 @@ static struct sched_entity *pick_eevdf(s
> if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> curr =3D NULL;
>
> + if (sched_feat(GENTLE_EEVDF) && curr)
> + return curr;
> +
This is rather aggressive; consider, for instance, a latency-10 curr and a
latency-0 candidate at a tick hit. Along your direction, a milder change is
to postpone the wakeup preemption to the next tick.
+++ b/kernel/sched/fair.c
@@ -7932,8 +7932,6 @@ static void check_preempt_wakeup(struct
return;
cfs_rq = cfs_rq_of(se);
- update_curr(cfs_rq);
-
/*
* XXX pick_eevdf(cfs_rq) != se ?
*/
* Re: [PATCH 00/17] sched: EEVDF using latency-nice
2023-04-11 13:33 ` Hillf Danton
@ 2023-04-11 14:56 ` Mike Galbraith
[not found] ` <20230412025042.1413-1-hdanton@sina.com>
1 sibling, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2023-04-11 14:56 UTC (permalink / raw)
To: Hillf Danton
Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, linux-mm,
mgorman, bristot, corbet, kprateek.nayak, youssefesmat, joel
On Tue, 2023-04-11 at 21:33 +0800, Hillf Danton wrote:
> On Tue, 11 Apr 2023 12:15:41 +0200 Mike Galbraith <efault@gmx.de>
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -950,6 +950,9 @@ static struct sched_entity *pick_eevdf(s
> > if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> > curr =3D NULL;
> >
> > + if (sched_feat(GENTLE_EEVDF) && curr)
> > + return curr;
> > +
>
> This is rather aggressive; consider, for instance, a latency-10 curr and a
> latency-0 candidate at a tick hit.
The numbers seem to indicate that the ~400k ctx switches eliminated
were meaningless to the load being measured. I recorded everything for
5 minutes, and the recording-wide max actually went down.. but one-off
hits happen regularly in a noisy GUI regardless of scheduler, and are
difficult to assign meaning to.
Now I'm not saying there is no cost; if you change anything that's
converted to instructions, there is a price tag somewhere, whether you
notice immediately or not. Nor am I saying that patchlet is golden. I
am saying that some of the ctx switch delta looks very much like useless
overhead that can and should be made to go away. From my POV, the
patchlet actually looks kinda viable, but to Peter and the regression
reporter, it and the associated data are presented as a datapoint.
> Along your direction, a milder change is
> to postpone the wakeup preemption to the next tick.
>
> +++ b/kernel/sched/fair.c
> @@ -7932,8 +7932,6 @@ static void check_preempt_wakeup(struct
> return;
>
> cfs_rq = cfs_rq_of(se);
> - update_curr(cfs_rq);
> -
> /*
> * XXX pick_eevdf(cfs_rq) != se ?
> */
Mmmm, stopping time is a bad idea methinks.
-Mike
* Re: [PATCH 00/17] sched: EEVDF using latency-nice
[not found] ` <20230412025042.1413-1-hdanton@sina.com>
@ 2023-04-12 4:05 ` Mike Galbraith
0 siblings, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2023-04-12 4:05 UTC (permalink / raw)
To: Hillf Danton
Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, linux-mm,
mgorman, bristot, corbet, kprateek.nayak, youssefesmat, joel
On Wed, 2023-04-12 at 10:50 +0800, Hillf Danton wrote:
> On Tue, 11 Apr 2023 16:56:24 +0200 Mike Galbraith <efault@gmx.de>
>
>
> The data from you and David (lat_nice: -12, throughput: -.9% to 0.25%)
> supports eevdf, given that an optimization of <5% can be safely ignored in
> general (while 10% is good and 20% earns a standing ovation).
>
There's nothing pro or con here; David's testing seems to agree with my
own testing that a bit of adjustment may be necessary, and that's it.
Cold hard numbers to the developer, a completely optional mitigation tweak
to a fellow tester.. and we're done.
-Mike
end of thread, other threads:[~2023-04-12 4:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20230328092622.062917921@infradead.org>
[not found] ` <20230328110354.641979416@infradead.org>
2023-03-30 7:53 ` [PATCH 15/17] [RFC] sched/eevdf: Sleeper bonus Hillf Danton
[not found] ` <20230328110354.141543852@infradead.org>
2023-03-30 11:02 ` [PATCH 08/17] sched/fair: Implement an EEVDF like policy Hillf Danton
[not found] ` <20230328110354.562078801@infradead.org>
[not found] ` <CAKfTPtAkFBw5zt0+WK7dWBUE9OrbOOExG8ueUE6ogdCEQZhpXQ@mail.gmail.com>
2023-04-01 23:23 ` [PATCH 14/17] sched/eevdf: Better handle mixed slice length Hillf Danton
2023-04-02 2:40 ` Mike Galbraith
2023-04-02 6:28 ` Hillf Danton
[not found] ` <20230410031350.GA49280@maniforge>
2023-04-10 8:23 ` [PATCH 00/17] sched: EEVDF using latency-nice Hillf Danton
2023-04-11 10:15 ` Mike Galbraith
2023-04-11 13:33 ` Hillf Danton
2023-04-11 14:56 ` Mike Galbraith
[not found] ` <20230412025042.1413-1-hdanton@sina.com>
2023-04-12 4:05 ` Mike Galbraith