From: Hillf Danton <hdanton@sina.com>
To: David Vernet <void@manifault.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
mingo@kernel.org, vincent.guittot@linaro.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mgorman@suse.de, bristot@redhat.com, corbet@lwn.net,
kprateek.nayak@amd.com, youssefesmat@chromium.org,
joel@joelfernandes.org, efault@gmx.de
Subject: Re: [PATCH 00/17] sched: EEVDF using latency-nice
Date: Mon, 10 Apr 2023 16:23:07 +0800
Message-ID: <20230410082307.1327-1-hdanton@sina.com>
In-Reply-To: <20230410031350.GA49280@maniforge>
On 9 Apr 2023 22:13:50 -0500 David Vernet <void@manifault.com> wrote:
>
> Hi Peter,
>
> I used the EEVDF scheduler to run workloads on one of Meta's largest
> services (our main HHVM web server), and I wanted to share my
> observations with you.
Thanks for your testing.
>
> 3. Low latency + long slice are not mutually exclusive for us
>
> An interesting quality of web workloads running JIT engines is that they
> require both low-latency, and long slices on the CPU. The reason we need
> the tasks to be low latency is they're on the critical path for
> servicing web requests (for most of their runtime, at least), and the
> reasons we need them to have long slices are enumerated above -- they
> thrash the icache / DSB / iTLB, more aggressive context switching causes
> us to thrash on paging from disk, and in general, these tasks are on the
> critical path for servicing web requests and we want to encourage them
> to run to completion.
>
> This causes EEVDF to perform poorly for workloads with these
> characteristics. If we decrease latency nice for our web workers then
> they'll have lower latency, but only because their slices are smaller.
> This in turn causes the increase in context switches, which causes the
> thrashing described above.

Take a look at the diff below.
>
> Worth noting -- I did try and increase the default base slice length by
> setting sysctl_sched_base_slice to 35ms, and these were the results:
>
> With EEVDF slice 35ms and latency_nice 0
> ----------------------------------------
> - .5 - 2.25% drop in throughput
> - 2.5 - 4.5% increase in p95 latencies
> - 2.5 - 5.25% increase in p99 latencies
> - Context switch per minute increase: 9.5 - 12.4%
> - Involuntary context switch increase: ~320 - 330%
> - Major fault delta: -3.6% to 37.6%
> - IPC decrease .5 - .9%
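
As a side note, a minimal sketch of how that 35ms value could be applied
from userspace, assuming the series exposes sysctl_sched_base_slice as the
debugfs knob /sys/kernel/debug/sched/base_slice_ns (the path is an
assumption here, not something confirmed in this thread; needs root and a
mounted debugfs):

#include <stdio.h>

int main(void)
{
	/* Assumed debugfs knob backing sysctl_sched_base_slice. */
	FILE *f = fopen("/sys/kernel/debug/sched/base_slice_ns", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* 35ms expressed in nanoseconds. */
	fprintf(f, "%llu\n", 35ULL * 1000 * 1000);
	return fclose(f) ? 1 : 0;
}
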
>
> With EEVDF slice 35ms and latency_nice -8 for web workers
> ---------------------------------------------------------
> - .5 - 2.5% drop in throughput
> - 1.7 - 4.75% increase in p95 latencies
> - 2.5 - 5% increase in p99 latencies
> - Context switch per minute increase: 10.5 - 15%
> - Involuntary context switch increase: ~327 - 350%
> - Major fault delta: -1% to 45%
> - IPC decrease .4 - 1.1%
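
For reference, a sketch of how a web worker could request latency_nice -8,
assuming the sched_attr extension proposed earlier in this series: the
sched_latency_nice field, the SCHED_FLAG_LATENCY_NICE flag and its 0x80
value are taken from the series, not from current uapi, so this only works
on a kernel with those patches applied. glibc has no sched_setattr()
wrapper, so the raw syscall is used.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed flag value, per the proposed series; not in current uapi. */
#define SCHED_FLAG_LATENCY_NICE		0x80

/* Minimal local copy of sched_attr, extended with the proposed field. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
	int32_t  sched_latency_nice;	/* field proposed by this series */
};

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = 0;			/* SCHED_NORMAL */
	attr.sched_flags = SCHED_FLAG_LATENCY_NICE;
	attr.sched_latency_nice = -8;		/* favour latency */

	/* pid 0 == the calling thread (e.g. a web worker). */
	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}
	return 0;
}
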
>
> I was expecting the increase in context switches and involuntary context
> switches to be lower than what they ended up being with the increased
> default slice length. Regardless, it still seems to tell a relatively
> consistent story with the numbers from above. The improvement in IPC is
> expected, though less of an improvement than I was anticipating (presumably
> due to the still-high context switch rate). There were also fewer major
> faults per minute compared to runs with a shorter default slice.
>
> Note that even if increasing the slice length did cause fewer context
> switches and major faults, I still expect that it would hurt throughput
> and latency for HHVM given that when latency-nicer tasks are eventually
> given the CPU, the web workers will have to wait around for longer than
> we'd like for those tasks to burn through their longer slices.
>
> In summary, I must admit that this patch set makes me a bit nervous.
> Speaking for Meta at least, the patch set in its current form causes
> performance regressions beyond what we're able to tolerate in production
> (generally < .5% at the very most). More broadly, it will certainly
> cause us to have to carefully consider how it affects our model for
> server capacity.
>
> Thanks,
> David
>
Just to help narrow down the poor performance reported, make a tradeoff
between runtime and latency simply by restoring a sysctl_sched_min_granularity
check at tick preemption, given the known order on the runqueue.
--- x/kernel/sched/fair.c
+++ y/kernel/sched/fair.c
@@ -5172,6 +5172,12 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
+	u64 sysctl_sched_min_granularity = 1000000ULL;	/* 1ms */
+	u64 delta_exec;
+
+	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	if (delta_exec < sysctl_sched_min_granularity)
+		return;
if (pick_eevdf(cfs_rq) != curr) {
resched_curr(rq_of(cfs_rq));
/*
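
With this, curr is guaranteed at least 1ms of accumulated runtime (delta_exec
since prev_sum_exec_runtime) before the tick is allowed to ask pick_eevdf()
whether another entity should run instead. That trades some preemption latency
at the tick for fewer involuntary context switches, which is the metric that
regressed the most (~320-350%) in the numbers above.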