ksummit.lists.linux.dev archive mirror
* [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
@ 2015-07-15 15:37 Chris Mason
  2015-07-15 19:23 ` Kristen Accardi
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Chris Mason @ 2015-07-15 15:37 UTC (permalink / raw)
  To: ksummit-discuss

Hi everyone,

I know I never get bored of graphs comparing old/new, but I feel guilty
suggesting this one yet again.  Still, I think it's important for the
people trying to push new kernels into production to have a chance to
talk about the problems we've hit, and/or the changes that have made
life easier.

We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
sure we'll backport some wins from 4.2+.  I'm hoping to make this a
collection point for other benchmarking war stories.  Our biggest gains
right now are coming from scsi-mq, and early benchmarks show 4.2 has a
boost that I'm hoping is from the futex locking improvements.

It ties in a little with the new interfaces applications may be able to use
(restartable sequences etc topic), and I want to ask the broad question of
"are we doing enough to prevent performance regressions".

We have a long list of people involved on the Facebook side, Jens at the
very least can talk about the scsi/block-mq benchmarks.  I'd love to hear
Fengguang's thoughts as well.

-chris


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
@ 2015-07-15 19:23 ` Kristen Accardi
  2015-07-15 19:39   ` David Woodhouse
  2015-07-17 21:22 ` Davidlohr Bueso
  2015-08-03  4:58 ` Fengguang Wu
  2 siblings, 1 reply; 10+ messages in thread
From: Kristen Accardi @ 2015-07-15 19:23 UTC (permalink / raw)
  To: Chris Mason, ksummit-discuss


I am very interested to hear what benchmarks, if any, people are running on
non-server platforms as well.  I find it difficult to know which benchmarks
customers consider important for upstream Linux on desktop and mobile
systems that are not running Android or Chrome, and would love to hear what
others are using.

On Wed, Jul 15, 2015 at 8:37 AM Chris Mason <clm@fb.com> wrote:

> Hi everyone,
>
> I know I never get bored of graphs comparing old/new, but I feel guilty
> suggesting this one yet again.  Still, I think it's important for the
> people trying to push new kernels into production to have a chance to
> talk about the problems we've hit, and/or the changes that have made
> life easier.
>
> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+.  I'm hoping to make this a
> collection point for other benchmarking war stories.  Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.
>
> It ties in a little with the new interfaces applications may be able to use
> (restartable sequences etc topic), and I want to ask the broad question of
> "are we doing enough to prevent performance regressions".
>
> We have a long list of people involved on the Facebook side, Jens at the
> very least can talk about the scsi/block-mq benchmarks.  I'd love to hear
> Fengguang's thoughts as well.
>
> -chris


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 19:23 ` Kristen Accardi
@ 2015-07-15 19:39   ` David Woodhouse
  2015-07-15 19:58     ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-07-15 19:39 UTC (permalink / raw)
  To: Kristen Accardi, Chris Mason, ksummit-discuss


On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> I am very interested to hear what if any benchmarks people are doing 
> on non-server platforms as well.  I find it difficult to know what 
> benchmarks customers think are important for upstream linux for 
> desktop and mobile systems that are not running Android or Chrome, 
> and would love to hear what others are using.

I would imagine the most interesting metrics are all about power vs.
performance, rather than pure performance. 

In fact, in the server environment where you have to pay for the power
in the first place, and then pay again for the air conditioning to
extract the resulting heat, I'm surprised it isn't already more of a
consideration.


-- 
dwmw2



* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 19:39   ` David Woodhouse
@ 2015-07-15 19:58     ` Chris Mason
  2015-07-15 20:32       ` Kristen Accardi
  2015-07-16  1:35       ` Len Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Mason @ 2015-07-15 19:58 UTC (permalink / raw)
  To: David Woodhouse; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 08:39:55PM +0100, David Woodhouse wrote:
> On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> > I am very interested to hear what if any benchmarks people are doing 
> > on non-server platforms as well.  I find it difficult to know what 
> > benchmarks customers think are important for upstream linux for 
> > desktop and mobile systems that are not running Android or Chrome, 
> > and would love to hear what others are using.
> 
> I would imagine the most interesting metrics are all about power vs.
> performance, rather than pure performance. 
> 
> In fact, in the server environment where you have to pay for the power
> in the first place, and then pay again for the air conditioning to
> extract the resulting heat, I'm surprised it isn't already more of a
> consideration.

It would be fun to use turbostat or a rack power meter to
measure/compare power usage between two kernels in a given benchmark.  I
think the power meters we do have are not going to be fine grained
enough to give valid results, but if turbostat is consistent enough we
could try it.

-chris


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 19:58     ` Chris Mason
@ 2015-07-15 20:32       ` Kristen Accardi
  2015-07-17 19:38         ` Artem Bityutskiy
  2015-07-16  1:35       ` Len Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Kristen Accardi @ 2015-07-15 20:32 UTC (permalink / raw)
  To: Chris Mason, David Woodhouse; +Cc: Artem Bityutskiy, ksummit-discuss


On Wed, Jul 15, 2015 at 12:58 PM Chris Mason <clm@fb.com> wrote:

> On Wed, Jul 15, 2015 at 08:39:55PM +0100, David Woodhouse wrote:
> > On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> > > I am very interested to hear what if any benchmarks people are doing
> > > on non-server platforms as well.  I find it difficult to know what
> > > benchmarks customers think are important for upstream linux for
> > > desktop and mobile systems that are not running Android or Chrome,
> > > and would love to hear what others are using.
> >
> > I would imagine the most interesting metrics are all about power vs.
> > performance, rather than pure performance.
> >
> > In fact, in the server environment where you have to pay for the power
> > in the first place, and then pay again for the air conditioning to
> > extract the resulting heat, I'm surprised it isn't already more of a
> > consideration.
>
> It would be fun to use turbostat or a rack power meter to
> measure/compare power usage between two kernels in a given benchmark.  I
> think the power meters we do have are not going to be fine grained
> enough to give valid results, but if turbostat is consistent enough we
> could try it.
>
> -chris
>
>
Artem has set up this capability with two server benchmarks (specpower and
specweb) and has been looking at upstream kernel power and performance (PnP)
on these benchmarks on specific Intel platforms.


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 19:58     ` Chris Mason
  2015-07-15 20:32       ` Kristen Accardi
@ 2015-07-16  1:35       ` Len Brown
  2015-08-02 11:49         ` Fengguang Wu
  1 sibling, 1 reply; 10+ messages in thread
From: Len Brown @ 2015-07-16  1:35 UTC (permalink / raw)
  To: Chris Mason; +Cc: ksummit-discuss

> It would be fun to use turbostat or a rack power meter to
> measure/compare power usage between two kernels in a given benchmark.  I
> think the power meters we do have are not going to be fine grained
> enough to give valid results, but if turbostat is consistent enough we
> could try it.

The RAPL power meters exported by turbostat can correlate surprisingly
well with highly accurate external power meters.  But even if perfect,
RAPL doesn't know about the hardware outside of the processor package
(except Xeon DRAM), so the absolute numbers will not match an AC power
meter.  But differences are visible and consistent.  The accuracy and the
quality of correlation with actual electricals vary a lot with the type
of processor.  In general, Xeon is the best, followed by desktop/mobile
core, and Atom's RAPL power meters have been the least accurate of those
shipped, so far.
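
As a rough illustration of comparing package energy between two kernels, here is a minimal Python sketch. It does not use turbostat itself; it assumes the Linux powercap sysfs files `energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`, which are not mentioned in this thread, and the helper names are hypothetical. Only differences between runs are meaningful, as noted above.

```python
from pathlib import Path

# Assumption: powercap sysfs directory for package 0 (not named in the thread).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> int:
    """Read one integer value (microjoules) from the RAPL sysfs directory."""
    return int((RAPL_DIR / name).read_text().strip())


def energy_delta_uj(start: int, end: int, max_range: int) -> int:
    """Energy used between two counter readings, allowing for one wrap."""
    if end >= start:
        return end - start
    # The counter wrapped past max_range and restarted near zero.
    return (max_range - start) + end


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0
```

A run would read `read_uj("energy_uj")` before and after the benchmark on each kernel, pass both readings and `read_uj("max_energy_range_uj")` to `energy_delta_uj`, and then compare the two deltas with `percent_change`. Reading the counter may require elevated privileges on some systems.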

Yes, 0-day is using this output today to identify regressions, without
any external power meters.  But they are also adding external power meters.

There are also systems with instrumented power supplies which export
the system AC power via IPMI.

cheers,
Len Brown, Intel Open Source Technology Center


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 20:32       ` Kristen Accardi
@ 2015-07-17 19:38         ` Artem Bityutskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2015-07-17 19:38 UTC (permalink / raw)
  To: Kristen Accardi, Chris Mason, David Woodhouse; +Cc: ksummit-discuss

On Wed, 2015-07-15 at 20:32 +0000, Kristen Accardi wrote:
> > It would be fun to use turbostat or a rack power meter to
> > measure/compare power usage between two kernels in a given benchmark.  I
> > think the power meters we do have are not going to be fine grained
> > enough to give valid results, but if turbostat is consistent enough we
> > could try it.

Yes, this is what we are trying to build and automate: run power-aware
server benchmarks before and after one or more kernel patches, compare the
results, and report the delta in a smart, easy-to-interpret way. Power is
measured with a real power meter. The project is internal so far.
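
The before/after comparison described here could be sketched roughly as follows in Python. This is not the internal project's method, which the thread does not describe; the function name and the noise rule (flag a delta only when it exceeds twice the larger sample standard deviation) are assumptions made for illustration.

```python
from statistics import mean, stdev


def compare_runs(before, after, noise_factor=2.0):
    """Summarize the change between two sets of repeated measurements.

    The delta is flagged only when it exceeds noise_factor times the larger
    of the two sample standard deviations (a crude, hypothetical rule).
    """
    base, new = mean(before), mean(after)
    noise = max(stdev(before), stdev(after))
    delta_pct = (new - base) / base * 100.0
    significant = abs(new - base) > noise_factor * noise
    return {"delta_pct": delta_pct, "noise": noise, "significant": significant}
```

Each list holds the same metric (for example, average watts or a benchmark score) from repeated runs on one kernel; at least two runs per kernel are needed for the standard deviation.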


-- 
Best Regards,
Artem Bityutskiy


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
  2015-07-15 19:23 ` Kristen Accardi
@ 2015-07-17 21:22 ` Davidlohr Bueso
  2015-08-03  4:58 ` Fengguang Wu
  2 siblings, 0 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2015-07-17 21:22 UTC (permalink / raw)
  To: Chris Mason; +Cc: ksummit-discuss

On Wed, 2015-07-15 at 11:37 -0400, Chris Mason wrote:
> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+.  I'm hoping to make this a
> collection point for other benchmarking war stories.  Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.

At least for 4.2 you might also want to keep an eye out for the new
qspinlock stuff, which could be another source of the performance boost
you are seeing. Of course I have no idea what your workload does other
than suffer from futexes.


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-16  1:35       ` Len Brown
@ 2015-08-02 11:49         ` Fengguang Wu
  0 siblings, 0 replies; 10+ messages in thread
From: Fengguang Wu @ 2015-08-02 11:49 UTC (permalink / raw)
  To: Len Brown; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 09:35:51PM -0400, Len Brown wrote:
> > It would be fun to use turbostat or a rack power meter to
> > measure/compare power usage between two kernels in a given benchmark.  I
> > think the power meters we do have are not going to be fine grained
> > enough to give valid results, but if turbostat is consistent enough we
> > could try it.
> 
> The RAPL power meters exported by turbostat can correlate surprisingly
> well with highly accurate external power meters.  But even if perfect,
> RAPL doesn't know about the hardware outside of the processor package
> (except Xeon DRAM), so the absolute numbers will not match an AC power
> meter.  But differences are visible and consistent.  The accuracy and the
> quality of correlation with actual electricals vary a lot with the type
> of processor.  In general, Xeon is the best, followed by desktop/mobile
> core, and Atom's RAPL power meters have been the least accurate of those
> shipped, so far.
> 
> Yes, 0-day is using this output today to identify regressions, without
> any external power meters.  But they are also adding external power meters.

Yeah, we collect turbostat stats for every benchmark that 0-day runs on
the machines that support RAPL. It has been effective in catching power
regressions.

There are also 4 external power meters to measure whole-machine power
consumption; however, that number is small compared to the number of
machines that support RAPL.

Thanks,
Fengguang


* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
  2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
  2015-07-15 19:23 ` Kristen Accardi
  2015-07-17 21:22 ` Davidlohr Bueso
@ 2015-08-03  4:58 ` Fengguang Wu
  2 siblings, 0 replies; 10+ messages in thread
From: Fengguang Wu @ 2015-08-03  4:58 UTC (permalink / raw)
  To: Chris Mason; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 11:37:25AM -0400, Chris Mason wrote:
> Hi everyone,
> 
> I know I never get bored of graphs comparing old/new, but I feel guilty
> suggesting this one yet again.  Still, I think it's important for the
> people trying to push new kernels into production to have a chance to
> talk about the problems we've hit, and/or the changes that have made
> life easier.

I'm very interested in learning about your experiences and problems, and
in checking whether they can be avoided in the upstream kernel, so that
production systems like Facebook's can upgrade kernels more smoothly in
the future.

> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+.  I'm hoping to make this a
> collection point for other benchmarking war stories.  Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.

I can also share the performance trends in the data collected by 0day.
I'm afraid it'll be a bit negative, because we cannot keep up with
writing new test cases that take advantage of the improvements in new
kernels.

Here is a comparison for a set of 988 test jobs.

                   v4.0    v4.1
-------------------------------
    perf-index      100      99  (the larger, the better)
   power-index      100      95
 latency-index      100      98
    size-index      100      98

The overall regressions also indicate 0day is not mature enough to
bisect all regressions in time and keep them from hitting mainline.
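
The thread does not say how 0day derives these indexes. Purely as an illustration of how many per-job results can be folded into one number with the baseline at 100, here is a Python sketch that uses a geometric mean of per-job ratios; the function name, the geometric-mean choice, and the `higher_is_better` switch are all assumptions.

```python
import math


def relative_index(baseline, candidate, higher_is_better=True):
    """Geometric mean of per-job ratios, scaled so the baseline scores 100."""
    ratios = []
    for job, base in baseline.items():
        new = candidate[job]
        # For metrics such as power or latency, invert the ratio so that a
        # larger index still means a better result.
        ratios.append(new / base if higher_is_better else base / new)
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return 100.0 * math.exp(log_mean)
```

Both arguments map a job name to a positive measurement. A geometric mean keeps one job with large absolute numbers from dominating the index, which is why it is a common choice for this kind of summary.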

> It ties in a little with the new interfaces applications may be able to use
> (restartable sequences etc topic), and I want to ask the broad question of
> "are we doing enough to prevent performance regressions".

There is much to be desired from 0day's point of view.

- timeliness

The earlier regressions are caught, the better. So far kbuild is
doing reasonably well (reports mostly arrive within 1 hour); however, the
runtime tests -- boot, functional, performance/power/latency -- still have
obvious gaps (delays are typically days long and can sometimes stretch to
weeks).

- coverage

Kbuild has achieved near 100% coverage (700 reports per month).
However runtime tests are far from enough (50 reports per month).

This is the area that needs collaboration throughout the community.
Developers in each subsystem -- mm, fs, network, rcu, sched, cgroup,
VM, drm, media, etc. -- may have various ways of testing their
subsystem or feature set:

- run some WORKLOAD to evaluate performance/power/latency/..

- SETUP the system in different ways to run tests
  eg. fs params, md/dm setup, cgroup, NUMA policy, CPU affinity, ..

- MONITOR various system metrics during the test run

If such knowledge and scripts can be shared and accumulated it'd be
valuable for other developers and testers, and will eventually help
overall linux kernel health.

Up to now 0day has collected a number of WORKLOAD, SETUP and MONITOR
scripts. They are publicly available here:

https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/

There is much more to be desired. Contributions of new scripts will be
highly appreciated.

We are especially short of SETUP scripts. Good test schemes should
cover different combinations of SETUP+WORKLOAD and their parameters.
There are presumably a huge number of ways to configure a system;
however, most are beyond our imagination and test scope.
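
To show how quickly SETUP+WORKLOAD combinations multiply, here is a small Python sketch that expands a parameter matrix into individual jobs. It does not reflect the lkp-tests job format; the parameter names and values are hypothetical.

```python
from itertools import product


def expand_matrix(params):
    """Yield one job dict per combination of the given parameter values."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical SETUP and WORKLOAD parameters, for illustration only.
matrix = {
    "fs": ["ext4", "xfs", "btrfs"],
    "numa_policy": ["default", "interleave"],
    "workload": ["fio-randread", "dbench"],
}
jobs = list(expand_matrix(matrix))
```

Three filesystems, two NUMA policies and two workloads already give 12 jobs, and each additional SETUP dimension multiplies that count again.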

For MONITOR/WORKLOAD scripts, we borrowed a few nice scripts from
Mel's MMTests. The phoronix, xfstests, autotest, kernel selftests, etc.
test suites are also running routinely in the 0day infrastructure. So if
you add a new test case to one of them, there is a good chance it'll be
picked up by 0day.

Thanks,
Fengguang
