* [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
@ 2015-07-15 15:37 Chris Mason
2015-07-15 19:23 ` Kristen Accardi
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Chris Mason @ 2015-07-15 15:37 UTC (permalink / raw)
To: ksummit-discuss
Hi everyone,

I know I never get bored of graphs comparing old/new, but I feel guilty
suggesting this one yet again. Still, I think it's important for the
people trying to push new kernels into production to have a chance to
talk about the problems we've hit, and/or the changes that have made
life easier.

We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
sure we'll backport some wins from 4.2+. I'm hoping to make this a
collection point for other benchmarking war stories. Our biggest gains
right now are coming from scsi-mq, and early benchmarks show 4.2 has a
boost that I'm hoping is from the futex locking improvements.

It ties in a little with the new interfaces applications may be able to use
(restartable sequences etc topic), and I want to ask the broad question of
"are we doing enough to prevent performance regressions".

We have a long list of people involved on the Facebook side, Jens at the
very least can talk about the scsi/block-mq benchmarks. I'd love to hear
Fengguang's thoughts as well.

-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
@ 2015-07-15 19:23 ` Kristen Accardi
2015-07-15 19:39 ` David Woodhouse
2015-07-17 21:22 ` Davidlohr Bueso
2015-08-03 4:58 ` Fengguang Wu
2 siblings, 1 reply; 10+ messages in thread
From: Kristen Accardi @ 2015-07-15 19:23 UTC (permalink / raw)
To: Chris Mason, ksummit-discuss

I am very interested to hear what if any benchmarks people are doing on
non-server platforms as well. I find it difficult to know what benchmarks
customers think are important for upstream linux for desktop and mobile
systems that are not running Android or Chrome, and would love to hear
what others are using.

On Wed, Jul 15, 2015 at 8:37 AM Chris Mason <clm@fb.com> wrote:
> Hi everyone,
>
> I know I never get bored of graphs comparing old/new, but I feel guilty
> suggesting this one yet again. Still, I think it's important for the
> people trying to push new kernels into production to have a chance to
> talk about the problems we've hit, and/or the changes that have made
> life easier.
>
> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+. I'm hoping to make this a
> collection point for other benchmarking war stories. Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.
>
> It ties in a little with the new interfaces applications may be able to use
> (restartable sequences etc topic), and I want to ask the broad question of
> "are we doing enough to prevent performance regressions".
>
> We have a long list of people involved on the Facebook side, Jens at the
> very least can talk about the scsi/block-mq benchmarks. I'd love to hear
> Fengguang's thoughts as well.
>
> -chris

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 19:23 ` Kristen Accardi
@ 2015-07-15 19:39 ` David Woodhouse
2015-07-15 19:58 ` Chris Mason
0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-07-15 19:39 UTC (permalink / raw)
To: Kristen Accardi, Chris Mason, ksummit-discuss

On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> I am very interested to hear what if any benchmarks people are doing
> on non-server platforms as well. I find it difficult to know what
> benchmarks customers think are important for upstream linux for
> desktop and mobile systems that are not running Android or Chrome,
> and would love to hear what others are using.

I would imagine the most interesting metrics are all about power vs.
performance, rather than pure performance.

In fact, in the server environment where you have to pay for the power
in the first place, and then pay again for the air conditioning to
extract the resulting heat, I'm surprised it isn't already more of a
consideration.

--
dwmw2

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 19:39 ` David Woodhouse
@ 2015-07-15 19:58 ` Chris Mason
2015-07-15 20:32 ` Kristen Accardi
2015-07-16 1:35 ` Len Brown
0 siblings, 2 replies; 10+ messages in thread
From: Chris Mason @ 2015-07-15 19:58 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 08:39:55PM +0100, David Woodhouse wrote:
> On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> > I am very interested to hear what if any benchmarks people are doing
> > on non-server platforms as well. I find it difficult to know what
> > benchmarks customers think are important for upstream linux for
> > desktop and mobile systems that are not running Android or Chrome,
> > and would love to hear what others are using.
>
> I would imagine the most interesting metrics are all about power vs.
> performance, rather than pure performance.
>
> In fact, in the server environment where you have to pay for the power
> in the first place, and then pay again for the air conditioning to
> extract the resulting heat, I'm surprised it isn't already more of a
> consideration.

It would be fun to use turbostat or a rack power meter to
measure/compare power usage between two kernels in a given benchmark. I
think the power meters we do have are not going to be fine grained
enough to give valid results, but if turbostat is consistent enough we
could try it.

-chris

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 19:58 ` Chris Mason
@ 2015-07-15 20:32 ` Kristen Accardi
2015-07-17 19:38 ` Artem Bityutskiy
2015-07-16 1:35 ` Len Brown
1 sibling, 1 reply; 10+ messages in thread
From: Kristen Accardi @ 2015-07-15 20:32 UTC (permalink / raw)
To: Chris Mason, David Woodhouse; +Cc: Artem Bityutskiy, ksummit-discuss

On Wed, Jul 15, 2015 at 12:58 PM Chris Mason <clm@fb.com> wrote:
> On Wed, Jul 15, 2015 at 08:39:55PM +0100, David Woodhouse wrote:
> > On Wed, 2015-07-15 at 19:23 +0000, Kristen Accardi wrote:
> > > I am very interested to hear what if any benchmarks people are doing
> > > on non-server platforms as well. I find it difficult to know what
> > > benchmarks customers think are important for upstream linux for
> > > desktop and mobile systems that are not running Android or Chrome,
> > > and would love to hear what others are using.
> >
> > I would imagine the most interesting metrics are all about power vs.
> > performance, rather than pure performance.
> >
> > In fact, in the server environment where you have to pay for the power
> > in the first place, and then pay again for the air conditioning to
> > extract the resulting heat, I'm surprised it isn't already more of a
> > consideration.
>
> It would be fun to use turbostat or a rack power meter to
> measure/compare power usage between two kernels in a given benchmark. I
> think the power meters we do have are not going to be fine grained
> enough to give valid results, but if turbostat is consistent enough we
> could try it.
>
> -chris

Artem has set up this capability with 2 server benchmarks (specpower and
specweb) and has been taking a look at upstream kernel PnP with regard to
these benchmarks on specific Intel platforms.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 20:32 ` Kristen Accardi
@ 2015-07-17 19:38 ` Artem Bityutskiy
0 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2015-07-17 19:38 UTC (permalink / raw)
To: Kristen Accardi, Chris Mason, David Woodhouse; +Cc: ksummit-discuss

On Wed, 2015-07-15 at 20:32 +0000, Kristen Accardi wrote:
> > It would be fun to use turbostat or a rack power meter to
> > measure/compare power usage between two kernels in a given benchmark. I
> > think the power meters we do have are not going to be fine grained
> > enough to give valid results, but if turbostat is consistent enough we
> > could try it.

Yes, this is what we are trying to build/automate. Run power-aware
server benchmarks, before and after a kernel patch(es), compare, tell
the delta in a smart and easy to interpret way. Power is measured with a
real power meter. The project is internal so far.

--
Best Regards,
Artem Bityutskiy

^ permalink raw reply [flat|nested] 10+ messages in thread
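The before/after comparison Artem describes could be sketched roughly as below. This is not the internal tool mentioned in the thread, only an illustration: the function name `compare_runs`, the `noise_factor` threshold, and the use of the run-to-run standard deviation as a noise estimate are all assumptions made for the example.

```python
from statistics import mean, stdev


def compare_runs(before, after, noise_factor=2.0):
    """Compare per-run measurements (e.g. average watts) from two kernels.

    `before` and `after` are lists with at least two samples each.
    `noise_factor` is a hypothetical knob: the change is flagged only if
    it exceeds this many standard deviations of run-to-run noise.
    """
    mean_before = mean(before)
    mean_after = mean(after)
    delta = mean_after - mean_before
    delta_pct = delta / mean_before * 100.0
    # Crude noise estimate: the larger of the two sample standard deviations.
    noise = max(stdev(before), stdev(after))
    return {
        "before": mean_before,
        "after": mean_after,
        "delta_pct": delta_pct,
        "significant": abs(delta) > noise_factor * noise,
    }


if __name__ == "__main__":
    # Made-up sample values, in watts, for three runs on each kernel.
    result = compare_runs([100.0, 102.0, 98.0], [110.0, 111.0, 109.0])
    print(f"{result['delta_pct']:+.1f}% significant={result['significant']}")
```

A real harness would likely want more runs and a proper statistical test; the point here is only that the delta is reported relative to the baseline and checked against measured noise.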
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 19:58 ` Chris Mason
2015-07-15 20:32 ` Kristen Accardi
@ 2015-07-16 1:35 ` Len Brown
2015-08-02 11:49 ` Fengguang Wu
1 sibling, 1 reply; 10+ messages in thread
From: Len Brown @ 2015-07-16 1:35 UTC (permalink / raw)
To: Chris Mason; +Cc: ksummit-discuss

> It would be fun to use turbostat or a rack power meter to
> measure/compare power usage between two kernels in a given benchmark. I
> think the power meters we do have are not going to be fine grained
> enough to give valid results, but if turbostat is consistent enough we
> could try it.

The RAPL power meters exported by turbostat can correlate surprisingly
well with highly accurate external power meters. But even if perfect,
RAPL doesn't know about the hardware outside of the processor package
(except Xeon DRAM), so the absolute numbers will not match an AC power
meter. But differences are visible and consistent.
The accuracy and the quality of correlation with actual electricals
varies a lot with the type of processor. In general, Xeon is the best,
followed by desktop/mobile core, and Atom's RAPL power meters have been
the least accurate of those shipped, so far.

Yes, 0-day is using this output today to identify regressions, without
any external power meters. But they are also adding external power meters.

There are also systems with instrumented power supplies which export the
system AC power via IPMI.

cheers,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply [flat|nested] 10+ messages in thread
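To make the RAPL idea concrete, here is a minimal sketch of turning two energy counter readings into average package power. It does not use turbostat; as an assumption for the example it reads the Linux powercap sysfs files `energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`, which may be absent on some machines and may require root to read. The helper names are hypothetical, and only a single counter wraparound is handled.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; adjust for the machine.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read one integer value (microjoules) from the RAPL sysfs directory."""
    return int((RAPL_DIR / name).read_text())


def average_watts(start_uj, end_uj, seconds, max_range_uj):
    """Convert two energy counter readings into average power in watts."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped once during the interval.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds


def measure(workload):
    """Run `workload()` and return average package power while it ran."""
    max_range = read_uj("max_energy_range_uj")
    start, t0 = read_uj("energy_uj"), time.monotonic()
    workload()
    end, t1 = read_uj("energy_uj"), time.monotonic()
    return average_watts(start, end, t1 - t0, max_range)
```

As Len notes, the absolute number covers only the processor package (plus DRAM on Xeon), so it is the difference between two kernels on the same machine that is meaningful, not the value itself.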
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-16 1:35 ` Len Brown
@ 2015-08-02 11:49 ` Fengguang Wu
0 siblings, 0 replies; 10+ messages in thread
From: Fengguang Wu @ 2015-08-02 11:49 UTC (permalink / raw)
To: Len Brown; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 09:35:51PM -0400, Len Brown wrote:
> > It would be fun to use turbostat or a rack power meter to
> > measure/compare power usage between two kernels in a given benchmark. I
> > think the power meters we do have are not going to be fine grained
> > enough to give valid results, but if turbostat is consistent enough we
> > could try it.
>
> The RAPL power meters exported by turbostat can correlate surprisingly
> well with highly accurate external power meters. But even if perfect,
> RAPL doesn't know about the hardware outside of the processor package
> (except Xeon DRAM), so the absolute numbers will not match an AC power
> meter. But differences are visible and consistent.
> The accuracy and the quality of correlation with actual electricals
> varies a lot with the type of processor. In general, Xeon is the best,
> followed by desktop/mobile core, and Atom's RAPL power meters have been
> the least accurate of those shipped, so far.
>
> Yes, 0-day is using this output today to identify regressions, without
> any external power meters. But they are also adding external power meters.

Yeah, we collect turbostat stats in every benchmark 0day runs on the
machines that support RAPL. It has been effective in catching power
regressions.

There are also 4 external power meters to measure whole-machine power
consumption; however, that number is small compared to the number of
machines that support RAPL.

Thanks,
Fengguang

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
2015-07-15 19:23 ` Kristen Accardi
@ 2015-07-17 21:22 ` Davidlohr Bueso
2015-08-03 4:58 ` Fengguang Wu
2 siblings, 0 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2015-07-17 21:22 UTC (permalink / raw)
To: Chris Mason; +Cc: ksummit-discuss

On Wed, 2015-07-15 at 11:37 -0400, Chris Mason wrote:
> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+. I'm hoping to make this a
> collection point for other benchmarking war stories. Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.

At least for 4.2 you might also want to keep an eye out for the new
qspinlock stuff, which could be another source of the performance boost
you are seeing. Of course I have no idea what your workload does other
than suffer from futexes.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends
2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
2015-07-15 19:23 ` Kristen Accardi
2015-07-17 21:22 ` Davidlohr Bueso
@ 2015-08-03 4:58 ` Fengguang Wu
2 siblings, 0 replies; 10+ messages in thread
From: Fengguang Wu @ 2015-08-03 4:58 UTC (permalink / raw)
To: Chris Mason; +Cc: ksummit-discuss

On Wed, Jul 15, 2015 at 11:37:25AM -0400, Chris Mason wrote:
> Hi everyone,
>
> I know I never get bored of graphs comparing old/new, but I feel guilty
> suggesting this one yet again. Still, I think it's important for the
> people trying to push new kernels into production to have a chance to
> talk about the problems we've hit, and/or the changes that have made
> life easier.

I'm very interested in learning about your experiences and problems, and
in checking whether they can be avoided in the upstream kernel, so that
production systems like Facebook's can upgrade kernels more smoothly in
future.

> We're starting to push 4.0 into prod (122 hosts almost counts), and I'm
> sure we'll backport some wins from 4.2+. I'm hoping to make this a
> collection point for other benchmarking war stories. Our biggest gains
> right now are coming from scsi-mq, and early benchmarks show 4.2 has a
> boost that I'm hoping is from the futex locking improvements.

I can also share the performance trends in the data collected by 0day.
I'm afraid it'll be a bit negative because we cannot catch up with
writing new test cases to take advantage of the improvements in new
kernels. Here is a comparison for a set of 988 test jobs.

                    v4.0    v4.1
    -------------------------------
    perf-index       100      99    (the larger, the better)
    power-index      100      95
    latency-index    100      98
    size-index       100      98

The overall regressions also indicate 0day is not mature enough to
bisect all regressions in time and keep them from hitting mainline.

> It ties in a little with the new interfaces applications may be able to use
> (restartable sequences etc topic), and I want to ask the broad question of
> "are we doing enough to prevent performance regressions".

There is much to be desired from 0day's point of view.

- timeliness

  The earlier regressions are caught, the better. Up to now kbuild is
  doing reasonably well (mostly within 1 hour); however, the runtime
  tests -- boot, functional, performance/power/latency -- still have
  obvious gaps (typically days long, but sometimes up to weeks).

- coverage

  Kbuild has achieved near 100% coverage (700 reports per month).
  However, runtime tests are far from enough (50 reports per month).

  This is the area that needs collaboration throughout the community.
  Developers in each subsystem -- mm, fs, network, rcu, sched, cgroup,
  VM, drm, media, etc. -- may have versatile ways of testing their
  subsystem or feature set:

  - run some WORKLOAD to evaluate performance/power/latency/..
  - SETUP the system in different ways to run tests,
    e.g. fs params, md/dm setup, cgroup, NUMA policy, CPU affinity, ..
  - MONITOR various system metrics during the test run

  If such knowledge and scripts can be shared and accumulated, it'd be
  valuable for other developers and testers, and will eventually help
  overall Linux kernel health.

Up to now 0day has collected a number of WORKLOAD, SETUP and MONITOR
scripts. They are publicly available here:

https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/

There is much more to be desired. Contributions of new scripts will be
highly appreciated. We are especially short of SETUP scripts. Good test
schemes should cover different combinations of SETUP+WORKLOAD and their
parameters. There are presumably a huge number of ways one can configure
a system; however, most are beyond our imagination and test scope.

For MONITOR/WORKLOAD scripts, we borrowed a few nice scripts from Mel's
MMTests. The phoronix, xfstests, autotest, kernel selftests etc. test
suites are also running routinely in the 0day infrastructure. So if you
add a new test case to one of them, there is a good chance it'll be
picked up by 0day.

Thanks,
Fengguang

^ permalink raw reply [flat|nested] 10+ messages in thread
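The point about covering combinations of SETUP+WORKLOAD and their parameters can be illustrated with a small sketch that enumerates a job matrix. This is not the lkp-tests job format; the function `job_matrix` and the parameter names (`fs`, `numa_policy`) and workload names used below are hypothetical placeholders chosen for the example.

```python
from itertools import product


def job_matrix(setups, workloads):
    """Enumerate every SETUP x WORKLOAD combination.

    `setups` maps a setup parameter name to the list of values to try;
    `workloads` is a list of workload names. Returns one dict per job.
    """
    names = list(setups)
    jobs = []
    for workload in workloads:
        # Cartesian product over all setup parameter values.
        for values in product(*(setups[name] for name in names)):
            jobs.append({"workload": workload, **dict(zip(names, values))})
    return jobs


if __name__ == "__main__":
    # Hypothetical setup parameters and workloads, for illustration only.
    setups = {
        "fs": ["ext4", "xfs", "btrfs"],
        "numa_policy": ["default", "interleave"],
    }
    for job in job_matrix(setups, ["fio", "dbench"]):
        print(job)
```

Even this toy matrix yields 12 jobs from two parameters and two workloads, which shows how quickly the test space grows as more SETUP scripts are contributed.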
end of thread, other threads:[~2015-08-03 4:59 UTC | newest]

Thread overview: 10+ messages
2015-07-15 15:37 [Ksummit-discuss] [TECH TOPIC] benchmarking and performance trends Chris Mason
2015-07-15 19:23 ` Kristen Accardi
2015-07-15 19:39   ` David Woodhouse
2015-07-15 19:58     ` Chris Mason
2015-07-15 20:32       ` Kristen Accardi
2015-07-17 19:38         ` Artem Bityutskiy
2015-07-16 1:35       ` Len Brown
2015-08-02 11:49         ` Fengguang Wu
2015-07-17 21:22 ` Davidlohr Bueso
2015-08-03 4:58 ` Fengguang Wu