On Thursday 17 January 2008 06:58, Dave Kleikamp wrote:
> We weren't able to get in any runs before the holidays, but we finally
> have some good news from our performance team:
>
> "To test the effects of the patch, an OLTP workload was run on an IBM
> x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
> 2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing
> runs with and without the patch resulted in an overall performance
> benefit of ~9.8%. Correspondingly, oprofiles showed that samples from
> __up_read and __down_read routines that is seen during thread contention
> for system resources was reduced from 2.8% down to .05%. Monitoring
> the /proc/vmstat output from the patched run showed that the counter for
> fast_gup contained a very high number while the fast_gup_slow value was
> zero."

Just for reference, I've attached a more complete patch for x86, which
has to be applied on top of the pte_special patch posted in another
thread. No need to test anything at this point... the generated code for
this version is actually slightly better than the last one despite the
extra condition being tested for. With a few tweaks I was actually able
to reduce the number of tests in the inner loop, and adding noinline to
the leaf functions helps keep them in registers.

I'm currently having a look at an initial powerpc 64 patch; hopefully
we'll see similar improvements there. Will post that when I get further
along with it.

Thanks,
Nick