* Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Mary Edie Meredith @ 2004-03-12 22:31 UTC (permalink / raw)
To: linux-mm; +Cc: Mary Edie Meredith

For the last few mm kernels, I have discovered a performance problem in
DBT-3 (using PostgreSQL) in the "throughput" portion of the test (when
the test is running multiple processes) on our 8-way STP systems, as
compared to 4-way runs and the baseline kernel results.

Using the default DBT-3 options (i.e. LVM, ext2, PostgreSQL 7.4.1) on
RH9 for 2.6.4-mm1 (PLM 2745). Note that the 4-way number is _better_
than the 8-way number:

2.6.4-mm1
Runid...CPUs..Thruput (bigger is better)
289860    8     86.5   <----------- (profiling data below)
289831    4    112.7

Compare to base: linux-2.6.4
Runid...CPUs..Thruput
289421    8    137.2   <-----------
289383    4    120.73

DBT-3 is a read-mostly DSS workload, and the throughput phase is where
we run multiple query streams (as many as we have CPUs). In this
workload the database is stored on a file system, and it almost
completely caches in page cache early on, so there is not a lot of
physical IO in the throughput portion of the test.

I also found similar 8way thruput numbers on these mm kernels:

Kernel........PLM...Thruput
2.6.4-rc2-mm1 2676   84.56
2.6.4-rc1-mm2 2666   85.54
2.6.4-rc1-mm1 2664   85.73

Before the 2.6.3-mm4 kernel, the test we are running now (with LVM and
pgsql 7.4.1) was not available, so results are not available without
running them manually. I did run 2.6.1-mm5 and it had a thruput result
of 124.02 on an 8way. Still not great, but definitely better than ~86.0.

I just wanted to report this, and I wonder if you already know why this
is happening.
--------------------------------------------------
Profiling data from RUNID 289860, sorted first by ticks, second by load:

http://khack.osdl.org/stp/289860/profile/Framework_Close-tick.sort
http://khack.osdl.org/stp/289860/profile/Framework_Close-load.sort

--
Mary Edie Meredith
maryedie@osdl.org
503-626-2455 x42
Open Source Development Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
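For scale, the gap reported above works out to roughly a third of the
baseline throughput. A one-liner over the numbers in the tables (the
throughput values are copied from the report; the arithmetic is mine):

```shell
# Relative 8-way throughput drop: 2.6.4-mm1 (86.5) vs. the 2.6.4 baseline (137.2)
awk 'BEGIN { mm = 86.5; base = 137.2; printf "8-way drop vs. base: %.0f%%\n", 100 * (base - mm) / base }'
```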
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Andrew Morton @ 2004-03-13 7:39 UTC (permalink / raw)
To: maryedie; +Cc: linux-mm

Mary Edie Meredith <maryedie@osdl.org> wrote:
>
> For the last few mm kernels, I have discovered a
> performance problem in DBT-3 (using PostgreSQL)
> in the "throughput" portion of the test (when the
> test is running multiple processes) on our 8-way
> STP systems as compared to 4-way runs and the baseline
> kernel results.

If I could reproduce this I could find and fix it very quickly. But when
I tried to get dbt2 working it was a near-death (and unsuccessful)
experience. Did it get any easier in dbt3?

I would be suspecting the darn readahead code again. That was merged
into Linus's tree yesterday, so perhaps you can test the latest -bk?
[parent not found: <405379ED.A7D6B1E4@us.ibm.com>]
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Andrew Morton @ 2004-03-13 21:48 UTC (permalink / raw)
To: badari; +Cc: maryedie, linux-mm

badari <pbadari@us.ibm.com> wrote:
>
> Andrew,
>
> We don't see any degradation with -mm trees with DSS workloads.
> Meredith mentioned that the workload is "cached". Not much
> IO activity. I wonder how it can be related to readahead ?

Well, I don't know what "cached" means, really. That's a recurring
problem with these complex performance tests which some groups are
running: lack of the really detailed information which kernel developers
can use, long turnaround times in gathering followup information, even
slow email turnaround times. It's been a bit frustrating from that point
of view.

I read the dbt3-pgsql setup docs. It looks pretty formidable. For a
start, it provides waaaaaaaaaay too many options. Sure, tell people how
to tweak things, but provide some simple, standardised setup that works
out of the box. Maybe it does, I don't know.

Anyway, if it means that the database is indeed in pagecache and this
test is not using direct IO, then presumably there's a lot of
synchronous write traffic happening and not much reading? A vmstat trace
would tell.

And if that is indeed the case I'd be suspecting the CPU scheduler. But
then, Meredith's profiles show almost completely idle CPUs.

The simplest way to hunt this down is the old
binary-search-through-the-patches process. But that requires some test
which takes just a few minutes.
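A minimal way to capture the vmstat trace Andrew asks for, wrapped
around the benchmark run (the interval, sample count, and output path
here are illustrative, not from the thread):

```shell
# Sample system-wide stats once a second while the throughput phase runs;
# the bi/bo (blocks in/out) columns show physical IO, and wa/id show
# whether the query streams are IO-bound or just sitting idle.
vmstat 1 5 > thuput.vmstat.txt &
VMSTAT_PID=$!
# ... launch the throughput query streams here ...
wait "$VMSTAT_PID"
head -3 thuput.vmstat.txt
```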
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Mary Edie Meredith @ 2004-03-15 16:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: badari, linux-mm

On Sat, 2004-03-13 at 13:48, Andrew Morton wrote:
> badari <pbadari@us.ibm.com> wrote:
> >
> > Andrew,
> >
> > We don't see any degradation with -mm trees with DSS workloads.

Is your database using direct IO? PostgreSQL does not, and that could be
the difference. Also, we are doing very little IO during this part of
the run--only at the beginning of the throughput part, until the
database gets cached in the page cache. The database size is very small
compared to most DSS workloads.

> > Meredith mentioned that the workload is "cached". Not much
> > IO activity. I wonder how it can be related to readahead ?

If by readahead you mean file system readahead, then I do not think that
would make a difference with this part of the workload, as there is not
much physical IO in the throughput part of the workload. (I am assuming
that fs readahead would result in physical IOs, but I admit some degree
of ignorance about file system behavior.)

> Well, I don't know what "cached" means, really.

On the 8way STP systems there is a total of 8GB of memory. The memory
remaining after database structures leaves enough such that most of the
database will fit in page cache. Thus once it is read, any further
references by the database will pull from the page cache rather than do
a physical IO. This is what I mean by "cached".

> That's a recurring problem with these complex performance tests which
> some groups are running: lack of the really detailed information which
> kernel developers can use, long turnaround times in gathering followup
> information, even slow email turnaround times. It's been a bit
> frustrating from that point of view.

Sorry. I could list reasons why this is, but it wouldn't change the
fact. I hope that I can provide some clarity.

> I read the dbt3-pgsql setup docs. It looks pretty formidable. For a
> start, it provides waaaaaaaaaay too many options. Sure, tell people
> how to tweak things, but provide some simple, standardised setup that
> works out of the box. Maybe it does, I don't know.

Yes, there are many options. That's why we set it up on STP in a way
that makes sense for that machine's characteristics. The settings used
by STP (what I called the defaults) are what is reasonable for that
system size.

> Anyway, if it means that the database is indeed in pagecache and this
> test is not using direct IO, then presumably there's a lot of
> synchronous write traffic happening and not much reading? A vmstat
> trace would tell.

There is little to no synchronous write activity. There are no database
transactions after the first few minutes of the throughput phase, when
the updates occur. After that it is all reads, so there is no logging,
which would be the cause of synchronous writes.

vmstat info is at:
http://khack.osdl.org/stp/289860/results/plot/thuput.vmstat.txt

In fact, at the top-level URL:
http://khack.osdl.org/stp/289860/
you can get more stats (sar, for example). Be sure to look at things
referenced as "throughput" or "thuput", as the problem is in this part
of the test. The "load" and "power" portions are fine. (The power
portion is the single-stream part--one process running a query.)

> And if that is indeed the case I'd be suspecting the CPU scheduler.
> But then, Meredith's profiles show almost completely idle CPUs.
>
> The simplest way to hunt this down is the old
> binary-search-through-the-patches process. But that requires some test
> which takes just a few minutes.

If you are referring to a binary search to find when the performance
changed, I can do this with STP. It may take some time, but I'm willing.
I didn't want to do that if the problem was a known problem.
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Badari Pulavarty @ 2004-03-15 17:16 UTC (permalink / raw)
To: maryedie, Andrew Morton; +Cc: linux-mm

On Monday 15 March 2004 08:45 am, Mary Edie Meredith wrote:
> Is your database using direct IO? PostgreSQL does not, and
> that could be the difference. Also, we are doing very little
> IO during this part of the run--only at the beginning of
> the throughput part, until the database gets cached in the
> page cache. The database size is very small compared to
> most DSS workloads.

We are using filesystem buffered IO for our DSS workload testing (no
direct IO). But our workload is very IO intensive, so it's possible that
we don't see the problem you are seeing with a "cached" workload.

Thanks,
Badari
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Ram Pai @ 2004-03-15 19:33 UTC (permalink / raw)
To: maryedie; +Cc: Andrew Morton, Badari Pulavarty, linux-mm

On Mon, 2004-03-15 at 08:45, Mary Edie Meredith wrote:
> > And if that is indeed the case I'd be suspecting the CPU scheduler.
> > But then, Meredith's profiles show almost completely idle CPUs.
> >
> > The simplest way to hunt this down is the old
> > binary-search-through-the-patches process. But that requires some
> > test which takes just a few minutes.
>
> If you are referring to a binary search to find when the
> performance changed, I can do this with STP. It may take
> some time, but I'm willing. I didn't want to do that if
> the problem was a known problem.

Based on your data, I don't think the readahead patch is responsible.
However, since you are seeing this only on the mm kernels, there is a
small needle of suspicion on the readahead patch.

How about reverting only the readahead patch in the mm tree and trying
it out?

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc1/2.6.3-rc1-mm1/broken-out/adaptive-lazy-readahead.patch

My DSS workload benchmarks always touch the disk because I have only
4GB of memory configured. I will give it a try with 8GB of memory and
see if I see any of your behavior. (I won't be able to put all of my
database in memory)...

RP
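Reverting one broken-out patch is just `patch -R`. Here is a
self-contained toy run of the mechanics (the file name, contents, and
patch name are made up for the demo; only the apply/revert steps match
what Ram suggests):

```shell
# Make a "before" and "after" file, generate the patch, apply it, revert it.
workdir=$(mktemp -d)
cd "$workdir"
printf 'static int lazy_readahead = 1;\n' > readahead.c
cp readahead.c readahead.c.orig
printf 'static int lazy_readahead = 0;\n' > readahead.c.patched
diff -u readahead.c readahead.c.patched > lazy-readahead.patch || true  # diff exits 1 when files differ
patch readahead.c < lazy-readahead.patch     # forward-apply the change
patch -R readahead.c < lazy-readahead.patch  # revert it, as in: patch -R -p1 < broken-out/adaptive-lazy-readahead.patch
cmp readahead.c readahead.c.orig && echo "clean revert"
```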
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Mary Edie Meredith @ 2004-03-17 19:31 UTC (permalink / raw)
To: Ram Pai; +Cc: Andrew Morton, Badari Pulavarty, linux-mm

Ram, it took a while to implement your suggestion. Your patch was
against 2.6.3-rc1-mm1. Unfortunately, the series of mm kernels from
2.6.3-rc1-mm1 thru 2.6.3-mm3 failed to run on STP.

Judith made a patch (PLM 2766) by reverting the patch you referenced
below, using 2.6.5-rc1-mm1 (PLM 2760) as the original. It compiled and
ran without error. The performance did not significantly improve, so I
think we can conclude that readahead is not the problem.

Here is the data (bigger is better on metric):

PLM..Kernel........Runid...CPUs..Thruput Metric
2760 2.6.5-rc1-mm1  290149   8     86.82
2766 2.6.5-rc1-mm1* 290197   8     88.70  *(rev-readahead)
2760 2.6.5-rc1-mm1  290120   4    114.41  (worse than 8)
2757 2.6.5-rc1      290064   4    122.74  (baseline 4way)
     (8way run on 2.6.5-rc1 hasn't completed yet)
2679 2.6.4 base     289421   8    137.2   (baseline 8way)

Meantime, I attempted to do a binary search to find the point where the
mm kernel performance went bad. It unfortunately appears to have
occurred during the period of time that the mm kernels did not run on
STP:

(These are all 8-way results)
PLM..Kernel........RUNID...Thruput Metric
2656 2.6.3-mm4      288850   87.82
     [2.6.3-rc1-mm1 thru 2.6.3-mm3 fail to run on STP]
2603 2.6.2-mm1      290003  115.24
2582 2.6.2-rc2-mm1  290005  115.85
2564 2.6.1-mm5      289381  124.02

So there is a little hit between 2.6.1-mm5 and 2.6.2-rc2-mm1, but a very
big hit between 2.6.2-mm1 and 2.6.3-mm4.

Cliff is on vacation, so it may take me a while to track down patches to
fix the 2.6.3 mm kernels. I see a patch he tried with reaim on 2.6.3-mm1
(PLM 2654), so I'll give that a try.

It may take a while, but I'll report back.

Another thing I may not have mentioned before is that we use LVM in this
workload. We are also using LVM for our dbt2 (OLTP) PostgreSQL workload.
Markm is doing some runs to see how the latest mm kernel compares with
the baseline.

Thanks.

On Mon, 2004-03-15 at 11:33, Ram Pai wrote:
> Based on your data, I don't think the readahead patch is responsible.
> However, since you are seeing this only on the mm kernels, there is a
> small needle of suspicion on the readahead patch.
>
> How about reverting only the readahead patch in the mm tree and trying
> it out?
>
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc1/2.6.3-rc1-mm1/broken-out/adaptive-lazy-readahead.patch
>
> My DSS workload benchmarks always touch the disk because I have only
> 4GB of memory configured. I will give it a try with 8GB of memory and
> see if I see any of your behavior. (I won't be able to put all of my
> database in memory)...
>
> RP

--
Mary Edie Meredith
maryedie@osdl.org
503-626-2455 x42
Open Source Development Labs
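The binary search Mary is running can be expressed as a tiny driver.
Here `is_bad` is a stand-in for "build with the first N broken-out
patches and run the short throughput test"; the series length and the
cutoff value are invented for the demo:

```shell
# Find the first patch in an ordered series that introduces the regression.
FIRST_BAD=7                              # pretend patch 7 is the culprit
is_bad() { [ "$1" -ge "$FIRST_BAD" ]; }  # stand-in for a real build-and-benchmark step

lo=0                                     # highest point known good (baseline)
hi=12                                    # lowest point known bad (full -mm stack)
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if is_bad "$mid"; then hi=$mid; else lo=$mid; fi
done
echo "first bad patch: $hi"              # prints: first bad patch: 7
```

Each iteration halves the window, so even a 100-patch -mm stack needs
only about seven build-and-test cycles--provided the short test exists.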
* Re: Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels
From: Ram Pai @ 2004-03-17 20:33 UTC (permalink / raw)
To: maryedie; +Cc: Andrew Morton, Badari Pulavarty, linux-mm

On Wed, 2004-03-17 at 11:31, Mary Edie Meredith wrote:
> Ram, it took a while to implement your suggestion. Your
> patch was against 2.6.3-rc1-mm1. Unfortunately, the series
> of mm kernels from 2.6.3-rc1-mm1 thru 2.6.3-mm3 failed
> to run on STP.
>
> Judith made a patch (PLM 2766) by reverting the patch you referenced
> below, using 2.6.5-rc1-mm1 (PLM 2760) as the original. It compiled
> and ran without error. The performance did not significantly improve,
> so I think we can conclude that readahead is not the problem.

Ok, that brings down my blood pressure. :)

Also, Badari and I ran our DSS workload on 2.6.4-mm1 with 8GB of
physical memory (our database size is much larger, about 30GB) and
found the performance was steady. Yes, this is an 8-way system.

Given that your database fits into memory and your workload is
read-only, I wonder if there were any changes in the radix-tree code
that could have regressed the performance? [About 6 months back I had
tried to optimize the radix tree handling, but did not see much
improvement--but again, that was probably because my IOs were hitting
the disk most of the time.]

RP
Thread overview: 8+ messages
2004-03-12 22:31 Poor DBT-3 pgsql 8way numbers on recent 2.6 mm kernels Mary Edie Meredith
2004-03-13 7:39 ` Andrew Morton
[not found] ` <405379ED.A7D6B1E4@us.ibm.com>
2004-03-13 21:48 ` Andrew Morton
2004-03-15 16:45 ` Mary Edie Meredith
2004-03-15 17:16 ` Badari Pulavarty
2004-03-15 19:33 ` Ram Pai
2004-03-17 19:31 ` Mary Edie Meredith
2004-03-17 20:33 ` Ram Pai