From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: text/plain; charset=US-ASCII From: Daniel Phillips Subject: Re: 2.4.8-pre1 and dbench -20% throughput Date: Sat, 28 Jul 2001 22:13:13 +0200 References: In-Reply-To: MIME-Version: 1.0 Message-Id: <01072822131300.00315@starship> Content-Transfer-Encoding: 7BIT Sender: owner-linux-mm@kvack.org Return-Path: To: Marcelo Tosatti Cc: linux-mm@kvack.org, Rik van Riel , Linus Torvalds , Andrew Morton , Mike Galbraith , Steven Cole , Roger Larsson List-ID: On Saturday 28 July 2001 15:40, Marcelo Tosatti wrote: > On Sat, 28 Jul 2001, Daniel Phillips wrote: > > On Saturday 28 July 2001 01:43, Roger Larsson wrote: > > > Hi again, > > > > > > It might be variations in dbench - but I am not sure since I run > > > the same script each time. > > > > I believe I can reproduce the effect here, even with dbench 2. So > > the next two steps: > > > > 1) Get some sleep > > 2) Find out why > > I would suggest getting the SAR patch to measure amount of successful > request merges and compare that between the different kernels. > > It sounds like the test being done is doing a lot of contiguous IO, > so increasing readahead also increases throughtput. I used /proc/stat to determine whether the problem is more IO or more scanning. The answer is: more IO. Next I took a look at the dbench code to see what it's actually doing, including stracing it. It does a lot of different kinds of things, some of them very strange. (To see what it does, read the client.txt file in the dbench directory, it's more-or-less self explanatory.) One strange thing it does is a lot of 1K, 2K or 4K sized reads at small-number overlapping offsets. Weird. Why would anybody do that? On the theory that it's the odd-sized offsets that cause the problem I hacked dbench so it always reads and writes at even page offsets, see the patch below. Sure enough, the use-once patch then outperformed drop-behind, by about 7%. Now what's going on? I still don't know, but I'm getting warmer. The leading suspect on my list is that aging isn't really working very well, and this is aggravated by the fact that I haven't implemented any clusered-access optimization (as Rik pointed out earlier). Treating an intial cluster of accesses as separate accesses, wrongly activating the page, really should not make more than a couple of percent difference. What we see is closer to 30%. My tentative conclusion is that, once activated, pages are taking far longer to deactivate than they should. Here is what I think is happening on a typical burst of small, non-page aligned reads: - Page is read the 1st time: age = 2, inactive - Page is read the second time: age = 5, active - Two more reads immediately on the same page: age = 11 Then the page isn't ever used again. Now it has to go around the active ring 5 times: 1, age = 11 2, age = 5 3, age = 2 4, age = 1 5, age = 0, deactivated So this page that should have been discarded early is now competing with swap pages, buffers, and file pages that truly are used more than once. And, despite the fact that we found it unreferenced four times in a row, that still wasn't enough to convince us that the page should be tested for short-term popularity, i.e., deactivated. Implementing some sort of clustered-use detection will avoid this problem. I must do this, but it will just paper over what I see as the bigger problem, an out-of-balance active scanning strategy. So how come mm is in general working so well if active scanning isn't? I think the real work is being done by the inactive queue at this point (that is, without the use-once patch) and it works so well it covers up problems with the active scanning. The result being that performance on some loads is beautiful, others suck. Please treat all of the above as speculation at this point, this has not been properly confirmed by measurements. Oh, by the way, my suspicions about the flakiness of dbench as a benchmark were confirmed: under X, having been running various memory hungry applications for a while, dbench on vanilla 2.4.7 turned in a 7% better performance (with a distinctly different process termination pattern) than in text mode after a clean reboot. Maybe somebody can explain to me why there is sometimes a long wait between the "+" a process prints when it exits and the "*" printed in the parent's loop on waitpid(0, &status, 0). And similarly, why all the "*"'s are always printed together. Patch for page-aligned IO in dbench: --- old/fileio.c Sat Jul 28 20:18:38 2001 +++ fileio.c Sat Jul 28 19:43:30 2001 @@ -115,7 +115,7 @@ #endif return; } - lseek(ftable[i].fd, offset, SEEK_SET); + lseek(ftable[i].fd, offset & 4095, SEEK_SET); if (write(ftable[i].fd, buf, size) != size) { printf("write failed on handle %d\n", handle); } @@ -132,7 +132,7 @@ line_count, handle, size, offset); return; } - lseek(ftable[i].fd, offset, SEEK_SET); + lseek(ftable[i].fd, offset & 4095, SEEK_SET); read(ftable[i].fd, buf, size); } @@ -197,7 +197,7 @@ return; } if (S_ISDIR(st.st_mode)) return; - + return; if (st.st_size != size) { printf("(%d) nb_stat: %s wrong size %d %d\n", line_count, fname, (int)st.st_size, size); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/