linux-mm.kvack.org archive mirror
From: Daniel Phillips <phillips@bonn-fries.net>
To: Marcelo Tosatti <marcelo@conectiva.com.br>
Cc: linux-mm@kvack.org, Rik van Riel <riel@conectiva.com.br>,
	Linus Torvalds <torvalds@transmeta.com>,
	Andrew Morton <akpm@zip.com.au>,
	Mike Galbraith <mikeg@wen-online.de>,
	Steven Cole <elenstev@mesatop.com>,
	Roger Larsson <roger.larsson@skelleftea.mail.telia.com>
Subject: Re: 2.4.8-pre1 and dbench -20% throughput
Date: Sat, 28 Jul 2001 22:13:13 +0200	[thread overview]
Message-ID: <01072822131300.00315@starship> (raw)
In-Reply-To: <Pine.LNX.4.21.0107281035380.5720-100000@freak.distro.conectiva>

On Saturday 28 July 2001 15:40, Marcelo Tosatti wrote:
> On Sat, 28 Jul 2001, Daniel Phillips wrote:
> > On Saturday 28 July 2001 01:43, Roger Larsson wrote:
> > > Hi again,
> > >
> > > It might be variations in dbench - but I am not sure since I run
> > > the same script each time.
> >
> > I believe I can reproduce the effect here, even with dbench 2.  So
> > the next two steps:
> >
> >   1) Get some sleep
> >   2) Find out why
>
> I would suggest getting the SAR patch to measure amount of successful
> request merges and compare that between the different kernels.
>
> It sounds like the test being done is doing a lot of contiguous IO,
> so increasing readahead also increases throughput.

I used /proc/stat to determine whether the problem is more IO or more 
scanning.  The answer is: more IO.

Next I took a look at the dbench code to see what it's actually doing, 
including stracing it.  It does a lot of different kinds of things, 
some of them very strange.  (To see what it does, read the client.txt 
file in the dbench directory; it's more or less self-explanatory.)

One strange thing it does is a lot of 1K, 2K and 4K sized reads at 
small, overlapping offsets.  Weird.  Why would anybody do that?

On the theory that it's the odd-sized offsets that cause the problem, I 
hacked dbench so that it always reads and writes at page-aligned 
offsets (see the patch below).  Sure enough, the use-once patch then 
outperformed drop-behind, by about 7%.

Now what's going on?  I still don't know, but I'm getting warmer.  The 
leading suspect on my list is that aging isn't really working very 
well, and this is aggravated by the fact that I haven't implemented any 
clustered-access optimization (as Rik pointed out earlier).  Treating an 
initial cluster of accesses as separate accesses, wrongly activating the 
page, really should not make more than a couple of percent difference.  
What we see is closer to 30%.  My tentative conclusion is that, once 
activated, pages are taking far longer to deactivate than they should.

Here is what I think is happening on a typical burst of small, 
non-page-aligned reads:

  - Page is read the first time: age = 2, inactive
  - Page is read the second time: age = 5, active
  - Two more reads immediately on the same page: age = 11

Then the page isn't ever used again.  Now it has to go around the 
active ring 5 times:

  1, age = 11
  2, age = 5
  3, age = 2
  4, age = 1
  5, age = 0, deactivated

So this page that should have been discarded early is now competing 
with swap pages, buffers, and file pages that truly are used more than 
once.  And, despite the fact that we found it unreferenced four times 
in a row, that still wasn't enough to convince us that the page should 
be tested for short-term popularity, i.e., deactivated.

Implementing some sort of clustered-use detection will avoid this 
problem.  I must do this, but it will just paper over what I see as the 
bigger problem, an out-of-balance active scanning strategy.

So how come the mm is in general working so well if active scanning 
isn't?  I think the real work is being done by the inactive queue at 
this point (that is, without the use-once patch), and it works so well 
that it covers up problems with the active scanning.  The result is 
that performance on some loads is beautiful, while on others it sucks.

Please treat all of the above as speculation at this point; it has 
not been properly confirmed by measurements.

Oh, by the way, my suspicions about the flakiness of dbench as a 
benchmark were confirmed: under X, having been running various 
memory-hungry applications for a while, dbench on vanilla 2.4.7 turned 
in 7% better performance (with a distinctly different process 
termination pattern) than in text mode after a clean reboot.

Maybe somebody can explain to me why there is sometimes a long wait 
between the "+" a process prints when it exits and the "*" printed in 
the parent's loop on waitpid(0, &status, 0).  And similarly, why all 
the "*"'s are always printed together.

Patch for page-aligned IO in dbench:

--- old/fileio.c	Sat Jul 28 20:18:38 2001
+++ fileio.c	Sat Jul 28 19:43:30 2001
@@ -115,7 +115,7 @@
 #endif
 		return;
 	}
-	lseek(ftable[i].fd, offset, SEEK_SET);
+	lseek(ftable[i].fd, offset & ~4095, SEEK_SET);
 	if (write(ftable[i].fd, buf, size) != size) {
 		printf("write failed on handle %d\n", handle);
 	}
@@ -132,7 +132,7 @@
 		       line_count, handle, size, offset);
 		return;
 	}
-	lseek(ftable[i].fd, offset, SEEK_SET);
+	lseek(ftable[i].fd, offset & ~4095, SEEK_SET);
 	read(ftable[i].fd, buf, size);
 }
 
@@ -197,7 +197,7 @@
 		return;
 	}
 	if (S_ISDIR(st.st_mode)) return;
-
+	return;
 	if (st.st_size != size) {
 		printf("(%d) nb_stat: %s wrong size %d %d\n", 
 		       line_count, fname, (int)st.st_size, size);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/
