* Re: 2.4.8-pre1 and dbench -20% throughput [not found] ` <0107280034050V.00285@starship> @ 2001-07-27 23:43 ` Roger Larsson 2001-07-28 1:11 ` Daniel Phillips 2001-07-28 3:18 ` Daniel Phillips 0 siblings, 2 replies; 28+ messages in thread From: Roger Larsson @ 2001-07-27 23:43 UTC (permalink / raw) To: Daniel Phillips, linux-kernel; +Cc: linux-mm Hi again, It might be variations in dbench - but I am not sure since I run the same script each time. (When I made a test run in a terminal window - with X running, but not doing anything actively, I got [some '.' deleted] .............++++++++++++++++++++++++++++++++******************************** Throughput 15.8859 MB/sec (NB=19.8573 MB/sec 158.859 MBit/sec) 14.74user 22.92system 4:26.91elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (912major+1430minor)pagefaults 0swaps I have never seen anything like this - all '+' together! I logged off and tried again - got more normal values 32 MB/s and '+' were spread out. More testing needed... /RogerL On Saturday 28 July 2001 00:34, Daniel Phillips wrote: > On Friday 27 July 2001 23:08, Roger Larsson wrote: > > Hi all, > > > > I have done some throughput testing again. > > Streaming write, copy, read, diff are almost identical to earlier 2.4 > > kernels. (Note: 2.4.0 was clearly better when reading from two files > > - i.e. diff - 15.4 MB/s v. around 11 MB/s with later kernels - can be > > a result of disk layout too...) > > > > But "dbench 32" (on my 256 MB box) results are the most > > interesting: > > > > 2.4.0 gave 33 MB/s > > 2.4.8-pre1 gives 26.1 MB/s (-21%) > > > > Do we now throw away pages that would be reused? > > > > [I have also verified that mmap002 still works as expected] > Could you run that test again with /usr/bin/time (the GNU time > function) so we can see what kind of swapping it's doing? 
> > The use-once approach depends on having a fairly stable inactive_dirty > + inactive_clean queue size, to give use-often pages a fair chance to > be rescued. To see how the sizes of the queues are changing, use > Shift-ScrollLock on your text console. > > To tell the truth, I don't have a deep understanding of how dbench > works. I should read the code now and see if I can learn more about it > > :-/ I have noticed that it tends to be highly variable in performance, > > sometimes showing variation of a few 10's of percents from run to run. > This variation seems to depend a lot on scheduling. Do you see "*"'s > evenly spaced throughout the tracing output, or do you see most of them > bunched up near the end? > > -- > Daniel -- Roger Larsson Skelleftea Sweden -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-27 23:43 ` 2.4.8-pre1 and dbench -20% throughput Roger Larsson @ 2001-07-28 1:11 ` Daniel Phillips 2001-07-28 3:18 ` Daniel Phillips 1 sibling, 0 replies; 28+ messages in thread From: Daniel Phillips @ 2001-07-28 1:11 UTC (permalink / raw) To: Roger Larsson, linux-kernel; +Cc: linux-mm On Saturday 28 July 2001 01:43, Roger Larsson wrote: > Hi again, > > It might be variations in dbench - but I am not sure since I run > the same script each time. > > (When I made a test run in a terminal window - with X running, but not > doing anything actively, I got > [some '.' deleted] > .............++++++++++++++++++++++++++++++++************************ >******** Throughput 15.8859 MB/sec (NB=19.8573 MB/sec 158.859 > MBit/sec) 14.74user 22.92system 4:26.91elapsed 14%CPU > (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs > (912major+1430minor)pagefaults 0swaps > > I have never seen anything like this - all '+' together! > > I logged off and tried again - got more normal values 32 MB/s > and '+' were spread out. > > More testing needed... Truly wild, truly crazy. OK, this is getting interesting. I'll go read the dbench source now, I really want to understand how the IO and thread scheduling are interrelated. I'm not even going to try to advance a theory just yet ;-) I'd mentioned that dbench seems to run fastest when threads run and complete all at different times instead of all together. It's easy to see why this might be so: if the sum of all working sets is bigger than memory then the system will thrash and do its work much more slowly. If the threads *can* all run independently (which I think is true of dbench because it simulates SMB accesses from a number of unrelated sources) then the optimal strategy is to suspend enough processes so that all the working sets do fit in memory. 
Linux has no mechanism for detecting or responding to such situations (whereas FreeBSD - our arch-rival in the mm sweepstakes - does) so we sometimes see what are essentially random variations in scheduling causing very measurable differences in throughput. (The "butterfly effect" where the beating wings of a butterfly in Alberta set in motion a chain of events that culminates with a hurricane in Florida.) I am not saying this is the effect we're seeing here (the working set effect, not the butterfly:-) but it is something to keep in mind when investigating this. There is such a thing as being too fair, and maybe that's what we're running into here. -- Daniel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-27 23:43 ` 2.4.8-pre1 and dbench -20% throughput Roger Larsson 2001-07-28 1:11 ` Daniel Phillips @ 2001-07-28 3:18 ` Daniel Phillips 2001-07-28 13:40 ` Marcelo Tosatti 1 sibling, 1 reply; 28+ messages in thread From: Daniel Phillips @ 2001-07-28 3:18 UTC (permalink / raw) To: linux-mm; +Cc: Steven Cole, Roger Larsson On Saturday 28 July 2001 01:43, Roger Larsson wrote: > Hi again, > > It might be variations in dbench - but I am not sure since I run > the same script each time. I believe I can reproduce the effect here, even with dbench 2. So the next two steps: 1) Get some sleep 2) Find out why Until tomorrow -- Daniel
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 3:18 ` Daniel Phillips @ 2001-07-28 13:40 ` Marcelo Tosatti 2001-07-28 20:13 ` Daniel Phillips 0 siblings, 1 reply; 28+ messages in thread From: Marcelo Tosatti @ 2001-07-28 13:40 UTC (permalink / raw) To: Daniel Phillips; +Cc: linux-mm, Steven Cole, Roger Larsson On Sat, 28 Jul 2001, Daniel Phillips wrote: > On Saturday 28 July 2001 01:43, Roger Larsson wrote: > > Hi again, > > > > It might be variations in dbench - but I am not sure since I run > > the same script each time. > > I believe I can reproduce the effect here, even with dbench 2. So the > next two steps: > > 1) Get some sleep > 2) Find out why I would suggest getting the SAR patch to measure the number of successful request merges and compare that between the different kernels. It sounds like the test being done is doing a lot of contiguous IO, so increasing readahead also increases throughput.
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 13:40 ` Marcelo Tosatti @ 2001-07-28 20:13 ` Daniel Phillips 2001-07-28 20:26 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Daniel Phillips @ 2001-07-28 20:13 UTC (permalink / raw) To: Marcelo Tosatti Cc: linux-mm, Rik van Riel, Linus Torvalds, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Saturday 28 July 2001 15:40, Marcelo Tosatti wrote: > On Sat, 28 Jul 2001, Daniel Phillips wrote: > > On Saturday 28 July 2001 01:43, Roger Larsson wrote: > > > Hi again, > > > > > > It might be variations in dbench - but I am not sure since I run > > > the same script each time. > > > > I believe I can reproduce the effect here, even with dbench 2. So > > the next two steps: > > > > 1) Get some sleep > > 2) Find out why > > I would suggest getting the SAR patch to measure the number of successful > request merges and compare that between the different kernels. > > It sounds like the test being done is doing a lot of contiguous IO, > so increasing readahead also increases throughput. I used /proc/stat to determine whether the problem is more IO or more scanning. The answer is: more IO. Next I took a look at the dbench code to see what it's actually doing, including stracing it. It does a lot of different kinds of things, some of them very strange. (To see what it does, read the client.txt file in the dbench directory, it's more-or-less self-explanatory.) One strange thing it does is a lot of 1K, 2K or 4K sized reads at small, overlapping offsets. Weird. Why would anybody do that? On the theory that it's the odd-sized offsets that cause the problem I hacked dbench so it always reads and writes at even page offsets, see the patch below. Sure enough, the use-once patch then outperformed drop-behind, by about 7%. Now what's going on? I still don't know, but I'm getting warmer. 
The leading suspect on my list is that aging isn't really working very well, and this is aggravated by the fact that I haven't implemented any clustered-access optimization (as Rik pointed out earlier). Treating an initial cluster of accesses as separate accesses, wrongly activating the page, really should not make more than a couple of percent difference. What we see is closer to 30%. My tentative conclusion is that, once activated, pages are taking far longer to deactivate than they should. Here is what I think is happening on a typical burst of small, non-page aligned reads: - Page is read the first time: age = 2, inactive - Page is read the second time: age = 5, active - Two more reads immediately on the same page: age = 11 Then the page isn't ever used again. Now it has to go around the active ring 5 times: 1, age = 11 2, age = 5 3, age = 2 4, age = 1 5, age = 0, deactivated So this page that should have been discarded early is now competing with swap pages, buffers, and file pages that truly are used more than once. And, despite the fact that we found it unreferenced four times in a row, that still wasn't enough to convince us that the page should be tested for short-term popularity, i.e., deactivated. Implementing some sort of clustered-use detection will avoid this problem. I must do this, but it will just paper over what I see as the bigger problem, an out-of-balance active scanning strategy. So how come mm is in general working so well if active scanning isn't? I think the real work is being done by the inactive queue at this point (that is, without the use-once patch) and it works so well it covers up problems with the active scanning. The result is that performance on some loads is beautiful, while on others it sucks. Please treat all of the above as speculation at this point, this has not been properly confirmed by measurements. 
Oh, by the way, my suspicions about the flakiness of dbench as a benchmark were confirmed: under X, having been running various memory hungry applications for a while, dbench on vanilla 2.4.7 turned in a 7% better performance (with a distinctly different process termination pattern) than in text mode after a clean reboot. Maybe somebody can explain to me why there is sometimes a long wait between the "+" a process prints when it exits and the "*" printed in the parent's loop on waitpid(0, &status, 0). And similarly, why all the "*"'s are always printed together. Patch for page-aligned IO in dbench: --- old/fileio.c Sat Jul 28 20:18:38 2001 +++ fileio.c Sat Jul 28 19:43:30 2001 @@ -115,7 +115,7 @@ #endif return; } - lseek(ftable[i].fd, offset, SEEK_SET); + lseek(ftable[i].fd, offset & 4095, SEEK_SET); if (write(ftable[i].fd, buf, size) != size) { printf("write failed on handle %d\n", handle); } @@ -132,7 +132,7 @@ line_count, handle, size, offset); return; } - lseek(ftable[i].fd, offset, SEEK_SET); + lseek(ftable[i].fd, offset & 4095, SEEK_SET); read(ftable[i].fd, buf, size); } @@ -197,7 +197,7 @@ return; } if (S_ISDIR(st.st_mode)) return; - + return; if (st.st_size != size) { printf("(%d) nb_stat: %s wrong size %d %d\n", line_count, fname, (int)st.st_size, size); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 20:13 ` Daniel Phillips @ 2001-07-28 20:26 ` Linus Torvalds 2001-07-29 14:10 ` Daniel Phillips 2001-07-29 1:41 ` Andrew Morton 2001-07-29 17:48 ` Steven Cole 2 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2001-07-28 20:26 UTC (permalink / raw) To: Daniel Phillips Cc: Marcelo Tosatti, linux-mm, Rik van Riel, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sat, 28 Jul 2001, Daniel Phillips wrote: > > Here is what I think is happening on a typical burst of small, non-page > aligned reads: > > - Page is read the 1st time: age = 2, inactive > - Page is read the second time: age = 5, active > - Two more reads immediately on the same page: age = 11 No. We only mark the page referenced when we read it, we don't actually increment the age. The _aging_ is only done by the actual scanning routines. At least that's how it should work. A quick grep for who does "age_page_up()" shows that it is only done by refill_inactive_scan(). (page_launder() doesn't need to do it, because it already knows the age is zero on the inactive list, so it just sets the age). Maybe the problem is that use-once works on accesses, not on ages? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 20:26 ` Linus Torvalds @ 2001-07-29 14:10 ` Daniel Phillips 2001-07-29 14:48 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Daniel Phillips @ 2001-07-29 14:10 UTC (permalink / raw) To: Linus Torvalds Cc: Marcelo Tosatti, linux-mm, Rik van Riel, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Saturday 28 July 2001 22:26, Linus Torvalds wrote: > On Sat, 28 Jul 2001, Daniel Phillips wrote: > > Here is what I think is happening on a typical burst of small, > > non-page aligned reads: > > > > - Page is read the 1st time: age = 2, inactive > > - Page is read the second time: age = 5, active > > - Two more reads immediately on the same page: age = 11 > > No. > > We only mark the page referenced when we read it, we don't actually > increment the age. For already-cached pages we have: do_generic_file_read->__find_page_nolock->age_page_up I haven't checked a running kernel yet to see whether pages really do get aged the way I described, but I'll do it soon. My plan is to 'tag' selected pages and trace them through the system to see what actually happens to them. When I looked at age_page_up, I saw an anomaly: void age_page_up_nolock(struct page * page) { if (!page->age) /* wrong */ activate_page_nolock(page); page->age += PAGE_AGE_ADV; if (page->age > PAGE_AGE_MAX) page->age = PAGE_AGE_MAX; } The !page->age test was fine when all the ages on the inactive list were zero, it's not fine with the use-once patch. When inactive, the sense of !page->age is now "on trial", whether the page got that way by being accessed the first time or aged all the way to zero. The state change from !page->age to page->age == START_AGE allows used-often pages to be detected, again, while on the inactive list. Yes, I could have used a real state flag for this, or a separate queue, but that would have been a more invasive change. 
First I tried this: - if (!page->age) + if (!PageActive(page)) Performance on dbench went way down. So I did the obvious thing: - if (!page->age) - activate_page_nolock(page); This produced a distinct improvement, bringing 2.4.7 use-once performance on dbench up to nearly as good as drop-behind. Better, the performance on my make/grep load for 2.4.7+use.once was also improved, coming very close to what I saw on 2.4.5+use.once. So this is promising. What it does is bring age_page_up more in line with my theoretical model, that is, leaving each page to run its entire course on the inactive queue and relying on the Referenced bit to tell inactive_scan whether to rescue or continue the eviction process. > Maybe the problem is that use-once works on accesses, not on ages? I'm convinced that, for the inactive queue, relying on accesses is right. My theory is that aging has the sole function of determining which pages should be tested for inactivity, and I suspect the current formula for aging doesn't do that optimally. As soon as the more obvious problems settle down I'd like to take a look at the aging calculations and active scanning policy. -- Daniel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 14:10 ` Daniel Phillips @ 2001-07-29 14:48 ` Rik van Riel 2001-07-29 15:34 ` Daniel Phillips 2001-07-29 15:31 ` Mike Galbraith 2001-07-29 16:05 ` Linus Torvalds 2 siblings, 1 reply; 28+ messages in thread From: Rik van Riel @ 2001-07-29 14:48 UTC (permalink / raw) To: Daniel Phillips Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Daniel Phillips wrote: > On Saturday 28 July 2001 22:26, Linus Torvalds wrote: > > We only mark the page referenced when we read it, we don't actually > > increment the age. > > For already-cached pages we have: > > do_generic_file_read->__find_page_nolock->age_page_up s/have/had/ This was changed quite a while ago. Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 14:48 ` Rik van Riel @ 2001-07-29 15:34 ` Daniel Phillips 0 siblings, 0 replies; 28+ messages in thread From: Daniel Phillips @ 2001-07-29 15:34 UTC (permalink / raw) To: Rik van Riel Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sunday 29 July 2001 16:48, Rik van Riel wrote: > On Sun, 29 Jul 2001, Daniel Phillips wrote: > > On Saturday 28 July 2001 22:26, Linus Torvalds wrote: > > > We only mark the page referenced when we read it, we don't > > > actually increment the age. > > > > For already-cached pages we have: > > > > do_generic_file_read->__find_page_nolock->age_page_up > > s/have/had/ > > This was changed quite a while ago. Yes, correct. (Should teach me not to rely on a 2.4.2 tree for my cross-reference.) Hmm, so now age_page_up is unused and age_page_up_nolock is called from just one place, refill_inactive_scan. The !age test still doesn't make sense. -- Daniel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 14:10 ` Daniel Phillips 2001-07-29 14:48 ` Rik van Riel @ 2001-07-29 15:31 ` Mike Galbraith 2001-07-29 16:05 ` Linus Torvalds 2 siblings, 0 replies; 28+ messages in thread From: Mike Galbraith @ 2001-07-29 15:31 UTC (permalink / raw) To: Daniel Phillips Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, Rik van Riel, Andrew Morton, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Daniel Phillips wrote: > On Saturday 28 July 2001 22:26, Linus Torvalds wrote: > > On Sat, 28 Jul 2001, Daniel Phillips wrote: > > > Here is what I think is happening on a typical burst of small, > > > non-page aligned reads: > > > > > > - Page is read the 1st time: age = 2, inactive > > > - Page is read the second time: age = 5, active > > > - Two more reads immediately on the same page: age = 11 > > > > No. > > > > We only mark the page referenced when we read it, we don't actually > > increment the age. > > For already-cached pages we have: > > do_generic_file_read->__find_page_nolock->age_page_up > > I haven't checked a running kernel yet to see whether pages really do > get aged the way I described, but I'll do it soon. My plan is to 'tag' > selected pages and trace them through the system to see what actually > happens to them. > > When I looked at age_page_up, I saw an anomaly: > > void age_page_up_nolock(struct page * page) > { > if (!page->age) /* wrong */ > activate_page_nolock(page); > > page->age += PAGE_AGE_ADV; > if (page->age > PAGE_AGE_MAX) > page->age = PAGE_AGE_MAX; > } > > The !page->age test was fine when all the ages on the inactive list > were zero, it's not fine with the use-once patch. When inactive, the > sense of !page->age is now "on trial", whether the page got that way by > being accessed the first time or aged all the way to zero. The state > change from !page->age to page->age == START_AGE allows used-often > pages to be detected, again, while on the inactive list. 
Yes, I could > have used a real state flag for this, or a separate queue, but that > would have been a more invasive change. > > First I tried this: > > - if (!page->age) > + if (!PageActive(page)) > > Performance on dbench went way down. So I did the obvious thing: > > - if (!page->age) > - activate_page_nolock(page); > > This produced a distinct improvement, bringing 2.4.7 use-once > performance on dbench up to nearly as good as drop-behind. Better, the > performance on my make/grep load for 2.4.7+use.once was also improved, > coming very close to what I saw on 2.4.5+use.once. > > So this is promising. What it does is bring age_page_up more in line > with my theoretical model, that is, leaving each page to run its entire > course on the inactive queue and relying on the Referenced bit to tell > inactive_scan whether to rescue or continue the eviction process. > > > Maybe the problem is that use-once works on accesses, not on ages? > > I'm convinced that, for the inactive queue, relying on accesses is > right. My theory is that aging has the sole function of determining > which pages should be tested for inactivity, and I suspect the current > formula for aging doesn't do that optimally. As soon as the more > obvious problems settle down I'd like to take a look at the aging > calculations and active scanning policy. FWIW, I'm seeing that referenced is causing problems. Running plain jane Bonnie (the old one) I see (huge gobs of) pages being activated when they are not in pre-use_once kernels. Also FWIW, if I use my own changes to buffer.c (which make write throttling work a bit too well;) I see the problem _exaggerated_ (ie it's timing related, especially when Bonnie is in the write intelligently phase). I've eliminated most of the problem here by means too ugly to be of any interest ;-) If you grab a copy of xmm (Zlatko's utility for graphical display of the mm lists), you'll see the problem instantly. 
You'll see it with virgin source as soon as the rewrite test starts. With virgin source, when Bonnie starts doing 'writing intelligently', you'll see the problem BIGTIME. There is also a problem with page_launder() in that the only thing which stops it is hitting a page with buffers. This is a big problem only on machines with tons of ram, though. -Mike
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 14:10 ` Daniel Phillips 2001-07-29 14:48 ` Rik van Riel 2001-07-29 15:31 ` Mike Galbraith @ 2001-07-29 16:05 ` Linus Torvalds 2001-07-29 20:19 ` Hugh Dickins 2 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2001-07-29 16:05 UTC (permalink / raw) To: Daniel Phillips Cc: Marcelo Tosatti, linux-mm, Rik van Riel, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Daniel Phillips wrote: > > When I looked at age_page_up, I saw an anomaly: > > void age_page_up_nolock(struct page * page) > { > if (!page->age) /* wrong */ > activate_page_nolock(page); I agree that it is wrong, but it's really strange that it should make a difference. The only user of age_page_up_nolock() is refill_inactive_scan(), which already scans the active list, so the main reason the above is wrong is that it makes no sense any more (the page is already on the active list, and it should be a no-op). It does set the page->age to a minimum of PAGE_AGE_START, which is probably the _real_ bug. That's definitely wrong. Especially as we're just about to bump it anyway. Removed. Which makes all the "age_page_up*()" functions go away entirely. They were mostly gone already. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 16:05 ` Linus Torvalds @ 2001-07-29 20:19 ` Hugh Dickins 2001-07-29 20:25 ` Rik van Riel 0 siblings, 1 reply; 28+ messages in thread From: Hugh Dickins @ 2001-07-29 20:19 UTC (permalink / raw) To: Linus Torvalds Cc: Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Linus Torvalds wrote: > > Removed. Which makes all the "age_page_up*()" functions go away entirely. > They were mostly gone already. Applause! And for your encore... see how many age_page_down*()s there are (3), and how many uses (1). Same fate, please! Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 20:19 ` Hugh Dickins @ 2001-07-29 20:25 ` Rik van Riel 2001-07-29 20:44 ` Hugh Dickins 0 siblings, 1 reply; 28+ messages in thread From: Rik van Riel @ 2001-07-29 20:25 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Daniel Phillips, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Hugh Dickins wrote: > On Sun, 29 Jul 2001, Linus Torvalds wrote: > > > > Removed. Which makes all the "age_page_up*()" functions go away entirely. > > They were mostly gone already. > > Applause! And for your encore... see how many age_page_down*()s > there are (3), and how many uses (1). Same fate, please! Actually, I liked the fact that we could change the policy of up and down aging of pages in one place instead of having to edit the source in multiple places... But yes, for this macros would be better than functions ;) regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 20:25 ` Rik van Riel @ 2001-07-29 20:44 ` Hugh Dickins 2001-07-29 21:20 ` Daniel Phillips 0 siblings, 1 reply; 28+ messages in thread From: Hugh Dickins @ 2001-07-29 20:44 UTC (permalink / raw) To: Rik van Riel Cc: Linus Torvalds, Daniel Phillips, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Rik van Riel wrote: > > Actually, I liked the fact that we could change the policy > of up and down aging of pages in one place instead of having > to edit the source in multiple places... No question, that was a good principle; but in practice there were or are very few places where they were used, yet far too many variants provided, some with awkward side-effects on the lists. I've no objection to one age_page_up() and one age_page_down() (though I do find the term "age" unhelpful here), inline or macro, but even so a lot seems to depend on where and when we initialize it. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 20:44 ` Hugh Dickins @ 2001-07-29 21:20 ` Daniel Phillips 2001-07-29 21:51 ` Hugh Dickins 0 siblings, 1 reply; 28+ messages in thread From: Daniel Phillips @ 2001-07-29 21:20 UTC (permalink / raw) To: Hugh Dickins, Rik van Riel Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sunday 29 July 2001 22:44, Hugh Dickins wrote: > On Sun, 29 Jul 2001, Rik van Riel wrote: > > Actually, I liked the fact that we could change the policy > > of up and down aging of pages in one place instead of having > > to edit the source in multiple places... > > No question, that was a good principle; but in practice there were or > are very few places where they were used, yet far too many variants > provided, some with awkward side-effects on the lists. > > I've no objection to one age_page_up() and one age_page_down() > (though I do find the term "age" unhelpful here), inline or macro, > but even so a lot seems to depend on where and when we initialize it. "Age" is hugely misleading, I think everybody agrees, but we are still in a stable series, and a global name change would just make it harder to apply patches. That said, I think BSD uses "weight". It's not a lot better, but at least you know that the more heavily weighted page is one with the higher weight value, whereas we have "age up" meaning "make younger" :-/ And how can age go up and down anyway? I'd prefer to talk about ->temperature, more in line with what we see in the literature. But then, it's so easy to talk about "aging", what would it be with ->temperature: Heating? Cooling? Stirring? ;-) -- Daniel
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 21:20 ` Daniel Phillips @ 2001-07-29 21:51 ` Hugh Dickins 2001-07-29 23:23 ` Rik van Riel 0 siblings, 1 reply; 28+ messages in thread From: Hugh Dickins @ 2001-07-29 21:51 UTC (permalink / raw) To: Daniel Phillips Cc: Rik van Riel, Linus Torvalds <torvalds@transmeta.com>, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Daniel Phillips wrote: > > "Age" is hugely misleading, I think everybody agrees, but we are still > in a stable series, and a global name change would just make it harder > to apply patches. There are very few places where "age" comes in. Not my call, but I doubt we're so frozen as to have to stick with that name. > That said, I think BSD uses "weight". It's not a lot better, but at > least you know that the more heavily weighted page is one with the > higher weight value, whereas we have "age up" meaning "make younger" :-/ > > And how can age go up and down anyway? I'd prefer to talk about > ->temperature, more in line with what we see in the literature. > > But then, it's so easy to talk about "aging", what would it be with > ->temperature: Heating? Cooling? Stirring? ;-) That's much _much_ better: I'd go for "warmth" myself, warm_page_up() and cool_page_down(). I particularly like the ambiguity, that a warmer page may be a more recently used page or a more frequently used page. Hugh
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 21:51 ` Hugh Dickins @ 2001-07-29 23:23 ` Rik van Riel 2001-07-31 7:30 ` Kai Henningsen 0 siblings, 1 reply; 28+ messages in thread From: Rik van Riel @ 2001-07-29 23:23 UTC (permalink / raw) To: Hugh Dickins Cc: Daniel Phillips, Linus Torvalds <torvalds@transmeta.com>, Marcelo Tosatti, linux-mm, Andrew Morton, Mike Galbraith, Steven Cole, Roger Larsson On Sun, 29 Jul 2001, Hugh Dickins wrote: > On Sun, 29 Jul 2001, Daniel Phillips wrote: > > > > "Age" is hugely misleading, I think everybody agrees, Yup. I mainly kept it because we called things this way in the 1.2, 1.3, 2.0 and 2.1 kernels. > > That said, I think BSD uses "weight". > That's much _much_ better: I'd go for "warmth" myself, FreeBSD uses act_count, short for activation count. Showing how active a page is is probably a better analogy than the temperature one ... but that's just IMHO ;) regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 23:23 ` Rik van Riel @ 2001-07-31 7:30 ` Kai Henningsen 2001-07-31 14:13 ` Daniel Phillips 0 siblings, 1 reply; 28+ messages in thread From: Kai Henningsen @ 2001-07-31 7:30 UTC (permalink / raw) To: linux-mm riel@conectiva.com.br (Rik van Riel) wrote on 29.07.01 in <Pine.LNX.4.33L.0107292021480.11893-100000@imladris.rielhome.conectiva>: > On Sun, 29 Jul 2001, Hugh Dickins wrote: > > On Sun, 29 Jul 2001, Daniel Phillips wrote: > > > > > > "Age" is hugely misleading, I think everybody agrees, > > Yup. I mainly kept it because we called things this way > in the 1.2, 1.3, 2.0 and 2.1 kernels. > > > > That said, I think BSD uses "weight". > > > That's much _much_ better: I'd go for "warmth" myself, > > FreeBSD uses act_count, short for activation count. > > Showing how active a page is is probably a better analogy > than the temperature one ... but that's just IMHO ;) Well, people do sometimes speak of "hot" pages (or spots) ... and there are no good verbs associated with "activation count". Oh, and you might say "the situation heats up" in case of increasing memory pressure. And remember that in physics, temperature (at least in the cases where it's used by non-physicists) does measure something approximately like average particle velocity, which some (non-physicist) people might well call "activity". MfG Kai -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-31 7:30 ` Kai Henningsen @ 2001-07-31 14:13 ` Daniel Phillips 2001-07-31 17:37 ` Jonathan Morton 0 siblings, 1 reply; 28+ messages in thread From: Daniel Phillips @ 2001-07-31 14:13 UTC (permalink / raw) To: Kai Henningsen, linux-mm On Tuesday 31 July 2001 09:30, Kai Henningsen wrote: > riel@conectiva.com.br (Rik van Riel) wrote on 29.07.01 in <Pine.LNX.4.33L.0107292021480.11893-100000@imladris.rielhome.conectiva>: > > On Sun, 29 Jul 2001, Hugh Dickins wrote: > > > On Sun, 29 Jul 2001, Daniel Phillips wrote: > > > > "Age" is hugely misleading, I think everybody agrees, > > > > Yup. I mainly kept it because we called things this way > > in the 1.2, 1.3, 2.0 and 2.1 kernels. > > > > > > That said, I think BSD uses "weight". > > > > > > That's much _much_ better: I'd go for "warmth" myself, > > > > FreeBSD uses act_count, short for activation count. > > > > Showing how active a page is is probably a better analogy > > than the temperature one ... but that's just IMHO ;) > > Well, people do sometimes speak of "hot" pages (or spots) ... and > there are no good verbs associated with "activation count". Oh, and > you might say "the situation heats up" in case of increasing memory > pressure. > > And remember that in physics, temperature (at least in the cases > where it's used by non-physicists) does measure something > approximately like average particle velocity, which some > (non-physicist) people might well call "activity". Temperature also captures the idea of gradual decay. Activity sounds fine too, both are way better than age. -- Daniel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-31 14:13 ` Daniel Phillips @ 2001-07-31 17:37 ` Jonathan Morton 0 siblings, 0 replies; 28+ messages in thread From: Jonathan Morton @ 2001-07-31 17:37 UTC (permalink / raw) To: Daniel Phillips, Kai Henningsen, linux-mm >> And remember that in physics, temperature (at least in the cases >> where it's used by non-physicists) does measure something >> approximately like average particle velocity, which some >> (non-physicist) people might well call "activity". > Temperature also captures the idea of gradual decay. Activity sounds > fine too, both are way better than age. Temperature works for me. Activity would also work, but it's not as strong on verb-support for me either - you can say "tired", "more/less activity", "asleep" but that's about it. With temperature you can say "hot", "cold", "warmer", "cooler" and all sorts of things based around that - it even fits with the theorem that CMOS-based circuitry tends to warm up when actively in use. I like it... -- -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) website: http://www.chromatix.uklinux.net/vnc/ geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) tagline: The key to knowledge is not to rely on people to teach you it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 20:13 ` Daniel Phillips 2001-07-28 20:26 ` Linus Torvalds @ 2001-07-29 1:41 ` Andrew Morton 2001-07-29 14:39 ` Daniel Phillips 2001-07-30 3:19 ` Theodore Tso 2001-07-29 17:48 ` Steven Cole 2 siblings, 2 replies; 28+ messages in thread From: Andrew Morton @ 2001-07-29 1:41 UTC (permalink / raw) To: Daniel Phillips Cc: Marcelo Tosatti, linux-mm, Rik van Riel, Linus Torvalds, Mike Galbraith, Steven Cole, Roger Larsson Daniel Phillips wrote: > > Oh, by the way, my suspicions about the flakiness of dbench as a > benchmark were confirmed: under X, having been running various memory > hungry applications for a while, dbench on vanilla 2.4.7 turned in a 7% > better performance (with a distinctly different process termination > pattern) than in text mode after a clean reboot. Be very wary of optimising for dbench. It's a good stress tester, but I don't think it's a good indicator of how well an fs or the VM is performing. It does much more writing than a normal workload mix. It generates oceans of metadata. It would be very useful to have a standardised and very carefully chosen set of tests which we could use for evaluating fs and kernel performance. I'm not aware of anything suitable, really. It would have to be a whole bunch of datapoints sprinkled throughout a multidimensional space. That's what we do at present, but it's ad-hoc. > Maybe somebody can explain to me why there is sometimes a long wait > between the "+" a process prints when it exits and the "*" printed in > the parent's loop on waitpid(0, &status, 0). And similarly, why all > the "*"'s are always printed together. Heaven knows. Seems that sometimes one client makes much more progress than others. When that happens, other clients coast along on its coattails and they start exiting waaay earlier than they normally do. The overall runtime can vary by a factor of two between identical invocations. 
The fact that a kernel change causes a decrease in dbench throughput is by no means a reliable indication that it was a bad change. More information needed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 1:41 ` Andrew Morton @ 2001-07-29 14:39 ` Daniel Phillips 2001-07-30 3:19 ` Theodore Tso 1 sibling, 0 replies; 28+ messages in thread From: Daniel Phillips @ 2001-07-29 14:39 UTC (permalink / raw) To: Andrew Morton Cc: Marcelo Tosatti, linux-mm, Rik van Riel, Linus Torvalds, Mike Galbraith, Steven Cole, Roger Larsson On Sunday 29 July 2001 03:41, Andrew Morton wrote: > Daniel Phillips wrote: > > Oh, by the way, my suspicions about the flakiness of dbench as a > > benchmark were confirmed: under X, having been running various > > memory hungry applications for a while, dbench on vanilla 2.4.7 > > turned in a 7% better performance (with a distinctly different > > process termination pattern) than in text mode after a clean > > reboot. > > Be very wary of optimising for dbench. Agreed, but I still prefer to try to find that "never worse, usually better" performance sweet spot. > It's a good stress tester, but I don't think it's a good indicator of > how well an fs or the VM is performing. It does much more writing > than a normal workload mix. It generates oceans of metadata. I read the code and straced it. I now understand *partially* what it does. I'll take another look with your metadata comment in mind. I have specifically not done anything about balancing with buffer pages yet, hoping that the current behaviour would work well for now. One thing I noticed about dbench: it actually consists of a number of different loads, which you will see immediately if you do "tree" on its working directory or read the client.txt file. One of those loads most probably is the worst for use-once, so what I should do is select loads one by one until I find the worst one. Then it should be a short step to knowing why. I did find and fix one genuine oversight, improving things considerably, see my previous post. 
> It would be very useful to have a standardised and very carefully > chosen set of tests which we could use for evaluating fs and kernel > performance. I'm not aware of anything suitable, really. It would > have to be a whole bunch of datapoints sprinkled throughout a > multidimensional space. That's what we do at present, but it's > ad-hoc. Yes, now who will be the hero to come up with such a suite? > > Maybe somebody can explain to me why there is sometimes a long wait > > between the "+" a process prints when it exits and the "*" printed > > in the parent's loop on waitpid(0, &status, 0). And similarly, why > > all the "*"'s are always printed together. > Heaven knows. Seems that sometimes one client makes much more > progress than others. When that happens, other clients coast > along on its coattails and they start exiting waaay earlier > than they normally do. The overall runtime can vary by a factor > of two between identical invocations. That's what I've seen, though I haven't seen 2x variations since the days of 2.4.0-test. I'd like to have a deeper understanding of this behaviour. My guess is that somebody (Tridge) carefully tuned the dbench mix until its behaviour became "interesting". Thanks ;-) The butterfly effect here seems to be caused by the scheduler more than anything. It seems that higher performance in dbench is nearly always associated with "bursty" scheduling. Why the bursty scheduling sometimes happens and sometimes doesn't is not clear at all. If we could understand and control the effect maybe we'd be able to eke out a system-wide performance boost under load. I *think* it has to do with working sets, i.e., staying within non-thrashing limits by preferentially continuing to schedule a subset of processes that are currently active. But I have not noticed yet exactly which resource might be being thrashed. 
> The fact that a kernel change causes a decrease in dbench throughput > is by no means a reliable indication that is was a bad change. More > information needed. Yep, still digging, and making progress I think. -- Daniel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-29 1:41 ` Andrew Morton 2001-07-29 14:39 ` Daniel Phillips @ 2001-07-30 3:19 ` Theodore Tso 2001-07-30 15:17 ` Randy.Dunlap 1 sibling, 1 reply; 28+ messages in thread From: Theodore Tso @ 2001-07-30 3:19 UTC (permalink / raw) To: Andrew Morton Cc: Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Linus Torvalds, Mike Galbraith, Steven Cole, Roger Larsson On Sun, Jul 29, 2001 at 11:41:50AM +1000, Andrew Morton wrote: > > Be very wary of optimising for dbench. > > It's a good stress tester, but I don't think it's a good indicator of how > well an fs or the VM is performing. It does much more writing than a > normal workload mix. It generates oceans of metadata. People should keep in mind what dbench was originally written to do --- to be an easy-to-run proxy for the netbench benchmark, so that developers could have a relatively easy way to determine how well/poorly their systems would run on netbench without having to set up an expensive and hard-to-maintain cluster of Windows clients in order to do a full-blown netbench benchmark. Most people agree that netbench is a horrible benchmark, but the reality is that it's what a lot of the world (including folks like Mindcraft) uses for benchmarking SMB/CIFS servers. So while we shouldn't optimize dbench/netbench numbers at the expense of real-world performance, we can be sure that Microsoft will be doing so (and will no doubt call in Mindcraft or some other "independent benchmarking/testing company" to be their shill once they've finished with their benchmark hacking. :-) > It would be very useful to have a standardised and very carefully > chosen set of tests which we could use for evaluating fs and kernel > performance. I'm not aware of anything suitable, really. It would > have to be a whole bunch of datapoints sprinkled throughout a > multidimensional space. That's what we do at present, but it's ad-hoc. 
All the gripes about dbench/netbench aside, one good thing about them is that they hit the filesystem with a large number of operations in parallel, which is what a fileserver under heavy load will see. Benchmarks like Andrew and Bonnie tend to have a much more serialized pattern of filesystem access. - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-30 3:19 ` Theodore Tso @ 2001-07-30 15:17 ` Randy.Dunlap 2001-07-30 16:41 ` Theodore Tso ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Randy.Dunlap @ 2001-07-30 15:17 UTC (permalink / raw) To: Theodore Tso Cc: Andrew Morton, Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Mike Galbraith, Steven Cole, Roger Larsson Theodore Tso wrote: > > On Sun, Jul 29, 2001 at 11:41:50AM +1000, Andrew Morton wrote: > > > It would be very useful to have a standardised and very carefully > > chosen set of tests which we could use for evaluating fs and kernel > > performance. I'm not aware of anything suitable, really. It would > > have to be a whole bunch of datapoints sprinkled throughout a > > multidimesional space. That's what we do at present, but it's ad-hoc. > > All the gripes about dbench/netbench aside, one good thing about them > is that they hit the filesystem with a large number of operations in > parallel, which is what a fileserver under heavy load will see. > Benchmarks like Andrew and Bonnie tend to have a much more serialized > pattern of filesystem access. Is iozone (using threads) any better at this? We are currently using iozone. And where can I find Zlatko's xmm program that Mike mentioned? Thanks, -- ~Randy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-30 15:17 ` Randy.Dunlap @ 2001-07-30 16:41 ` Theodore Tso 2001-07-30 17:52 ` Mike Galbraith 2001-07-30 19:39 ` Zlatko Calusic 2 siblings, 0 replies; 28+ messages in thread From: Theodore Tso @ 2001-07-30 16:41 UTC (permalink / raw) To: Randy.Dunlap Cc: Theodore Tso, Andrew Morton, Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Mike Galbraith, Steven Cole, Roger Larsson On Mon, Jul 30, 2001 at 08:17:02AM -0700, Randy.Dunlap wrote: > Is iozone (using threads) any better at this? Yes, iozone in its throughput testing looks promising. I haven't played with iozone much myself; I'll have to give it a try. - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-30 15:17 ` Randy.Dunlap 2001-07-30 16:41 ` Theodore Tso @ 2001-07-30 17:52 ` Mike Galbraith 2001-07-30 19:39 ` Zlatko Calusic 2 siblings, 0 replies; 28+ messages in thread From: Mike Galbraith @ 2001-07-30 17:52 UTC (permalink / raw) To: Randy.Dunlap Cc: Theodore Tso, Andrew Morton, Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Steven Cole, Roger Larsson On Mon, 30 Jul 2001, Randy.Dunlap wrote: > Theodore Tso wrote: > > > > On Sun, Jul 29, 2001 at 11:41:50AM +1000, Andrew Morton wrote: > > > > > It would be very useful to have a standardised and very carefully > > > chosen set of tests which we could use for evaluating fs and kernel > > > performance. I'm not aware of anything suitable, really. It would > > > have to be a whole bunch of datapoints sprinkled throughout a > > > multidimesional space. That's what we do at present, but it's ad-hoc. > > > > All the gripes about dbench/netbench aside, one good thing about them > > is that they hit the filesystem with a large number of operations in > > parallel, which is what a fileserver under heavy load will see. > > Benchmarks like Andrew and Bonnie tend to have a much more serialized > > pattern of filesystem access. > > Is iozone (using threads) any better at this? > We are currently using iozone. > > And where can I find Zlatko's xmm program that Mike mentioned? I lost the original URL, but have the source if you want it. It's a simple histogram, with zero stats. You can't do detailed analysis, but if you only need to see the big picture, it's useful. If you search the archives, you'll find the URL. (or ask Zlatko) -Mike -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-30 15:17 ` Randy.Dunlap 2001-07-30 16:41 ` Theodore Tso 2001-07-30 17:52 ` Mike Galbraith @ 2001-07-30 19:39 ` Zlatko Calusic 2 siblings, 0 replies; 28+ messages in thread From: Zlatko Calusic @ 2001-07-30 19:39 UTC (permalink / raw) To: Randy.Dunlap Cc: Theodore Tso, Andrew Morton, Daniel Phillips, Marcelo Tosatti, linux-mm, Rik van Riel, Mike Galbraith, Steven Cole, Roger Larsson "Randy.Dunlap" <rddunlap@osdlab.org> writes: > Theodore Tso wrote: > > > > On Sun, Jul 29, 2001 at 11:41:50AM +1000, Andrew Morton wrote: > > > > > It would be very useful to have a standardised and very carefully > > > chosen set of tests which we could use for evaluating fs and kernel > > > performance. I'm not aware of anything suitable, really. It would > > > have to be a whole bunch of datapoints sprinkled throughout a > > > multidimesional space. That's what we do at present, but it's ad-hoc. > > > > All the gripes about dbench/netbench aside, one good thing about them > > is that they hit the filesystem with a large number of operations in > > parallel, which is what a fileserver under heavy load will see. > > Benchmarks like Andrew and Bonnie tend to have a much more serialized > > pattern of filesystem access. > > Is iozone (using threads) any better at this? > We are currently using iozone. > > And where can I find Zlatko's xmm program that Mike mentioned? > http://linux.inet.hr/ -- Zlatko -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: 2.4.8-pre1 and dbench -20% throughput 2001-07-28 20:13 ` Daniel Phillips 2001-07-28 20:26 ` Linus Torvalds 2001-07-29 1:41 ` Andrew Morton @ 2001-07-29 17:48 ` Steven Cole 2 siblings, 0 replies; 28+ messages in thread From: Steven Cole @ 2001-07-29 17:48 UTC (permalink / raw) To: Daniel Phillips, Marcelo Tosatti Cc: linux-mm, Rik van Riel, Linus Torvalds, Andrew Morton, Mike Galbraith <mikeg@wen-online.de>, Roger Larsson On Saturday 28 July 2001 14:13, Daniel Phillips wrote: [snippage] > Oh, by the way, my suspicions about the flakiness of dbench as a > benchmark were confirmed: under X, having been running various memory > hungry applications for a while, dbench on vanilla 2.4.7 turned in a 7% > better performance (with a distinctly different process termination > pattern) than in text mode after a clean reboot. From the FWIW department, apologies in advance if this is all moot. Here are the results of nine runs of dbench 32. I ran vmstat before and after each instance of running time ./dbench 32. These verbose results are provided after the following summary. The test machine is a PIII 450, 384MB, ReiserFS on all partitions, disks IDE. Tests were all run from an xterm and KDE2. 
Steven 2.4.8-pre2 After running 8 hours Run #1 Throughput 5.77702 MB/sec Run #2 Throughput 5.8781 MB/sec Run #3 Throughput 6.08052 MB/sec 2.4.8-pre2 After fresh reboot Run #4 Throughput 7.18107 MB/sec Run #5 Throughput 7.0096 MB/sec Run #6 Throughput 7.1165 MB/sec 2.4.7 After fresh reboot Run #7 Throughput 8.96163 MB/sec Run #8 Throughput 9.20907 MB/sec Run #9 Throughput 9.88017 MB/sec ------------------------------------------------------------------------------- 2.4.8-pre2 After running 8 hours procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 0 0 26272 31248 12100 63688 2 1 49 185 580 158 2 7 91 [....snipped] Throughput 5.77702 MB/sec (NB=7.22127 MB/sec 57.7702 MBit/sec) 34.70user 426.52system 12:11.28elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 26120 130808 5656 11236 2 1 53 249 716 156 2 9 89 [....snipped] Throughput 5.8781 MB/sec (NB=7.34763 MB/sec 58.781 MBit/sec) 34.55user 439.76system 11:59.61elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 26120 130200 5152 11884 2 1 56 310 844 154 2 11 87 [....snipped] Throughput 6.08052 MB/sec (NB=7.60065 MB/sec 60.8052 MBit/sec) 34.36user 409.73system 11:35.69elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 0 0 26120 144324 4888 12024 2 1 59 366 962 153 2 13 86 ------------------------------------------------------------------------------- 2.4.8-pre2 After fresh reboot procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 278468 9152 56208 0 0 296 118 950 248 13 15 73 [....snipped] Throughput 7.18107 MB/sec 
(NB=8.97633 MB/sec 71.8107 MBit/sec) 34.35user 348.11system 9:48.24elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 287192 13064 34696 0 0 167 2090 4619 125 8 69 23 [....snipped] Throughput 7.0096 MB/sec (NB=8.762 MB/sec 70.096 MBit/sec) 33.05user 348.67system 10:03.62elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 285804 14440 34700 0 0 145 2368 5128 107 7 76 17 [....snipped] Throughput 7.1165 MB/sec (NB=8.89563 MB/sec 71.165 MBit/sec) 34.67user 352.81system 9:54.57elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 285192 15152 34700 0 0 136 2475 5324 101 7 79 14 ------------------------------------------------------------------------------- 2.4.7 After fresh reboot procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 0 0 0 278356 9288 56176 0 0 293 117 941 238 13 15 73 [....snipped] Throughput 8.96163 MB/sec (NB=11.202 MB/sec 89.6163 MBit/sec) 33.91user 244.57system 7:52.40elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 309540 5808 22792 0 0 193 1761 4013 133 9 64 27 [....snipped] Throughput 9.20907 MB/sec (NB=11.5113 MB/sec 92.0907 MBit/sec) 34.43user 255.59system 7:39.69elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 310780 5796 21920 0 0 175 2028 4511 113 9 72 19 [....snipped] Throughput 
9.88017 MB/sec (NB=12.3502 MB/sec 98.8017 MBit/sec) 33.30user 248.82system 7:08.54elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1008major+1402minor)pagefaults 0swaps procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 311180 5356 22024 0 0 172 2124 4694 107 8 76 16 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 28+ messages in thread
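[Editor's aside: a back-of-the-envelope check on the fresh-reboot numbers above — averaging the three runs per kernel puts the regression near 24%, consistent with the -20% in the subject line. The helper names in this throwaway sketch are invented; only the throughput values come from Steven's results:]

```c
/*
 * Throwaway arithmetic on the fresh-reboot dbench 32 runs above.
 * Function names are mine, not from the thread; throughput values
 * (MB/sec) are copied from the posted results.
 */
static double mean3(double a, double b, double c)
{
	return (a + b + c) / 3.0;
}

/* Relative change in percent, going from `before` to `after`. */
static double pct_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}

static double dbench_regression(void)
{
	double v247 = mean3(8.96163, 9.20907, 9.88017);	/* 2.4.7, runs 7-9 */
	double v248 = mean3(7.18107, 7.00960, 7.11650);	/* 2.4.8-pre2, runs 4-6 */
	return pct_change(v247, v248);	/* roughly -24% */
}
```

Given the factor-of-two run-to-run variation Andrew describes above, three runs per kernel is a small sample, so treat the -24% as indicative rather than precise.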
end of thread, other threads:[~2001-07-31 17:37 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <200107272112.f6RLC3d28206@maila.telia.com>
[not found] ` <0107280034050V.00285@starship>
2001-07-27 23:43 ` 2.4.8-pre1 and dbench -20% throughput Roger Larsson
2001-07-28 1:11 ` Daniel Phillips
2001-07-28 3:18 ` Daniel Phillips
2001-07-28 13:40 ` Marcelo Tosatti
2001-07-28 20:13 ` Daniel Phillips
2001-07-28 20:26 ` Linus Torvalds
2001-07-29 14:10 ` Daniel Phillips
2001-07-29 14:48 ` Rik van Riel
2001-07-29 15:34 ` Daniel Phillips
2001-07-29 15:31 ` Mike Galbraith
2001-07-29 16:05 ` Linus Torvalds
2001-07-29 20:19 ` Hugh Dickins
2001-07-29 20:25 ` Rik van Riel
2001-07-29 20:44 ` Hugh Dickins
2001-07-29 21:20 ` Daniel Phillips
2001-07-29 21:51 ` Hugh Dickins
2001-07-29 23:23 ` Rik van Riel
2001-07-31 7:30 ` Kai Henningsen
2001-07-31 14:13 ` Daniel Phillips
2001-07-31 17:37 ` Jonathan Morton
2001-07-29 1:41 ` Andrew Morton
2001-07-29 14:39 ` Daniel Phillips
2001-07-30 3:19 ` Theodore Tso
2001-07-30 15:17 ` Randy.Dunlap
2001-07-30 16:41 ` Theodore Tso
2001-07-30 17:52 ` Mike Galbraith
2001-07-30 19:39 ` Zlatko Calusic
2001-07-29 17:48 ` Steven Cole
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox