linux-mm.kvack.org archive mirror
* Page cache write performance issue
@ 2004-10-13  5:44 Nathan Scott
  2004-10-13  6:19 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Scott @ 2004-10-13  5:44 UTC (permalink / raw)
  To: Andrew Morton, Nick Piggin; +Cc: linux-kernel, linux-mm, linux-xfs

Hi guys,

I've noticed the following performance regression
between 2.6.8.1 and 2.6.9-rc.  It seems to have a very
pronounced effect on both ext2 and xfs.

- single thread, writing (what should be) straight into
the page cache, file size 1/2 of memory size (500MB vs
1GB), writes are in 1K chunks, most of memory is free,
machine was just booted;

- on 2.6.8 (and earlier 2.6 releases) I can typically
get ~50MB/sec on this machine doing this;  (or better
with larger I/O sizes, but that's not the point here)

- on 2.4.28-pre (and all 2.4 releases) I can typically
get ~70MB/sec, presumably writeback kicks in earlier on
2.6; OK, I guess we can live with that... probably some
tradeoff is being made there in the VM;

- on 2.6.9-rc I can only get _4_MB/sec (ext2 or xfs);
writeback commences very quickly, CPU utilisation drops
way down (from 100% to <10%)... looks like we go slower
cos we're initiating I/O almost from the start.
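
The test described in the list above can be approximated with a short user-space sketch. This is not the program that produced the numbers in this message; the function name `timed_buffered_write` and its parameters are hypothetical, and interpreter overhead on 1K writes means absolute MB/sec figures will differ from a C test. It only shows the shape of the workload: plain buffered writes in 1K chunks, with no O_SYNC and no fsync.

```python
import os
import time


def timed_buffered_write(path, total_bytes, chunk_bytes=1024):
    """Write total_bytes of zeros to path in chunk_bytes pieces; return MB/sec.

    The file is opened without O_SYNC and never fsync'ed, so the writes
    are meant to land in the page cache rather than wait for the disk.
    """
    chunk = b"\0" * chunk_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    start = time.monotonic()
    try:
        while written < total_bytes:
            # The last chunk may be shorter; os.write returns bytes written.
            n = min(chunk_bytes, total_bytes - written)
            written += os.write(fd, chunk[:n])
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start
    # Guard against a zero interval on very small test files.
    return written / (1024 * 1024) / max(elapsed, 1e-9)
```

For the case reported here one would call it with `total_bytes=500 * 1024 * 1024` on a freshly booted machine with about 1GB of memory.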

Now if I bump up /proc/sys/vm/dirty_background_ratio and
/proc/sys/vm/dirty_ratio from 40 to 80, I see the expected
performance again (actually, I see the 2.4 performance,
so the poorer early-2.6 numbers were probably due to I/O
commencing at the tail end of all the writes, due to 50%
being more than 40% :).  But 2.6.8 had the same default
dirty writeout ratios (40) as 2.6.9-rc does, didn't it?
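
The "50% being more than 40%" remark can be checked with a little arithmetic. The sketch below reuses the integer formula `dirty = (dirty_ratio * total_pages) / 100` from the mm/page-writeback.c excerpt quoted later in this thread. It assumes 4K pages and, as a simplification, that all 1000MB counts towards total_pages; the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4K pages, as on the i686 box in this thread


def mb_to_pages(mb):
    """Convert a size in MB to a number of PAGE_SIZE pages."""
    return mb * 1024 * 1024 // PAGE_SIZE


def dirty_threshold_pages(ratio_percent, total_pages):
    """Integer form of: dirty = (dirty_ratio * total_pages) / 100."""
    return (ratio_percent * total_pages) // 100


total_pages = mb_to_pages(1000)  # simplification: every page is counted
file_pages = mb_to_pages(500)    # the 500MB test file

for ratio in (40, 80):
    limit = dirty_threshold_pages(ratio, total_pages)
    # Columns: ratio, threshold in pages, does the file exceed it?
    print(ratio, limit, file_pages > limit)
```

Under these assumptions the 500MB file (128000 pages) exceeds the 40% threshold (102400 pages) but stays below the 80% threshold (204800 pages), which matches writeout only starting late, or not at all, once the ratios are raised.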

So, any ideas what happened to 2.6.9?  What's the rationale
for commencing writeout earlier in 2.6 (even when there's
so much free memory available)?  Any chance we can get the
defaults set to something much larger in the wake of the
other 2.6.9 VM changes, so we don't regress here?


thanks!

-- 
Nathan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>


* Re: Page cache write performance issue
  2004-10-13  5:44 Page cache write performance issue Nathan Scott
@ 2004-10-13  6:19 ` Andrew Morton
  2004-10-13  6:39   ` Nathan Scott
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-10-13  6:19 UTC (permalink / raw)
  To: Nathan Scott; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

Nathan Scott <nathans@sgi.com> wrote:
>
>  So, any ideas what happened to 2.6.9?

Does reverting the below fix it up?

>   What's the rationale for commencing writeout earlier in 2.6
> (even when there's
>  so much free memory available)?

There wasn't much rationale behind that patch - that's why I dropped it the
first three times ;)  I have no problem with making it four times.

It could be that small values of unmapped_ratio are making background_ratio
too small.


--- a/mm/page-writeback.c	10 Aug 2004 04:16:17 -0000	1.43
+++ a/mm/page-writeback.c	13 Oct 2004 06:12:03 -0000
@@ -153,9 +153,11 @@
 	if (dirty_ratio < 5)
 		dirty_ratio = 5;
 
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
+	/*
+	 * Keep the ratio between dirty_ratio and background_ratio roughly
+	 * what the sysctls are after dirty_ratio has been scaled (above).
+	 */
+	background_ratio = dirty_background_ratio * dirty_ratio/vm_dirty_ratio;
 
 	background = (background_ratio * total_pages) / 100;
 	dirty = (dirty_ratio * total_pages) / 100;
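
To see how the two computations in the diff above differ, here is a small sketch that evaluates both on plain integers. It only mirrors the arithmetic on the '-' and '+' lines; the function names are hypothetical, `dirty_ratio` is taken as an input that has already been scaled and clamped by the code above the hunk, and the sample values in the usage note are illustrative rather than measured.

```python
def background_ratio_capped(dirty_background_ratio, dirty_ratio):
    """Arithmetic of the '-' lines: take the sysctl value, but fall back
    to half of dirty_ratio when the sysctl is not below dirty_ratio."""
    background_ratio = dirty_background_ratio
    if background_ratio >= dirty_ratio:
        background_ratio = dirty_ratio // 2
    return background_ratio


def background_ratio_scaled(dirty_background_ratio, dirty_ratio, vm_dirty_ratio):
    """Arithmetic of the '+' lines: keep the proportion between the two
    sysctls after dirty_ratio has been scaled. Evaluated left to right
    with integer division, as the C expression would be."""
    return dirty_background_ratio * dirty_ratio // vm_dirty_ratio
```

With hypothetical sysctls of dirty_background_ratio=10 and vm_dirty_ratio=40, an unscaled dirty_ratio of 40 gives 10 from both functions. If dirty_ratio has been scaled down to 5, the capped form gives 2 and the scaled form gives 1, so the scaled form is the one that shrinks along with dirty_ratio.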


* Re: Page cache write performance issue
  2004-10-13  6:19 ` Andrew Morton
@ 2004-10-13  6:39   ` Nathan Scott
  2004-10-13  7:02     ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Scott @ 2004-10-13  6:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

Hi Andrew,

On Tue, Oct 12, 2004 at 11:19:45PM -0700, Andrew Morton wrote:
> Nathan Scott <nathans@sgi.com> wrote:
> >
> >  So, any ideas what happened to 2.6.9?
> 
> Does reverting the below fix it up?

Reverting that one improves things slightly - I move up from
~4MB/sec to ~17MB/sec; that's just under a third of the 2.6.8
numbers I was seeing though, unfortunately.

cheers.

-- 
Nathan

* Re: Page cache write performance issue
  2004-10-13  6:39   ` Nathan Scott
@ 2004-10-13  7:02     ` Andrew Morton
  2004-10-13  7:23       ` Nathan Scott
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-10-13  7:02 UTC (permalink / raw)
  To: Nathan Scott; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

Nathan Scott <nathans@sgi.com> wrote:
>
> Hi Andrew,
> 
> On Tue, Oct 12, 2004 at 11:19:45PM -0700, Andrew Morton wrote:
> > Nathan Scott <nathans@sgi.com> wrote:
> > >
> > >  So, any ideas what happened to 2.6.9?
> > 
> > Does reverting the below fix it up?
> 
> Reverting that one improves things slightly - I move up from
> ~4MB/sec to ~17MB/sec; that's just under a third of the 2.6.8
> numbers I was seeing though, unfortunately.
> 

Well something else is fishy: how can you possibly achieve only 4MB/sec?
Using floppy disks or something?

Does the same happen on ext2?

It's exactly a 500MB write on a 1000MB machine, yes?

* Re: Page cache write performance issue
  2004-10-13  7:02     ` Andrew Morton
@ 2004-10-13  7:23       ` Nathan Scott
  2004-10-13  8:15         ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Scott @ 2004-10-13  7:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

On Wed, Oct 13, 2004 at 12:02:06AM -0700, Andrew Morton wrote:
> 
> Well something else is fishy: how can you possibly achieve only 4MB/sec?

These are 1K writes too, remember, so it feels a bit like we
write 'em out one at a time, sync (though no O_SYNC, or fsync,
or such involved here).  This is on an i686, so 4K pages, and
using 4K filesystem blocksizes (both xfs and ext2).

And now that you mention it, yes, this is multiple times below
the direct IO numbers too (which on this box are ~30MB/sec
for direct blkdev writes, IIRC, & XFS has similar numbers).

> Using floppy disks or something?

Heh, uh, no.  (and no, not "pencils" either ;)

> Does the same happen on ext2?

Yes.

> It's exactly a 500MB write on a 1000MB machine, yes?

That's correct.

No slab/page/.. debug options enabled either - it's the same
.config that was performing ~10x better on 2.6.8.  I also
verified that it wasn't any of the XFS changes either (they
wouldn't have affected ext2 anyway, of course) - the same
XFS code backported to 2.6.8 performs fine also.

cheers.

-- 
Nathan

* Re: Page cache write performance issue
  2004-10-13  7:23       ` Nathan Scott
@ 2004-10-13  8:15         ` Nick Piggin
  2004-10-13  8:39           ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2004-10-13  8:15 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Andrew Morton, linux-kernel, linux-mm, linux-xfs


Nathan Scott wrote:

>On Wed, Oct 13, 2004 at 12:02:06AM -0700, Andrew Morton wrote:
>
>>Well something else is fishy: how can you possibly achieve only 4MB/sec?
>>
>
>These are 1K writes too, remember, so it feels a bit like we
>write 'em out one at a time, sync (though no O_SYNC, or fsync,
>or such involved here).  This is on an i686, so 4K pages, and
>using 4K filesystem blocksizes (both xfs and ext2).
>
>

Still shouldn't cause such a big slowdown. Seems like they
might be getting written off the end of the page reclaim
LRU (although in that case it is a bit odd that increasing
the dirty thresholds is improving performance).

I don't think we have any vmscan metrics for this... kswapd
definitely has become more active in 2.6.9-rc. If you're stuck
for ideas, try editing mm/vmscan.c:may_write_to_queue - comment
out the if(current_is_kswapd()) check.

It is a long shot though. Andrew probably has better ideas.


* Re: Page cache write performance issue
  2004-10-13  8:15         ` Nick Piggin
@ 2004-10-13  8:39           ` Andrew Morton
  2004-10-14  0:53             ` Nathan Scott
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-10-13  8:39 UTC (permalink / raw)
  To: Nick Piggin; +Cc: nathans, linux-kernel, linux-mm, linux-xfs

Nick Piggin <piggin@cyberone.com.au> wrote:
>
>  Andrew probably has better ideas.

uh, is this an ia32 highmem box?

If so, you've hit the VM sour spot.  That 128M highmem zone gets 100%
filled with dirty pages and we end up doing a ton of writeout off the page
LRU.  And we do that while `dd' is cheerfully writing to a totally
different part of the disk via balance_dirty_pages().  Seekstorm ensues. 
Although last time I looked (a long time ago) the slowdown was only 2:1 -
perhaps your disk is in writethrough mode??

Basically, *any* other config is fine.  896MB and below, 1.5GB and above.

I could well understand that a minor kswapd tweak would make this bad
situation worse.  Making the dirty ratios really small (dirty_ratio less
than the 128MB) should make it go away.

If it's not ia32 then dunno.
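
As a rough illustration of the "dirty_ratio less than the 128MB" suggestion, the sketch below works out the size of the highmem zone and the largest whole-number percentage that stays under it. The 896MB boundary comes from the "896MB and below" remark above; treating the machine as exactly 1024MB and applying the percentage to all of memory are simplifying assumptions, and both helper names are hypothetical.

```python
LOWMEM_MB = 896  # boundary taken from "896MB and below" in this message


def highmem_zone_mb(total_mb):
    """Memory above the lowmem boundary; zero on small machines."""
    return max(0, total_mb - LOWMEM_MB)


def largest_ratio_below(limit_mb, total_mb):
    """Largest integer percentage of total_mb that is strictly below limit_mb."""
    ratio = 0
    # Compare (ratio + 1) percent of total_mb against limit_mb using
    # integers only, to avoid floating-point rounding.
    while (ratio + 1) * total_mb < limit_mb * 100:
        ratio += 1
    return ratio
```

Under these assumptions `highmem_zone_mb(1024)` is 128 and `largest_ratio_below(128, 1024)` is 12, so a dirty_ratio of 12 or less would keep the dirty limit under the size of the highmem zone.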

* Re: Page cache write performance issue
  2004-10-13  8:39           ` Andrew Morton
@ 2004-10-14  0:53             ` Nathan Scott
  2004-10-14  3:20               ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Scott @ 2004-10-14  0:53 UTC (permalink / raw)
  To: Nick Piggin, Andrew Morton; +Cc: linux-kernel, linux-mm, linux-xfs

On Wed, Oct 13, 2004 at 01:39:41AM -0700, Andrew Morton wrote:
> Nick Piggin <piggin@cyberone.com.au> wrote:
> >
> >  Andrew probably has better ideas.
> 
> uh, is this an ia32 highmem box?

Yep, it is.

> If so, you've hit the VM sour spot.
> ...
> Basically, *any* other config is fine.  896MB and below, 1.5GB and above.

I just tried switching CONFIG_HIGHMEM off, and so running the
machine with 512MB; then adjusted the test to write 256M into
the page cache, again in 1K sequential chunks.  A similar mis-
behaviour happens, though the numbers are slightly better (up
from ~4 to ~6.5MB/sec).  Both ext2 and xfs see this.  When I
drop the file size down to 128M with this kernel, I see good
results again (as we'd expect).

I'm being pulled onto other issues atm, but in the background
I could try reverting specific changesets if you guys can
suggest anything in particular that might be triggering this?

thanks!

-- 
Nathan

* Re: Page cache write performance issue
  2004-10-14  0:53             ` Nathan Scott
@ 2004-10-14  3:20               ` Andrew Morton
  2004-10-14  7:16                 ` Nathan Scott
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-10-14  3:20 UTC (permalink / raw)
  To: Nathan Scott; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

Nathan Scott <nathans@sgi.com> wrote:
>
> On Wed, Oct 13, 2004 at 01:39:41AM -0700, Andrew Morton wrote:
>  > Nick Piggin <piggin@cyberone.com.au> wrote:
>  > >
>  > >  Andrew probably has better ideas.
>  > 
>  > uh, is this an ia32 highmem box?
> 
>  Yep, it is.
> 
>  > If so, you've hit the VM sour spot.
>  > ...
>  > Basically, *any* other config is fine.  896MB and below, 1.5GB and above.
> 
>  I just tried switching CONFIG_HIGHMEM off, and so running the
>  machine with 512MB; then adjusted the test to write 256M into
>  the page cache, again in 1K sequential chunks.  A similar mis-
>  behaviour happens, though the numbers are slightly better (up
>  from ~4 to ~6.5MB/sec).  Both ext2 and xfs see this.  When I
>  drop the file size down to 128M with this kernel, I see good
>  results again (as we'd expect).

No such problem here, with

	dd if=/dev/zero of=x bs=1k count=128k

on a 256MB machine.  xfs and ext2.

Can you exhibit this on more than one machine?

Silly question: what does `grep sync /etc/fstab' say over there? ;)

* Re: Page cache write performance issue
  2004-10-14  3:20               ` Andrew Morton
@ 2004-10-14  7:16                 ` Nathan Scott
  2004-10-14  7:31                   ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Scott @ 2004-10-14  7:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: piggin, linux-kernel, linux-mm, linux-xfs

On Wed, Oct 13, 2004 at 08:20:41PM -0700, Andrew Morton wrote:
> Nathan Scott <nathans@sgi.com> wrote:
> >  I just tried switching CONFIG_HIGHMEM off, and so running the
> >  machine with 512MB; then adjusted the test to write 256M into
> >  the page cache, again in 1K sequential chunks.  A similar mis-
> >  behaviour happens, though the numbers are slightly better (up
> >  from ~4 to ~6.5MB/sec).  Both ext2 and xfs see this.  When I
> >  drop the file size down to 128M with this kernel, I see good
> >  results again (as we'd expect).
> 
> No such problem here, with
> 
> 	dd if=/dev/zero of=x bs=1k count=128k
> 
> on a 256MB machine.  xfs and ext2.

Yup, rebooted with mem=128M on my box, & that crawls.
Maybe it's just this old hunk o' junk, I suppose; odd that
2.6.8 was OK with this though.

> Can you exhibit this on more than one machine?

I haven't got a second ia32 box atm - setting one up soon,
will let you know how it goes.

> Silly question: what does `grep sync /etc/fstab' say over there? ;)

Same thing it said on 2.6.8. :)  Nada.

cheers.

-- 
Nathan

* Re: Page cache write performance issue
  2004-10-14  7:16                 ` Nathan Scott
@ 2004-10-14  7:31                   ` Nick Piggin
  0 siblings, 0 replies; 11+ messages in thread
From: Nick Piggin @ 2004-10-14  7:31 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Andrew Morton, piggin, linux-kernel, linux-mm, linux-xfs

Nathan Scott wrote:
> On Wed, Oct 13, 2004 at 08:20:41PM -0700, Andrew Morton wrote:
> 
>>Nathan Scott <nathans@sgi.com> wrote:
>>
>>> I just tried switching CONFIG_HIGHMEM off, and so running the
>>> machine with 512MB; then adjusted the test to write 256M into
>>> the page cache, again in 1K sequential chunks.  A similar mis-
>>> behaviour happens, though the numbers are slightly better (up
>>> from ~4 to ~6.5MB/sec).  Both ext2 and xfs see this.  When I
>>> drop the file size down to 128M with this kernel, I see good
>>> results again (as we'd expect).
>>
>>No such problem here, with
>>
>>	dd if=/dev/zero of=x bs=1k count=128k
>>
>>on a 256MB machine.  xfs and ext2.
> 
> 
> Yup, rebooted with mem=128M on my box, & that crawls.
> Maybe it's just this old hunk o' junk, I suppose; odd that
> 2.6.8 was OK with this though.
> 

Just out of interest, can you get profiles and a few lines
of vmstat 1 from 2.6.8 and 2.6.9-rc, please?
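
Alongside `vmstat 1`, the dirty and writeback page counts can be sampled directly. The sketch below is not what was asked for above and is no substitute for a profile; it assumes the kernel exposes a /proc/vmstat file of "name value" lines that includes `nr_dirty` and `nr_writeback` counters, and the helper names are hypothetical.

```python
def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name and a decimal number.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def sample_dirty_state(path="/proc/vmstat", keys=("nr_dirty", "nr_writeback")):
    """Read the file once and return the selected counters (None if absent)."""
    with open(path) as f:
        counters = parse_vmstat(f.read())
    return {key: counters.get(key) for key in keys}
```

Calling `sample_dirty_state()` once per second while the write test runs on each kernel would show how early dirty pages start being written back.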
