linux-mm.kvack.org archive mirror
* 2.2.20 suspends everything then recovers during heavy I/O
@ 2002-04-04 22:06 Jim Wilcoxson
  2002-04-05  5:29 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Wilcoxson @ 2002-04-04 22:06 UTC (permalink / raw)
  To: linux-mm

I'm setting up a new system with 2.2.20, Ingo's raid patches, plus 
Hedrick's IDE patches.

When doing heavy I/O, like copying partitions between drives using tar in a 
pipeline, I've noticed that things will just stop for long periods of time, 
presumably while buffers are written out to the destination disk.  The 
destination drive light is on and the system is not exactly hung, because I 
can switch consoles and stuff, but a running vmstat totally suspends for 
10-15 seconds.

Any tips or patches that will avoid this?  If our server hangs for 15 
seconds, we're going to have tons of web requests piled up for it when it 
decides to wake up...

Thanks for any advice you may have.  (I'm not on the mailing list BTW).

Jim
___________________________________________
Jim Wilcoxson, Owner
Ruby Lane Antiques, Collectibles & Fine Art
1.313.274.0788
http://www.rubylane.com 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: 2.2.20 suspends everything then recovers during heavy I/O
  2002-04-04 22:06 2.2.20 suspends everything then recovers during heavy I/O Jim Wilcoxson
@ 2002-04-05  5:29 ` Andrew Morton
  2002-04-05 18:27   ` jim
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2002-04-05  5:29 UTC (permalink / raw)
  To: Jim Wilcoxson; +Cc: linux-mm

Jim Wilcoxson wrote:
> 
> I'm setting up a new system with 2.2.20, Ingo's raid patches, plus
> Hedrick's IDE patches.
> 
> When doing heavy I/O, like copying partitions between drives using tar in a
> pipeline, I've noticed that things will just stop for long periods of time,
> presumably while buffers are written out to the destination disk.  The
> destination drive light is on and the system is not exactly hung, because I
> can switch consoles and stuff, but a running vmstat totally suspends for
> 10-15 seconds.
> 
> Any tips or patches that will avoid this?  If our server hangs for 15
> seconds, we're going to have tons of web requests piled up for it when it
> decides to wake up...
> 

Which filesystem are you using?

First thing to do is to ensure that your disks are achieving
the expected bandwidth.  Measure them with `hdparm -t'.
If the throughput is poor, and they're IDE, check the
chipset tuning options in your kernel config and/or
tune the disks with hdparm.

If all that fails, you can probably smooth things
out by tuning the writeback parameters in /proc/sys/vm/bdflush
(if that's there in 2.2.  It's certainly somewhere :))
Set the `interval' value smaller than the default five
seconds, set `nfract' higher.  Set `age_buffer' lower..
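As a concrete sketch of the two suggestions above (hypothetical device
names, and the bdflush field order differs across kernel versions, so
check your kernel's documentation before writing anything back):

```shell
# 1. Verify raw disk bandwidth on the source and destination drives.
hdparm -t /dev/hda
hdparm -t /dev/hdc

# 2. Inspect the current writeback tuning (nine fields on 2.2 kernels).
cat /proc/sys/vm/bdflush

# 3. Write all fields back with the ones you want changed.  The numbers
#    below are placeholders showing the syntax only; map each position
#    against your kernel source before using real values.
echo "40 500 64 256 500 3000 500 1884 2" > /proc/sys/vm/bdflush
```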

And finally: don't go copying entire partitions around
on a live web server :)


* Re: 2.2.20 suspends everything then recovers during heavy I/O
  2002-04-05  5:29 ` Andrew Morton
@ 2002-04-05 18:27   ` jim
  2002-04-05 18:47     ` Martin J. Bligh
  0 siblings, 1 reply; 6+ messages in thread
From: jim @ 2002-04-05 18:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

 
> Jim Wilcoxson wrote:
> > 
> > I'm setting up a new system with 2.2.20, Ingo's raid patches, plus
> > Hedrick's IDE patches.
> > 
> > When doing heavy I/O, like copying partitions between drives using tar in a
> > pipeline, I've noticed that things will just stop for long periods of time,
> > presumably while buffers are written out to the destination disk.  The
> > destination drive light is on and the system is not exactly hung, because I
> > can switch consoles and stuff, but a running vmstat totally suspends for
> > 10-15 seconds.
> > 
> > Any tips or patches that will avoid this?  If our server hangs for 15
> > seconds, we're going to have tons of web requests piled up for it when it
> > decides to wake up...
> > 
> 
> Which filesystem are you using?

ext2

> First thing to do is to ensure that your disks are achieving
> the expected bandwidth.  Measure them with `hdparm -t'.
> If the throughput is poor, and they're IDE, check the
> chipset tuning options in your kernel config and/or
> tune the disks with hdparm.

# hdparm -tT /dev/hdg

/dev/hdg:
 Timing buffer-cache reads:   128 MB in  0.65 seconds = 196.92 MB/sec
 Timing buffered disk reads:  64 MB in  1.78 seconds = 35.96 MB/sec

Is this fast?  I dunno - seems fast.  The Promise cards are in a 66MHz
bus slot, so I thought about using the idebus= thing to tell it that,
but I'm gun shy.  Probably not worth it for real-world accesses.  All
the drives are in UDMA5 mode:

# hdparm -i /dev/hdg

/dev/hdg:

 Model=Maxtor 5T060H6, FwRev=TAH71DP0, SerialNo=T6HMF4EC
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=120103200
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
 Drive Supports : Reserved : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 
 Kernel Drive Geometry LogicalCHS=119150/16/63 PhysicalCHS=119150/16/63


> If all that fails, you can probably smooth things
> out by tuning the writeback parameters in /proc/sys/vm/bdflush
> (if that's there in 2.2.  It's certainly somewhere :))
> Set the `interval' value smaller than the default five
> seconds, set `nfract' higher.  Set `age_buffer' lower..

Thanks, I'll try these tips.  IMO, one of Linux's weaknesses is that
it is not easy to run I/O bound jobs without killing the performance
of everything on the machine because of buffer caching.  I know lots
of people are working on solving this and that 2.4 is much better in
this regard.  It just takes time for a production site to have the
warm fuzzies about changing their OS.


> And finally: don't go copying entire partitions around
> on a live web server :)

What would be really great is some way to indicate, maybe with an
O_SEQ flag or something, that an application is going to sequentially
access a file, so caching it is a no-win proposition.  Production
servers do have situations where lots of data has to be copied or
accessed, for example, to do a backup, but doing a backup shouldn't
mean that all of the important stuff gets continuously thrown out of
memory while the backup is running.  Saving metadata during a backup
is useful.  Saving file data isn't.  It seems hard to do this
without an application hint because I may scan a database
sequentially but I'd still want those buffers to stay resident.
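The kind of per-file hint described above did eventually appear in POSIX
as posix_fadvise(); Linux 2.2 had no such call, so the sketch below
illustrates the idea on a modern kernel rather than anything available
at the time (the function name and chunking are my own):

```python
import os

def copy_without_polluting_cache(src_path, dst_path, chunk_size=1 << 20):
    """Copy a file while hinting the kernel not to keep its pages cached.

    POSIX_FADV_SEQUENTIAL announces a one-pass read; POSIX_FADV_DONTNEED
    asks the kernel to drop pages we have already consumed, so a big copy
    does not evict a live server's working set.
    """
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fadvise(src, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            chunk = os.read(src, chunk_size)
            if not chunk:
                break
            os.write(dst, chunk)
            # Drop the source pages we just read from the page cache.
            os.posix_fadvise(src, offset, len(chunk), os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    finally:
        os.close(src)
        os.close(dst)
```

Whether DONTNEED actually evicts the pages is up to the kernel; the hint
is advisory, which is exactly the spirit of the O_SEQ suggestion.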

Linux's I/O strategy (2.2.20), IMO, is kinda flawed because a very
high priority process (kswapd) is used to cleanup the mess that other
I/O-bound processes leave behind.  To me, it would be better to
penalize the applications that are causing the physical I/O and slow
them down rather than letting them have free rein when there is
buffer space available, they instantly fill it, and then invoke this
high-priority process in quasi-emergency mode to flush the buffers.

The other thing I suggested to Alan Cox is a new ulimit that limits
how many file buffers a process can acquire.  If the buffer is
referenced by another process other than the one that caused it to be
created, then maybe it isn't counted in the limit (sharing).  This
way I can set the ulimit before a backup procedure without having
to change any applications.  Another
suggestion is to limit disk and network I/O bandwidth by process using
ulimits.  If I have a 1GB link between machines, I don't necessarily
want to kill two computers to transfer a large file across the link.
Maybe I don't care how long it takes.  I know some applications are
adding support for throttling, and there are various other ways to do
it - shaper, QOS, throttled pipes, etc. - but a general, easy-to-use
mechanism would be very helpful to production sites.  We don't always
have a lot of time to learn the ins and outs of setting up complex (to
us) things like QOS.  Hell, I couldn't even wade through all the
kernel build options for QOS. :) It's a great feature for sites using
Linux as routers, but too complex for general purpose use, IMO.

I've been reading some about the new O(1) CPU scheduler and it sounds
interesting.  Scheduling CPUs is only part of the problem.  In an I/O
bound situation, there is plenty of CPU to go around.  The problem
becomes fair, smooth access to the drives for all processes that need
that resource, also recognizing that different processes have
different completion constraints.  Right now I have to copy a 30GB
partition to another drive in order to do an upgrade for RAID.  I
don't care if it takes 3 days cause I still have to rsync it
afterwards, but I do have to run it on a live server.  I had to write
a pipe throttling thingy to run tar data through so it didn't kill our
server.
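A pipe-throttling filter like the one mentioned above can be only a few
lines; this is a generic sketch (not Jim's actual tool) that caps
throughput by sleeping whenever the copy gets ahead of the allowed rate:

```python
import sys
import time

def throttled_copy(reader, writer, limit_bytes_per_sec, chunk_size=65536):
    """Copy reader -> writer, sleeping as needed to stay under the cap."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        writer.write(chunk)
        total += len(chunk)
        # If we are ahead of the allowed rate, sleep off the excess.
        expected_elapsed = total / limit_bytes_per_sec
        actual_elapsed = time.monotonic() - start
        if expected_elapsed > actual_elapsed:
            time.sleep(expected_elapsed - actual_elapsed)

if __name__ == "__main__":
    # e.g.  tar cf - /data | python throttle.py 5000000 | tar xf - -C /backup
    limit = int(sys.argv[1])
    throttled_copy(sys.stdin.buffer, sys.stdout.buffer, limit)
```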

Okay, end of my rant.  I have my raid running now, my IDE problems
have subsided, and I'm a happy Linux camper again.  Thanks again for
the tips.

Jim

* Re: 2.2.20 suspends everything then recovers during heavy I/O
  2002-04-05 18:27   ` jim
@ 2002-04-05 18:47     ` Martin J. Bligh
  2002-04-05 19:52       ` jim
  0 siblings, 1 reply; 6+ messages in thread
From: Martin J. Bligh @ 2002-04-05 18:47 UTC (permalink / raw)
  To: jim, Andrew Morton; +Cc: linux-mm

> What would be really great is some way to indicate, maybe with an
> O_SEQ flag or something, that an application is going to sequentially
> access a file, so caching it is a no-win proposition.  Production
> servers do have situations where lots of data has to be copied or
> accessed, for example, to do a backup, but doing a backup shouldn't
> mean that all of the important stuff gets continuously thrown out of
> memory while the backup is running.  Saving metadata during a backup
> is useful.  Saving file data isn't.  It seems hard to do this
> without an application hint because I may scan a database
> sequentially but I'd still want those buffers to stay resident.

Doesn't the raw IO stuff do this, effectively?

M.


* Re: 2.2.20 suspends everything then recovers during heavy I/O
  2002-04-05 18:47     ` Martin J. Bligh
@ 2002-04-05 19:52       ` jim
  2002-04-05 19:55         ` jim
  0 siblings, 1 reply; 6+ messages in thread
From: jim @ 2002-04-05 19:52 UTC (permalink / raw)
  To: Martin.Bligh; +Cc: jim, Andrew Morton, linux-mm

But tar & rsync don't work on raw partitions.  There are lots of times
when individual file data has to be processed, and lots of it, like
running stats on large web server logs, compressing the logs, copying
a DB backup to a remote machine for offsite backup, sorting a huge
file, etc. where putting the file in the buffer cache is a waste.

Even in the case of a sort, where you are going to go back and
reference the data again, these often work by reading sequentially
through the data once, sorting the keys, then reordering the file.
The initial sequential scan won't benefit from the buffer cache unless
the whole file fits in memory.  The reorder pass would benefit.

An idea I had a while back was to keep track of whether a file has
been randomly positioned or not.  If not, and you have more than a
certain amount of the file already in memory, start reusing buffers
with early parts of the file instead of hogging more.  To me this
is not as good a solution because there are likely many cases
where this will hurt performance, like repeatedly fgrep'ing a file
larger than the threshold.  If there was a manual tweak, it would
be guaranteed to be used in only the right places.  If tar used
the flag, I guess it's theoretically possible someone would do
repeated tars of the same data, but that seems improbable.  And if
they do that and it takes longer, it's still probably better than
hogging buffers.  Who cares how long a tar takes?
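The automatic heuristic sketched above (detect purely sequential access,
and past a threshold recycle a file's own earlier buffers instead of
taking new ones) could look roughly like this toy model; the class and
method names are hypothetical, and a real page cache is far more
involved:

```python
class DropBehindCache:
    """Toy model: once a sequentially-read file holds more than
    `threshold` blocks in cache, evict its own oldest block to make
    room for the next, instead of growing its footprint."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.blocks = {}       # per-file ordered list of cached block numbers
        self.sequential = {}   # per-file flag: access pattern still sequential?
        self.last_block = {}

    def access(self, file_id, block_no):
        """Record an access; return the block evicted, if any."""
        last = self.last_block.get(file_id)
        if last is not None and block_no != last + 1:
            # A seek was observed: stop treating this file as sequential.
            self.sequential[file_id] = False
        else:
            self.sequential.setdefault(file_id, True)
        self.last_block[file_id] = block_no

        cached = self.blocks.setdefault(file_id, [])
        if block_no not in cached:
            cached.append(block_no)
        if self.sequential[file_id] and len(cached) > self.threshold:
            # Recycle the file's own oldest block rather than hogging more.
            return cached.pop(0)
        return None
```

A random seek (as in the database-scan case above) flips the file back
to normal caching, which is exactly why an explicit application hint is
the safer design.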

Jim

> 
> > What would be really great is some way to indicate, maybe with an
> > O_SEQ flag or something, that an application is going to sequentially
> > access a file, so caching it is a no-win proposition.  Production
> > servers do have situations where lots of data has to be copied or
> > accessed, for example, to do a backup, but doing a backup shouldn't
> > mean that all of the important stuff gets continuously thrown out of
> > memory while the backup is running.  Saving metadata during a backup
> > is useful.  Saving file data isn't.  It seems hard to do this
> > without an application hint because I may scan a database
> > sequentially but I'd still want those buffers to stay resident.
> 
> Doesn't the raw IO stuff do this, effectively?
> 
> M.
> 
> 


* Re: 2.2.20 suspends everything then recovers during heavy I/O
  2002-04-05 19:52       ` jim
@ 2002-04-05 19:55         ` jim
  0 siblings, 0 replies; 6+ messages in thread
From: jim @ 2002-04-05 19:55 UTC (permalink / raw)
  To: jim; +Cc: Martin.Bligh, Andrew Morton, linux-mm

I just realized, you are probably talking about raw IO on a file, not
raw IO on a partition like dump.  I don't know anything about it so
my ignorance is starting to show here... :)

J

> > 
> > Doesn't the raw IO stuff do this, effectively?
> > 
> > M.
> > 
> > 
> 
> 

