* Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-16 20:16 UTC
To: Rik van Riel; +Cc: Linus Torvalds
I've been working with a particular news user who runs FreeBSD very
close to the page-thrashing limit, testing different pageout algorithms,
and I came to an interesting conclusion regarding the flushing of dirty
pages in the pageout scan which I think may interest you.
This particular user runs around 500-600 newsreader processes on a 512M
machine. The processes eat around 8MB/sec in 'new' memory from reading
the news spool and also, of course, dirty a certain amount of memory
updating the news history file and for their run-time VM. The disks
are near saturation (70-90% busy) just from normal operation.
In testing, we've found something quite interesting: flushing dirty pages
on a system which is already close to disk saturation seriously degrades
performance. Not just a little, but a lot.
My solution is to run dirty pages through the FBsd inactive queue twice.
i.e. the first time a dirty page shows up at the head of the queue I set
a flag and send it to the end of the queue. The second time it reaches
the head of the queue I flush it. This gives the dirty page extra time
in memory during which it could be reactivated or reused by the process
before the system decides to spend resources flushing it out.
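To make the mechanism concrete, here's a toy user-space sketch of that
scan (this is NOT the actual FreeBSD code; the structures and names are
invented for illustration, and the real flush is of course asynchronous
rather than a free()):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy page descriptor; the real vm_page structure is far richer. */
struct page {
    struct page *next;
    bool dirty;
    bool second_chance;   /* set the first time a dirty page reaches the head */
};

static struct page *head, *tail;

static void enqueue_tail(struct page *p)
{
    p->next = NULL;
    if (tail)
        tail->next = p;
    else
        head = p;
    tail = p;
}

static struct page *dequeue_head(void)
{
    struct page *p = head;
    if (p) {
        head = p->next;
        if (!head)
            tail = NULL;
    }
    return p;
}

/* One pageout pass: free clean pages as they reach the head of the
 * inactive queue, but give each dirty page one extra lap through the
 * queue before laundering (flushing) it. */
static void scan_inactive(int target)
{
    struct page *p;
    int freed = 0, laundered = 0;

    while (freed < target && (p = dequeue_head()) != NULL) {
        if (!p->dirty) {
            free(p);                    /* clean: reclaiming is nearly free  */
            freed++;
        } else if (!p->second_chance) {
            p->second_chance = true;    /* first encounter: send to the tail */
            enqueue_tail(p);
        } else {
            laundered++;                /* second encounter: launder it      */
            free(p);                    /* (stands in for the async write)   */
            freed++;
        }
    }
    printf("freed %d pages, %d of them laundered\n", freed, laundered);
}

int main(void)
{
    for (int i = 0; i < 12; i++) {
        struct page *p = calloc(1, sizeof(*p));
        p->dirty = (i % 3 == 0);        /* pretend every third page is dirty */
        enqueue_tail(p);
    }
    scan_inactive(10);
    return 0;
}

Run against this toy workload, the eight clean pages are freed first and
the dirty pages only get laundered once the clean ones run out.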
FreeBSD-4.0: dirty pages are flushed in queue order, but the number of
pages laundered per pass was limited. Unknown load characteristics.
In general this worked pretty well but had the side effect of
fragmenting the ordering of dirty pages in the queue.

FreeBSD-4.1.1 and 4.2: dirty pages are skipped unless we run out of
clean pages. News load: 1-2, but pageout activity was very
inconsistent (not a smooth load curve).

FreeBSD-stable: dirty pages are flushed in queue order. News load:
50-150 (i.e. it blew up).

(patches in development): dirty pages are given two go-arounds in the
queue before being flushed. News load: 1-2. Very consistent pageout
activity.
My conclusion from this is that I was wrong before when I thought that
clean and dirty pages should be treated the same, and I was also wrong
trying to give clean pages 'ultimate' priority over dirty pages, but I
think I may be right giving dirty pages two go-arounds in the queue
before flushing. Limiting the number of dirty page flushes allowed per
pass also works but has unwanted side effects.
--
I have also successfully tested a low (real) memory deadlock handling
solution, which is currently in FreeBSD-stable and FreeBSD-current.
I'm not sure what Linux is using for low-memory deadlock handling right
now, so this may be of interest. The solution is as follows:
* We operate under the assumption that I/O *must* be able to continue
to operate no matter what the memory circumstances.
* All allocations made by the I/O subsystem are allowed to dig into
the system memory reserve.
* No allocation outside the I/O subsystem is allowed to dig into the
memory reserve (i.e. such allocations block until the system has
freed enough pages to get out of the red area).
* The I/O subsystem explicitly detects a severe memory shortage, but
rather than blocking it simply frees I/O resources as I/Os
complete, allowing I/O to continue at near full bore without eating
additional memory.
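A toy illustration of that allocation policy (the names and numbers are
invented; in a real kernel the non-I/O caller would sleep here until the
pageout daemon has freed pages, rather than just being told "blocks"):

#include <stdbool.h>
#include <stdio.h>

#define RESERVE_PAGES 256          /* illustrative reserve size            */

static long free_pages = 300;      /* pretend memory is nearly exhausted   */

/* Only the I/O path may dig into the reserve. */
enum alloc_class { ALLOC_NORMAL, ALLOC_IO };

/* Decide whether an allocation may proceed right now.  Normal callers
 * must leave the reserve untouched; the I/O subsystem may dig into it so
 * that in-flight I/O can always complete and give memory back. */
static bool may_allocate(enum alloc_class who, long npages)
{
    long floor = (who == ALLOC_IO) ? 0 : RESERVE_PAGES;
    return free_pages - npages >= floor;
}

int main(void)
{
    printf("normal alloc of 100 pages: %s\n",
           may_allocate(ALLOC_NORMAL, 100) ? "proceeds" : "blocks");
    printf("I/O alloc of 100 pages:    %s\n",
           may_allocate(ALLOC_IO, 100) ? "proceeds" : "blocks");
    return 0;
}

With 300 pages free and a 256-page reserve, the normal allocation blocks
while the I/O allocation proceeds, which is the whole point.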
This solution appears to work under all load conditions. Yahoo was unable
to crash or deadlock extremely heavily loaded FreeBSD boxes after
I committed this solution.
-
Finally, I came up with a solution to deal with heavy write loads
interfering with read loads. If I remember correctly Linux tries to
reorder reads in front of writes. FreeBSD does not do that, because
once FreeBSD determines that a delayed write should go out it wants it
to go out, and because you get non-deterministic loads when you make
that kind of ad-hoc ordering decision. I personally do not like
reordering reads before writes; it makes too many assumptions about
the type of load the disks will have.
The solution I'm testing for FreeBSD involves keeping track of the
number of bytes that are in-transit to/from the device. On a heavily
loaded system this number is typically in the 16K-512K range.
However, when writing out a huge file the write I/O can not only
saturate your VM system, it can also saturate the device (even with
your read reordering you can saturate the device with pending writes).
I've measured in-transit loads of up to 8 MB on FreeBSD in such cases.
I have found a solution which prevents saturation of the VM system
*AND* prevents saturation of the device. Saturation of the VM system
is prevented by immediately issuing an async write when a sequential
write operation is detected... i.e. FreeBSD's clustering code. (Actually,
this part of the solution was already implemented). Device saturation
is avoided by blocking writes at the queue-to-the-device level when
the number of in-transit bytes already queued to the device reaches a
high water mark (for example, 1 MB), then waking up those blocked
processes when the number of in-transit bytes drops to a low water
mark (for example, 512K).
We never block read requests. Read requests also have the side effect
of 'eating into' the in-transit count, reducing the number of
in-transit bytes available for writes before the writes block. So
a heavy read load automatically reduces write priority. (since reads
are synchronous or have only limited read-ahead, reads do not present
the same device saturation issue as writes do).
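A rough sketch of that throttle, using the example water marks above
(the function names are invented; the real logic lives down at the
buffer/device queue layer):

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative water marks: block writers at 1 MB of in-transit bytes,
 * wake them again once the queue drains below 512 KB. */
#define HIWATER (1024 * 1024)
#define LOWATER (512 * 1024)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  drained = PTHREAD_COND_INITIALIZER;
static size_t in_transit;          /* bytes currently queued to the device */

/* Called before queuing a buffer to the device.  Reads are never
 * throttled; writes sleep while the device is saturated. */
void runningbuf_acquire(size_t bytes, bool is_write)
{
    pthread_mutex_lock(&lock);
    if (is_write) {
        while (in_transit >= HIWATER)
            pthread_cond_wait(&drained, &lock);
    }
    in_transit += bytes;           /* reads also count against the limit   */
    pthread_mutex_unlock(&lock);
}

/* Called from I/O completion. */
void runningbuf_release(size_t bytes)
{
    pthread_mutex_lock(&lock);
    in_transit -= bytes;
    if (in_transit < LOWATER)      /* drained below the low water mark     */
        pthread_cond_broadcast(&drained);
    pthread_mutex_unlock(&lock);
}

Because reads are charged against the in-transit count but never blocked
by it, a heavy read load automatically squeezes out write bandwidth.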
In my testing this solution resulted in a system that appeared to behave
normally and efficiently even under extremely heavy write loads. The
system load also became much more deterministic (fewer spikes).
(You can repost this if you want).
Later!
-Matt
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-21 16:47 UTC
To: Matthew Dillon; +Cc: linux-mm
Matthew Dillon wrote:
> My conclusion from this is that I was wrong before when I thought that
> clean and dirty pages should be treated the same, and I was also wrong
> trying to give clean pages 'ultimate' priority over dirty pages, but I
> think I may be right giving dirty pages two go-arounds in the queue
> before flushing. Limiting the number of dirty page flushes allowed per
> pass also works but has unwanted side effects.
Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
jumping in. :-)
It looks like what you really want are separate lru lists for clean and
dirty. That way you can tune the rate at which dirty vs clean pages are
moved from active to inactive.
It makes sense that dirty pages should be treated differently from clean
ones because guessing wrong about the inactiveness of a dirty page costs
twice as much as guessing wrong about a clean page (write+read vs just
read). Does that mean that dirty pages should hang around on
probation twice as long as clean ones? Sounds reasonable.
I was going to suggest aging clean and dirty pages at different rates,
then I realized that an inactive_dirty page actually has two chances to
be reactivated, once while it's on inactive_dirty, and again while it's
on inactive_clean, and you get a double-length probation from that.
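For what it's worth, a rough sketch of the two-list idea with the ratio
made tunable (all names are invented and reclaim_one() is left abstract;
this is just to show the shape of it):

#include <stddef.h>

struct page;                           /* opaque for this sketch            */
struct pagelist { struct page *head; size_t count; };

static struct pagelist inactive_clean;
static struct pagelist inactive_dirty;

/* Tunable: clean pages reclaimed per dirty page laundered. */
static int clean_per_dirty = 2;

/* Take one page off a list and reclaim it (free it, or write it back
 * first if it came from the dirty list); returns 0 if the list is empty. */
extern int reclaim_one(struct pagelist *list);

static void scan_inactive(int target)
{
    int done = 0;

    while (done < target) {
        int progress = 0;

        /* Reclaim up to clean_per_dirty clean pages... */
        for (int i = 0; i < clean_per_dirty && done < target; i++) {
            if (!reclaim_one(&inactive_clean))
                break;
            done++;
            progress++;
        }

        /* ...then launder at most one dirty page. */
        if (done < target && reclaim_one(&inactive_dirty)) {
            done++;
            progress++;
        }

        if (progress == 0)             /* both lists empty: nothing to do */
            break;
    }
}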
--
Daniel
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Rik van Riel @ 2000-12-21 19:42 UTC
To: Daniel Phillips; +Cc: Matthew Dillon, linux-mm
On Thu, 21 Dec 2000, Daniel Phillips wrote:
> Matthew Dillon wrote:
> > My conclusion from this is that I was wrong before when I thought that
> > clean and dirty pages should be treated the same, and I was also wrong
> > trying to give clean pages 'ultimate' priority over dirty pages, but I
> > think I may be right giving dirty pages two go-arounds in the queue
> > before flushing. Limiting the number of dirty page flushes allowed per
> > pass also works but has unwanted side effects.
>
> Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
> jumping in. :-)
>
> It looks like what you really want are separate lru lists for
> clean and dirty. That way you can tune the rate at which dirty
> vs clean pages are moved from active to inactive.
Let me clear up one thing. The whole clean/dirty story
Matthew wrote down only goes for the *inactive* pages,
not for the active ones...
regards,
Rik
--
Hollywood goes for world dumbination,
Trailer at 11.
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-22 3:20 UTC
To: Rik van Riel; +Cc: Daniel Phillips, linux-mm
Right. I am going to add another addendum... let me give a little
background first. I've been testing the FBsd VM system with two
extremes... on one extreme is Yahoo which tends to wind up running
servers which collect a huge number of dirty pages that need to be
flushed, but have lots of disk bandwidth available to flush them.
The other extreme is a heavily loaded newsreader box which operates
under extreme memory pressure but has mostly clean pages. Heavy
load in this case means 400-600 newsreader processes on a 512MB box
eating around 8MB/sec in new memory.
My original solution for Yahoo was to treat clean and dirty pages at
the head of the inactive queue the same... that is, flush dirty pages
as they were encountered in the inactive queue and free clean pages,
with no limit on dirty page flushes. This worked great for Yahoo,
but failed utterly with the poor news machines. News machines that
had been running at a load of 1-2 were suddenly running at loads of
50-150, i.e. they began to thrash and get really sludgy.
It took me a few days to figure out what was going on, because the
stats from the news machines showed the pageout daemon having no
problems... it was finding around 10,000 clean pages and 200-400
dirty pages per pass, and flushing the 200-400 dirty pages. That's
a 25:1 clean:dirty ratio.
Well, it turns out that the flushing of 200-400 dirty pages per pageout
pass was responsible for the load blowups. The machines had already
been running at 100% disk load, you may recall. Adding the additional
write load, even at 25:1, slowed the drives down enough that suddenly
many of the newsreader processes were blocking on disk I/O. Hence the
load shot through the roof.
I tried to 'fix' the problem by saying "well, ok, so we won't flush
dirty pages immediately, we will give them another runaround in the
inactive queue before we flush them". This worked for medium loads and
I thought I was done, so I wrote my first summary message to Rik and
Linus describing the problem and solution.
--
But the story continues. It turns out that that has NOT fixed the
problem. The number of dirty pages being flushed went down, but
not enough. Newsreader machine loads still ran in the 50-100 range.
At this point we really are talking about truly idle-but-dirty pages.
No matter, the machines were still blowing up.
So, to make a long story even longer, after further experiments I
determined that it was the write load itself blowing up the machines.
Never mind what they were writing ... the simple *act* of writing
anything made the HDs much less efficient than under a read-only load.
Even limiting the number of pages flushed to a reasonable-sounding
number like 64 didn't solve the problem... the load still hovered around
20.
The patch I currently have under test, which solves the problem, is a
combination of what I had in 4.2-release, which limited the dirty page
flushing to 32 pages per pass, and what I have in 4.2-stable, which
has no limit. The new patch basically does this:
(remember pageout passes always free/flush pages from the inactive
queue, never the active queue!)
* Run a pageout pass with a dirty page flushing limit of 32 plus
give dirty inactive pages a second go-around in the inactive
queue.
* If the pass succeeds we are done.
* If the pass cannot free up enough pages (i.e. the machine happens
to have a huge number of dirty pages sitting around, aka the Yahoo
scenario), then take a second pass immediately and do not have any
limit whatsoever on dirty page flushes in the second pass.
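In rough pseudo-C, one pageout iteration then looks like this
(illustrative only; scan_inactive() stands in for the real
inactive-queue scan, which also implements the second go-around for
dirty pages):

#define FIRST_PASS_LAUNDER_LIMIT 32   /* dirty-page flush cap for pass 1 */

/* Try to free 'target' pages from the inactive queue, flushing at most
 * 'launder_limit' dirty pages (-1 means no limit); returns pages freed. */
extern int scan_inactive(int target, int launder_limit);

void pageout_iteration(int shortage)
{
    /* Pass 1: be gentle with the disks, flush at most 32 dirty pages
     * and give dirty pages their extra lap around the queue. */
    int freed = scan_inactive(shortage, FIRST_PASS_LAUNDER_LIMIT);

    /* Pass 2: only if we still haven't freed enough (the Yahoo case,
     * where nearly everything inactive is dirty), flush without any
     * limit so the iteration is still guaranteed to make progress. */
    if (freed < shortage)
        scan_inactive(shortage - freed, -1);
}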
*THIS* appears to work for both extremes. It's what I'm going to be
committing in the next few days to FreeBSD. BTW, years ago John Dyson
theorized that disk writing could have this effect on read efficiency,
which is why FBsd originally had a 32 page dirty flush limit per pass.
Now it all makes sense, and I've got proof that it's still a problem
with modern systems.
-Matt
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-28 23:04 UTC
To: Rik van Riel; +Cc: Matthew Dillon, linux-mm
On Thu, 21 Dec 2000, Rik van Riel wrote:
> On Thu, 21 Dec 2000, Daniel Phillips wrote:
> > Matthew Dillon wrote:
> > > My conclusion from this is that I was wrong before when I thought that
> > > clean and dirty pages should be treated the same, and I was also wrong
> > > trying to give clean pages 'ultimate' priority over dirty pages, but I
> > > think I may be right giving dirty pages two go-arounds in the queue
> > > before flushing. Limiting the number of dirty page flushes allowed per
> > > pass also works but has unwanted side effects.
> >
> > Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
> > jumping in. :-)
> >
> > It looks like what you really want are separate lru lists for
> > clean and dirty. That way you can tune the rate at which dirty
> > vs clean pages are moved from active to inactive.
>
> Let me clear up one thing. The whole clean/dirty story
> Matthew wrote down only goes for the *inactive* pages,
> not for the active ones...
Thanks for clearing that up, but it doesn't change the observation -
it still looks like he's keeping dirty pages 'on probation' twice as
long as before. Having each page take an extra lap around the inactive_dirty
list isn't exactly equivalent to just scanning the list more slowly,
but it's darn close. Is there a fundamental difference?
--
Daniel
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-29 6:24 UTC
To: Daniel Phillips; +Cc: Rik van Riel, linux-mm
:Thanks for clearing that up, but it doesn't change the observation -
:it still looks like he's keeping dirty pages 'on probation' twice as
:long as before. Having each page take an extra lap around the inactive_dirty
:list isn't exactly equivalent to just scanning the list more slowly,
:but it's darn close. Is there a fundamental difference?
:
:--
:Daniel
Well, scanning the list more slowly would still give dirty and clean
pages the same effective priority relative to each other before being
cleaned. Giving the dirty pages an extra lap around the inactive
queue gives clean pages a significantly higher priority over dirty
pages in regards to choosing which page to launder next.
So there is a big difference there.
The effect of this (and, more importantly, limiting the number of dirty
pages one is willing to launder in the first pageout pass) is rather
significant due to the big difference in cost in dealing with clean
pages versus dirty pages.
'cleaning' a clean page means simply throwing it away, which costs maybe
a microsecond of cpu time and no I/O. 'cleaning' a dirty page requires
flushing it to its backing store prior to throwing it away, which costs
a significant bit of cpu and at least one write I/O. One write I/O
may not seem like a lot, but if the disk is already loaded down and the
write I/O has to seek we are talking at least 5 milliseconds of disk
time eaten by the operation. Multiply this by the number of dirty pages
being flushed and it can cost a huge and very noticeable portion of
your disk bandwidth, versus zip for throwing away a clean page.
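To put rough numbers on it: at the 200-400 dirty pages per pass the news
boxes were flushing, 5 milliseconds apiece works out to 1-2 seconds of
pure disk time consumed by a single pageout pass, on drives that were
already 70-90% busy doing real work.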
Due to the (relatively speaking) huge cost involved in laundering a dirty
page, the extra cpu time we eat giving the dirty pages a longer life on
the inactive queue in the hopes of avoiding the flush, or skipping them
entirely with a per-pass dirty page flushing limit, is well worth it.
This is a classic algorithmic tradeoff... spend a little extra cpu to
choose the best pages to launder in order to save a whole lot of cpu
(and disk I/O) later on.
-Matt
Matthew Dillon
<dillon@backplane.com>
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-29 14:19 UTC
To: linux-mm
Matthew Dillon wrote:
> :Thanks for clearing that up, but it doesn't change the observation -
> :it still looks like he's keeping dirty pages 'on probation' twice as
> :long as before. Having each page take an extra lap around the inactive_dirty
> :list isn't exactly equivalent to just scanning the list more slowly,
> :but it's darn close. Is there a fundamental difference?
> :
> :--
> :Daniel
>
> Well, scanning the list more slowly would still give dirty and clean
> pages the same effective priority relative to each other before being
> cleaned. Giving the dirty pages an extra lap around the inactive
> queue gives clean pages a significantly higher priority over dirty
> pages in regards to choosing which page to launder next.
> So there is a big difference there.
There's the second misunderstanding. I assumed you had separate clean
vs dirty inactive lists.
> The effect of this (and, more importantly, limiting the number of dirty
> pages one is willing to launder in the first pageout pass) is rather
> significant due to the big difference in cost in dealing with clean
> pages versus dirty pages.
>
> 'cleaning' a clean page means simply throwing it away, which costs maybe
> a microsecond of cpu time and no I/O. 'cleaning' a dirty page requires
> flushing it to its backing store prior to throwing it away, which costs
> a significant bit of cpu and at least one write I/O. One write I/O
> may not seem like a lot, but if the disk is already loaded down and the
> write I/O has to seek we are talking at least 5 milliseconds of disk
> time eaten by the operation. Multiply this by the number of dirty pages
> being flushed and it can cost a huge and very noticeable portion of
> your disk bandwidth, versus zip for throwing away a clean page.
To estimate the cost of paging io you have to think in terms of the
extra work you have to do because you don't have infinite memory. In
other words, you would have had to write those dirty pages anyway - this
is an unavoidable cost. You incur an avoidable cost when you reclaim a
page that will be needed again sooner than some other candidate. If the
page was clean the cost is an extra read, if dirty it's a write plus a
read. Alternatively, the dirty page might be written again soon - if
it's a partial page write the cost is an extra read and a write, if it's
a full page the cost is just a write. So it costs at most twice as much
to guess wrong about a dirty vs clean page. This difference is
significant, but it's not as big as the 1 usec vs 5 msec you suggested.
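Spelling that out, and assuming a write transfer costs about the same as
a read transfer:

    wrong guess, clean page:  1 read             -> relative cost 1
    wrong guess, dirty page:  1 write + 1 read   -> relative cost 2

i.e. a worst-case penalty ratio of 2:1, not thousands to one.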
If I'm right, then making the dirty page go 3 times around the loop
should result in worse performance than 2 times.
> Due to the (relatively speaking) huge cost involved in laundering a dirty
> page, the extra cpu time we eat giving the dirty pages a longer life on
> the inactive queue in the hopes of avoiding the flush, or skipping them
> entirely with a per-pass dirty page flushing limit, is well worth it.
>
> This is a classic algorithmic tradeoff... spend a little extra cpu to
> choose the best pages to launder in order to save a whole lot of cpu
> (and disk I/O) later on.
--
Daniel
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: James Antill @ 2000-12-29 19:58 UTC
To: Daniel Phillips; +Cc: linux-mm
> Matthew Dillon wrote:
> > The effect of this (and, more importantly, limiting the number of dirty
> > pages one is willing to launder in the first pageout pass) is rather
> > significant due to the big difference in cost in dealing with clean
> > pages versus dirty pages.
> >
> > 'cleaning' a clean page means simply throwing it away, which costs maybe
> > a microsecond of cpu time and no I/O. 'cleaning' a dirty page requires
> > flushing it to its backing store prior to throwing it away, which costs
> > a significant bit of cpu and at least one write I/O. One write I/O
> > may not seem like a lot, but if the disk is already loaded down and the
> > write I/O has to seek we are talking at least 5 milliseconds of disk
> > time eaten by the operation. Multiply this by the number of dirty pages
> > being flushed and it can cost a huge and very noticeable portion of
> > your disk bandwidth, versus zip for throwing away a clean page.
>
> To estimate the cost of paging io you have to think in terms of the
> extra work you have to do because you don't have infinite memory. In
> other words, you would have had to write those dirty pages anyway - this
> is an unavoidable cost. You incur an avoidable cost when you reclaim a
> page that will be needed again sooner than some other candidate. If the
> page was clean the cost is an extra read, if dirty it's a write plus a
> read. Alternatively, the dirty page might be written again soon - if
> it's a partial page write the cost is an extra read and a write, if it's
> a full page the cost is just a write. So it costs at most twice as much
> to guess wrong about a dirty vs clean page. This difference is
> significant, but it's not as big as the 1 usec vs 5 msec you suggested.
As I understand it you can't just add the costs of the reads and
writes as 1 each. So given...
Clean = 1r
Dirty = 1w + 1r
...it's assumed that 1w >= 1r, but what are the exact
values?
It probably gets even more complex as if the dirty page is touched
between the write and the cleanup then it'll avoid the re-read
behavior and will appear faster (although it slowed the system down a
little doing its write).
> If I'm right then making the dirty page go 3 times around the loop
> should result in worse performance vs 2 times.
It's quite possible, but if there were 2 lists and the dirty pages
were laundered at 33% of the rate of the clean pages, would that be better
than 50%?
--
# James Antill -- james@and.org
:0:
* ^From: .*james@and.org
/dev/null
* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-29 23:12 UTC
To: James Antill, linux-mm
James Antill wrote:
> > To estimate the cost of paging io you have to think in terms of the
> > extra work you have to do because you don't have infinite memory. In
> > other words, you would have had to write those dirty pages anyway - this
> > is an unavoidable cost. You incur an avoidable cost when you reclaim a
> > page that will be needed again sooner than some other candidate. If the
> > page was clean the cost is an extra read, if dirty it's a write plus a
> > read. Alternatively, the dirty page might be written again soon - if
> > it's a partial page write the cost is an extra read and a write, if it's
> > a full page the cost is just a write. So it costs at most twice as much
> > to guess wrong about a dirty vs clean page. This difference is
> > significant, but it's not as big as the 1 usec vs 5 msec you suggested.
>
> As I understand it you can't just add the costs of the reads and
> writes as 1 each. So given...
>
> Clean = 1r
> Dirty = 1w + 1r
>
> ...it's assumed that 1w >= 1r, but what are the exact
> values?
By read and write I am talking about the necessary transfers to disk,
not the higher level file IO. Transfers to and from disk are nearly
equal in cost.
> It probably gets even more complex as if the dirty page is touched
> between the write and the cleanup then it'll avoid the re-read
> behavior and will appear faster (although it slowed the system down a
> little doing its write).
Oh yes, it gets more complex. I'm trying to nail down the main costs by
eliminating the constant factors.
> > If I'm right then making the dirty page go 3 times around the loop
> > should result in worse performance vs 2 times.
>
> It's quite possible, but if there were 2 lists and the dirty pages
> were laundered at 33% of the rate of the clean pages, would that be better
> than 50%?
Eh. I don't know, I was hoping Matt would try it.
--
Daniel