* Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-16 20:16 UTC
  To: Rik van Riel; +Cc: Linus Torvalds

    I've been working with a particular news user who runs FreeBSD very
    close to the page thrashing limit, testing different pageout
    algorithms, and I came to an interesting conclusion regarding the
    flushing of dirty pages in the pageout scan which I think may
    interest you.

    This particular user runs around 500-600 newsreader processes on a 512M
    machine.  The processes eat around 8MB/sec in 'new' memory from reading
    the news spool and also, of course, dirty a certain amount of memory
    updating the news history file and for their run-time VM.  The disks
    are near saturation (70-90% busy) just from normal operation.

    In testing, we've found something quite interesting - flushing dirty
    pages on a system which is already close to disk saturation
    seriously degrades performance.   Not just a little, but a lot.

    My solution is to run dirty pages through the FBsd inactive queue twice.
    i.e. the first time a dirty page shows up at the head of the queue I set
    a flag and send it to the end of the queue.  The second time it reaches
    the head of the queue I flush it.  This gives the dirty page extra time
    in memory during which it could be reactivated or reused by the process
    before the system decides to spend resources flushing it out. 
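
    In rough C, the mechanism looks something like this (a sketch only;
    the structure fields, the flag name and the helper functions are
    invented for illustration, not the actual FreeBSD code):

	#include <sys/queue.h>

	struct vm_page {
		TAILQ_ENTRY(vm_page) pageq;
		int	dirty;			/* page has unwritten data */
		int	flags;
	};
	#define PG_SECONDPASS	0x01		/* already requeued once */

	TAILQ_HEAD(pglist, vm_page);
	extern struct pglist inactive_queue;

	extern void launder_page(struct vm_page *); /* start writeback     */
	extern void free_page(struct vm_page *);    /* reclaim clean page  */

	static void
	scan_one_inactive_page(void)
	{
		struct vm_page *m = TAILQ_FIRST(&inactive_queue);

		if (m == NULL)
			return;
		TAILQ_REMOVE(&inactive_queue, m, pageq);
		if (!m->dirty) {
			free_page(m);		/* clean: free immediately */
		} else if ((m->flags & PG_SECONDPASS) == 0) {
			/* first time at the head: flag it and send it
			 * to the end of the queue for a second lap */
			m->flags |= PG_SECONDPASS;
			TAILQ_INSERT_TAIL(&inactive_queue, m, pageq);
		} else {
			launder_page(m);	/* second time: flush it */
		}
	}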

    FreeBSD-4.0:		dirty pages are flushed in queue-order, but
				the number of pages laundered per pass was
				limited.  Unknown load characteristics.
				In general this worked pretty well but had
				the side effect of fragmenting the ordering
				of dirty pages in the queue.

    FreeBSD-4.1.1 and 4.2:	dirty pages are skipped unless we run out
				of clean pages.  News load: 1-2, but pageout
				activity was very inconsistent (not a smooth
				load curve).

    FreeBSD-stable:		dirty pages are flushed in queue-order.
				News load: 50-150 (i.e. it blew up). 

    (patches in development):	dirty pages are given two go-arounds in the
				queue before being flushed.  News load: 1-2.
				Very consistent pageout activity.

    My conclusion from this is that I was wrong before when I thought that
    clean and dirty pages should be treated the same, and I was also wrong
    trying to give clean pages 'ultimate' priority over dirty pages, but I
    think I may be right giving dirty pages two go-arounds in the queue
    before flushing.  Limiting the number of dirty page flushes allowed per
    pass also works but has unwanted side effects.

    --

    I have also successfully tested a low (real) memory deadlock handling
    solution, which is currently in FreeBSD-stable and FreeBSD-current.
    I'm not sure what Linux is using for low-memory deadlock handling right
    now, so this may be of interest.  The solution is as follows (a
    rough sketch in C follows the list):

	* We operate under the assumption that I/O *must* be able to continue
	  to operate no matter what the memory circumstances.

	* All allocations made by the I/O subsystem are allowed to dig into
	  the system memory reserve.

	* No allocation outside the I/O subsystem is allowed to dig into the
	  memory reserve (i.e. such allocations block until the system has
	  freed enough pages to get out of the red area).

	* The I/O subsystem explicitly detects a severe memory shortage, but
	  rather than blocking, it simply frees I/O resources as I/Os
	  complete, allowing I/O to continue at near full bore without
	  eating additional memory.
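
    As a sketch of the allocator-side rule (all names here are invented
    for illustration -- this is not the actual FreeBSD interface):

	#define M_IO	0x01	/* caller is part of the I/O subsystem */

	extern unsigned long free_page_count;	/* pages currently free   */
	extern unsigned long reserve_pages;	/* emergency reserve size */

	extern void *take_free_page(void);
	extern void wait_for_pageout(void);	/* sleep until pages freed */

	void *
	page_alloc(int flags)
	{
		/* Ordinary allocations block while we are in the red
		 * zone; I/O allocations may dig into the reserve so
		 * that writeback can always make forward progress. */
		while (free_page_count <= reserve_pages &&
		       (flags & M_IO) == 0)
			wait_for_pageout();
		return (take_free_page());
	}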

    This solution appears to work under all load conditions. Yahoo was unable
    to crash or deadlock extremely heavily loaded FreeBSD boxes after
    I committed this solution.

    -

    Finally, I came up with a solution to deal with heavy write loads
    interfering with read loads.  If I remember correctly, Linux tries to
    reorder reads in front of writes.  FreeBSD does not do that: once
    FreeBSD determines that a delayed write should go out it wants it
    to go out, and you get non-deterministic loads when you make
    that kind of ad-hoc ordering decision.  I personally do not like
    reordering reads before writes; it makes too many assumptions about
    the type of load the disks will have.

    The solution I'm testing for FreeBSD involves keeping track of the
    number of bytes that are in-transit to/from the device.  On a heavily 
    loaded system this number is typically in the 16K-512K range.  
    However, when writing out a huge file the write I/O can not only
    saturate your VM system, it can also saturate the device (even with
    your read reordering you can saturate the device with pending writes).  
    I've measured in-transit loads of up to 8 MB on FreeBSD in such cases.

    I have found a solution which prevents saturation of the VM system
    *AND* prevents saturation of the device.  Saturation of the VM system
    is prevented by immediately issuing an async write when a sequential
    write operation is detected... i.e. FreeBSD's clustering code.
    (Actually, this part of the solution was already implemented.)  Device
    saturation is avoided by blocking writes at the queue-to-the-device
    level when the number of in-transit bytes already queued to the device
    reaches a high water mark (for example, 1 MB), then waking up those
    blocked processes when the number of in-transit bytes drops to a low
    water mark (for example, 512K).

    We never block read requests.  Read requests also have the side effect
    of 'eating into' the in-transit count, reducing the number of
    in-transit bytes available for writes before the writes block.  So
    a heavy read load automatically reduces write priority.  (Since reads
    are synchronous or have only limited read-ahead, reads do not present
    the same device saturation issue as writes do.)
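
    A sketch of the throttle (invented names; locking and the real
    driver interface are omitted):

	#define HIWATER (1024 * 1024)	/* e.g. 1 MB  */
	#define LOWATER ( 512 * 1024)	/* e.g. 512K  */

	struct dev_queue {
		long	intransit;	/* bytes queued to the hardware */
		int	write_waiters;
	};

	extern void issue_io(struct dev_queue *, long bytes);
	extern void sleep_on(void *chan);	/* block the caller */
	extern void wakeup(void *chan);		/* wake sleepers    */

	void
	queue_write(struct dev_queue *q, long bytes)
	{
		while (q->intransit >= HIWATER) {
			/* block writers at the queue-to-device level */
			q->write_waiters++;
			sleep_on(q);
		}
		q->intransit += bytes;
		issue_io(q, bytes);
	}

	void
	queue_read(struct dev_queue *q, long bytes)
	{
		q->intransit += bytes;	/* reads eat into the budget... */
		issue_io(q, bytes);	/* ...but are never blocked     */
	}

	void
	io_complete(struct dev_queue *q, long bytes)
	{
		q->intransit -= bytes;
		if (q->intransit <= LOWATER && q->write_waiters) {
			q->write_waiters = 0;
			wakeup(q);	/* hysteresis: release writers */
		}
	}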

    In my testing this solution resulted in a system that appeared to behave
    normally and efficiently even under extremely heavy write loads.  The
    system load also became much more deterministic (fewer spikes).

    (You can repost this if you want).

						Later!

						-Matt


* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-21 16:47 UTC
  To: Matthew Dillon; +Cc: linux-mm

Matthew Dillon wrote:
>     My conclusion from this is that I was wrong before when I thought that
>     clean and dirty pages should be treated the same, and I was also wrong
>     trying to give clean pages 'ultimate' priority over dirty pages, but I
>     think I may be right giving dirty pages two go-arounds in the queue
>     before flushing.  Limiting the number of dirty page flushes allowed per
>     pass also works but has unwanted side effects.

Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
jumping in. :-)

It looks like what you really want are separate lru lists for clean and
dirty.  That way you can tune the rate at which dirty vs clean pages are
moved from active to inactive.
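
For concreteness, a sketch of the shape I mean (hypothetical names,
not the current kernel code):

    struct list_head { struct list_head *next, *prev; };

    struct page_list {
            struct list_head pages;
            unsigned long    nr_pages;
    };

    struct page_list inactive_clean;    /* reclaim = just free them   */
    struct page_list inactive_dirty;    /* reclaim = writeback + free */

    extern void deactivate_clean_pages(struct page_list *, int count);
    extern void deactivate_dirty_pages(struct page_list *, int count);

    /* Refill the inactive lists from the active list at separately
     * tunable rates, e.g. demote dirty pages half as fast so they
     * get a longer probation. */
    void refill_inactive(int target)
    {
            deactivate_clean_pages(&inactive_clean, target);
            deactivate_dirty_pages(&inactive_dirty, target / 2);
    }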

It makes sense that dirty pages should be treated differently from clean
ones because guessing wrong about the inactiveness of a dirty page costs
twice as much as guessing wrong about a clean page (write+read vs just
read).  Does that mean dirty pages should hang around on
probation twice as long as clean ones?  Sounds reasonable.

I was going to suggest aging clean and dirty pages at different rates,
then I realized that an inactive_dirty page actually has two chances to
be reactivated, once while it's on inactive_dirty, and again while it's
on inactive_clean, and you get a double-length probation from that.

--
Daniel

* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Rik van Riel @ 2000-12-21 19:42 UTC
  To: Daniel Phillips; +Cc: Matthew Dillon, linux-mm

On Thu, 21 Dec 2000, Daniel Phillips wrote:
> Matthew Dillon wrote:
> >     My conclusion from this is that I was wrong before when I thought that
> >     clean and dirty pages should be treated the same, and I was also wrong
> >     trying to give clean pages 'ultimate' priority over dirty pages, but I
> >     think I may be right giving dirty pages two go-arounds in the queue
> >     before flushing.  Limiting the number of dirty page flushes allowed per
> >     pass also works but has unwanted side effects.
> 
> Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
> jumping in. :-)
> 
> It looks like what you really want are separate lru lists for
> clean and dirty.  That way you can tune the rate at which dirty
> vs clean pages are moved from active to inactive.

Let me clear up one thing. The whole clean/dirty story
Matthew wrote down only goes for the *inactive* pages,
not for the active ones...

regards,

Rik
--
Hollywood goes for world dumbination,
	Trailer at 11.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-22  3:20 UTC
  To: Rik van Riel; +Cc: Daniel Phillips, linux-mm

    Right.  I am going to add another addendum... let me give a little
    background first.  I've been testing the FBsd VM system with two
    extremes... on one extreme is Yahoo which tends to wind up running
    servers which collect a huge number of dirty pages that need to be
    flushed, but have lots of disk bandwidth available to flush them.
    The other extreme is a heavily loaded newsreader box which operates
    under extreme memory pressure but has mostly clean pages.   Heavy
    load in this case means 400-600 newsreader processes on a 512MB box
    eating around 8MB/sec in new memory.

    My original solution for Yahoo was to treat clean and dirty pages at
    the head of the inactive queue the same... that is, flush dirty pages
    as they were encountered in the inactive queue and free clean pages,
    with no limit on dirty page flushes.  This worked great for Yahoo,
    but failed utterly with the poor news machines.  News machines that
    were running at a load of 1-2 were suddenly running at loads of 50-150.
    i.e. they began to thrash and get really sludgy.

    It took me a few days to figure out what was going on, because the
    stats from the news machines showed the pageout daemon having no
    problems... it was finding around 10,000 clean pages and 200-400
    dirty pages per pass, and flushing the 200-400 dirty pages.  That's
    a 25:1 clean:dirty ratio.

    Well, it turns out that the flushing of 200-400 dirty pages per pageout
    pass was responsible for the load blowups.  The machines had already
    been running at 100% disk load, you may recall.  Adding the additional
    write load, even at 25:1, slowed the drives down enough that suddenly
    many of the newsreader processes were blocking on disk I/O.  Hence the
    load shot through the roof.

    I tried to 'fix' the problem by saying "well, ok, so we won't flush
    dirty pages immediately, we will give them another runaround in the
    inactive queue before we flush them".  This worked for medium loads and
    I thought I was done, so I wrote my first summary message to Rik and
    Linus describing the problem and solution.

    --

    But the story continues.  It turns out that that had NOT fixed the
    problem.  The number of dirty pages being flushed went down, but
    not enough.  Newsreader machine loads still ran in the 50-100 range.
    At this point we really are talking about truly idle-but-dirty pages.
    No matter, the machines were still blowing up.

    So, to make a long story even longer, after further experiments I
    determined that it was the write-load itself blowing up the machines.
    Never mind what they were writing ... the simple *act* of writing
    anything made the HDs much less efficient than under a read-only load.
    Even limiting the number of pages flushed to a reasonable sounding
    number like 64 didn't solve the problem... the load still hovered around
    20.

    The patch I currently have under test which solves the problem is a
    combination of what I had in 4.2-release, which limited the dirty page
    flushing to 32 pages per pass, and what I have in 4.2-stable which
    has no limit.  The new patch basically does this (sketched in code
    below):

	(remember pageout passes always free/flush pages from the inactive
	queue, never the active queue!)

	* Run a pageout pass with a dirty page flushing limit of 32 plus
	  give dirty inactive pages a second go-around in the inactive
	  queue.

	* If the pass succeeds we are done.

	* If the pass cannot free up enough pages (i.e. the machine happens
	  to have a huge number of dirty pages sitting around, aka the Yahoo
	  scenario), then take a second pass immediately and do not have any
	  limit whatsoever on dirty page flushes in the second pass.

    *THIS* appears to work for both extremes.  It's what I'm going to be
    committing in the next few days to FreeBSD.  BTW, years ago John Dyson 
    theorized that disk writing could have this effect on read efficiency,
    which is why FBsd originally had a 32 page dirty flush limit per pass.
    Now it all makes sense, and I've got proof that it's still a problem
    with modern systems.
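
    In outline, the control flow looks something like this (invented
    names; the real patch obviously has more to it):

	#define PASS1_LAUNDER_LIMIT	32
	#define LAUNDER_NO_LIMIT	(-1)

	/* Scan the inactive queue: free clean pages, give dirty pages
	 * their second go-around, and launder at most 'launder_limit'
	 * dirty pages (negative means no limit).  Returns pages freed. */
	extern int scan_inactive(int free_target, int launder_limit);

	void
	pageout_pass(int free_target)
	{
		int freed;

		/* First pass: gentle on the disks (the news-box case). */
		freed = scan_inactive(free_target, PASS1_LAUNDER_LIMIT);
		if (freed >= free_target)
			return;

		/* Couldn't make the target -- mostly-dirty memory (the
		 * Yahoo case).  Second pass, no flush limit at all. */
		(void)scan_inactive(free_target - freed, LAUNDER_NO_LIMIT);
	}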

						    -Matt



* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-28 23:04 UTC
  To: Rik van Riel; +Cc: Matthew Dillon, linux-mm

On Thu, 21 Dec 2000, Rik van Riel wrote:
> On Thu, 21 Dec 2000, Daniel Phillips wrote:
> > Matthew Dillon wrote:
> > >     My conclusion from this is that I was wrong before when I thought that
> > >     clean and dirty pages should be treated the same, and I was also wrong
> > >     trying to give clean pages 'ultimate' priority over dirty pages, but I
> > >     think I may be right giving dirty pages two go-arounds in the queue
> > >     before flushing.  Limiting the number of dirty page flushes allowed per
> > >     pass also works but has unwanted side effects.
> > 
> > Hi, I'm a newcomer to the mm world, but it looks like fun, so I'm
> > jumping in. :-)
> > 
> > It looks like what you really want are separate lru lists for
> > clean and dirty.  That way you can tune the rate at which dirty
> > vs clean pages are moved from active to inactive.
> 
> Let me clear up one thing. The whole clean/dirty story
> Matthew wrote down only goes for the *inactive* pages,
> not for the active ones...

Thanks for clearing that up, but it doesn't change the observation -
it still looks like he's keeping dirty pages 'on probation' twice as
long as before.  Having each page take an extra lap around the
inactive_dirty list isn't exactly equivalent to just scanning the list
more slowly, but it's darn close.  Is there a fundamental difference?

-- 
Daniel

* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Matthew Dillon @ 2000-12-29  6:24 UTC
  To: Daniel Phillips; +Cc: Rik van Riel, linux-mm

:Thanks for clearing that up, but it doesn't change the observation -
:it still looks like he's keeping dirty pages 'on probation' twice as
:long as before.  Having each page take an extra lap around the
:inactive_dirty list isn't exactly equivalent to just scanning the list
:more slowly, but it's darn close.  Is there a fundamental difference?
:
:-- 
:Daniel

    Well, scanning the list more slowly would still give dirty and clean
    pages the same effective priority relative to each other before being
    cleaned.  Giving the dirty pages an extra lap around the inactive
    queue gives clean pages a significantly higher priority over dirty
    pages in regards to choosing which page to launder next.
    So there is a big difference there.

    The effect of this (and, more importantly, limiting the number of dirty
    pages one is willing to launder in the first pageout pass) is rather
    significant due to the big difference in cost in dealing with clean
    pages versus dirty pages.

    'cleaning' a clean page means simply throwing it away, which costs maybe 
    a microsecond of cpu time and no I/O.  'cleaning' a dirty page requires
    flushing it to its backing store prior to throwing it away, which costs 
    a significant bit of cpu and at least one write I/O.  One write I/O
    may not seem like a lot, but if the disk is already loaded down and the
    write I/O has to seek we are talking at least 5 milliseconds of disk
    time eaten by the operation.  Multiply this by the number of dirty pages
    being flushed and it can cost a huge and very noticeable portion of
    your disk bandwidth, versus zip for throwing away a clean page.
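
    To put rough numbers on that, using the figures from the news boxes
    earlier in this thread (200-400 dirty pages flushed per pass, ~5
    milliseconds of disk time per seeking write, ~1 microsecond of cpu
    per clean-page free):

	400 flushes x 5 ms = 2000 ms of disk time per pageout pass
	400 frees   x 1 us =  0.4 ms of cpu time for clean pages

    i.e. one pass can eat on the order of two full seconds of an
    already-saturated disk's time, versus a fraction of a millisecond
    of cpu to reclaim the same number of clean pages.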

    Due to the (relatively speaking) huge cost involved in laundering a dirty
    page, the extra cpu time we eat giving the dirty pages a longer life on
    the inactive queue in the hopes of avoiding the flush, or skipping them 
    entirely with a per-pass dirty page flushing limit, is well worth it.  

    This is a classic algorithmic tradeoff... spend a little extra cpu to
    choose the best pages to launder in order to save a whole lot of cpu
    (and disk I/O) later on.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-29 14:19 UTC
  To: linux-mm

Matthew Dillon wrote:
> :Thanks for clearing that up, but it doesn't change the observation -
> :it still looks like he's keeping dirty pages 'on probation' twice as
> :long as before.  Having each page take an extra lap around the
> :inactive_dirty list isn't exactly equivalent to just scanning the list
> :more slowly, but it's darn close.  Is there a fundamental difference?
> :
> :--
> :Daniel
> 
>     Well, scanning the list more slowly would still give dirty and clean
>     pages the same effective priority relative to each other before being
>     cleaned.  Giving the dirty pages an extra lap around the inactive
>     queue gives clean pages a significantly higher priority over dirty
>     pages in regards to choosing which page to launder next.
>     So there is a big difference there.

There's the second misunderstanding.  I assumed you had separate clean
vs dirty inactive lists.

>     The effect of this (and, more importantly, limiting the number of dirty
>     pages one is willing to launder in the first pageout pass) is rather
>     significant due to the big difference in cost in dealing with clean
>     pages versus dirty pages.
> 
>     'cleaning' a clean page means simply throwing it away, which costs maybe
>     a microsecond of cpu time and no I/O.  'cleaning' a dirty page requires
>     flushing it to its backing store prior to throwing it away, which costs
>     a significant bit of cpu and at least one write I/O.  One write I/O
>     may not seem like a lot, but if the disk is already loaded down and the
>     write I/O has to seek we are talking at least 5 milliseconds of disk
>     time eaten by the operation.  Multiply this by the number of dirty pages
>     being flushed and it can cost a huge and very noticeable portion of
>     your disk bandwidth, versus zip for throwing away a clean page.

To estimate the cost of paging io you have to think in terms of the
extra work you have to do because you don't have infinite memory.  In
other words, you would have had to write those dirty pages anyway - this
is an unavoidable cost.  You incur an avoidable cost when you reclaim a
page that will be needed again sooner than some other candidate.  If the
page was clean the cost is an extra read, if dirty it's a write plus a
read.  Alternatively, the dirty page might be written again soon - if
it's a partial page write the cost is an extra read and a write, if it's
a full page the cost is just a write.  So it costs at most twice as much
to guess wrong about a dirty vs clean page.  This difference is
significant, but it's not as big as the 1 usec vs 5 msec you suggested.
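
Tabulated, the avoidable transfers per wrong guess work out to:

    clean page, needed again:            1 read
    dirty page, needed again (re-read):  1 write + 1 read
    dirty page, partially rewritten:     1 read  + 1 write
    dirty page, fully rewritten:         1 write (the wasted flush)

i.e. at most two transfers for a dirty page against one for a clean
page.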

If I'm right then making the dirty page go 3 times around the loop
should result in worse performance vs 2 times.

>     Due to the (relatively speaking) huge cost involved in laundering a dirty
>     page, the extra cpu time we eat giving the dirty pages a longer life on
>     the inactive queue in the hopes of avoiding the flush, or skipping them
>     entirely with a per-pass dirty page flushing limit, is well worth it.
> 
>     This is a classic algorithmic tradeoff... spend a little extra cpu to
>     choose the best pages to launder in order to save a whole lot of cpu
>     (and disk I/O) later on.
 
--
Daniel

* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: James Antill @ 2000-12-29 19:58 UTC
  To: Daniel Phillips; +Cc: linux-mm

> Matthew Dillon wrote:
> >     The effect of this (and, more importantly, limiting the number of dirty
> >     pages one is willing to launder in the first pageout pass) is rather
> >     significant due to the big difference in cost in dealing with clean
> >     pages versus dirty pages.
> > 
> >     'cleaning' a clean page means simply throwing it away, which costs maybe
> >     a microsecond of cpu time and no I/O.  'cleaning' a dirty page requires
> >     flushing it to its backing store prior to throwing it away, which costs
> >     a significant bit of cpu and at least one write I/O.  One write I/O
> >     may not seem like a lot, but if the disk is already loaded down and the
> >     write I/O has to seek we are talking at least 5 milliseconds of disk
> >     time eaten by the operation.  Multiply this by the number of dirty pages
> >     being flushed and it can cost a huge and very noticeable portion of
> >     your disk bandwidth, versus zip for throwing away a clean page.
> 
> To estimate the cost of paging io you have to think in terms of the
> extra work you have to do because you don't have infinite memory.  In
> other words, you would have had to write those dirty pages anyway - this
> is an unavoidable cost.  You incur an avoidable cost when you reclaim a
> page that will be needed again sooner than some other candidate.  If the
> page was clean the cost is an extra read, if dirty it's a write plus a
> read.  Alternatively, the dirty page might be written again soon - if
> it's a partial page write the cost is an extra read and a write, if it's
> a full page the cost is just a write.  So it costs at most twice as much
> to guess wrong about a dirty vs clean page.  This difference is
> significant, but it's not as big as the 1 usec vs 5 msec you suggested.

 As I understand it you can't just add the costs of the reads and
writes as 1 each. So given...

 Clean = 1r
 Dirty = 1w + 1r

...it's assumed that a 1w is >= a 1r, but what are the exact
values?
 It probably gets even more complex: if the dirty page is touched
between the write and the cleanup then it'll avoid the re-read
behavior and will appear faster (although it slowed the system down a
little doing its write).

> If I'm right then making the dirty page go 3 times around the loop
> should result in worse performance vs 2 times.

 It's quite possible, but if there were 2 lists and the dirty pages
were laundered at 33% of the rate of the clean pages, would that be
better than 50%?

-- 
# James Antill -- james@and.org
:0:
* ^From: .*james@and.org
/dev/null


* Re: Interesting item came up while working on FreeBSD's pageout daemon
From: Daniel Phillips @ 2000-12-29 23:12 UTC
  To: James Antill, linux-mm

James Antill wrote:
> > To estimate the cost of paging io you have to think in terms of the
> > extra work you have to do because you don't have infinite memory.  In
> > other words, you would have had to write those dirty pages anyway - this
> > is an unavoidable cost.  You incur an avoidable cost when you reclaim a
> > page that will be needed again sooner than some other candidate.  If the
> > page was clean the cost is an extra read, if dirty it's a write plus a
> > read.  Alternatively, the dirty page might be written again soon - if
> > it's a partial page write the cost is an extra read and a write, if it's
> > a full page the cost is just a write.  So it costs at most twice as much
> > to guess wrong about a dirty vs clean page.  This difference is
> > significant, but it's not as big as the 1 usec vs 5 msec you suggested.
> 
>  As I understand it you can't just add the costs of the reads and
> writes as 1 each. So given...
> 
>  Clean = 1r
>  Dirty = 1w + 1r
> 
> ...it's assumed that a 1w is >= a 1r, but what are the exact
> values?

By read and write I am talking about the necessary transfers to disk,
not the higher level file IO.  Transfers to and from disk are nearly
equal in cost.

>  It probably gets even more complex: if the dirty page is touched
> between the write and the cleanup then it'll avoid the re-read
> behavior and will appear faster (although it slowed the system down a
> little doing its write).

Oh yes, it gets more complex.  I'm trying to nail down the main costs by
eliminating the constant factors.

> > If I'm right then making the dirty page go 3 times around the loop
> > should result in worse performance vs 2 times.
> 
>  It's quite possible, but if there were 2 lists and the dirty pages
> were laundered at 33% of the rate of the clean pages, would that be
> better than 50%?

Eh.  I don't know, I was hoping Matt would try it.

--
Daniel
