From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 21 Dec 2000 19:20:53 -0800 (PST)
From: Matthew Dillon
Message-Id: <200012220320.eBM3Kr605128@apollo.backplane.com>
Subject: Re: Interesting item came up while working on FreeBSD's pageout daemon
References:
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel
Cc: Daniel Phillips, linux-mm@kvack.org
List-ID:

Right.  I am going to add another addendum... let me give a little
background first.

I've been testing the FBsd VM system with two extremes.  On one extreme is
Yahoo, which tends to wind up running servers that collect a huge number
of dirty pages that need to be flushed, but have lots of disk bandwidth
available to flush them.  The other extreme is a heavily loaded newsreader
box which operates under extreme memory pressure but has mostly clean
pages.  Heavy load in this case means 400-600 newsreader processes on a
512MB box eating around 8MB/sec in new memory.

My original solution for Yahoo was to treat clean and dirty pages at the
head of the inactive queue the same... that is, flush dirty pages as they
were encountered in the inactive queue and free clean pages, with no limit
on dirty page flushes.  This worked great for Yahoo, but failed utterly on
the poor news machines.  News machines that were running at a load of 1-2
were suddenly running at loads of 50-150, i.e. they began to thrash and
get really sludgy.

It took me a few days to figure out what was going on, because the stats
from the news machines showed the pageout daemon having no problems... it
was finding around 10,000 clean pages and 200-400 dirty pages per pass,
and flushing the 200-400 dirty pages.  That's a 25:1 clean:dirty ratio.

Well, it turns out that the flushing of 200-400 dirty pages per pageout
pass was responsible for the load blowups.  The machines had already been
running at 100% disk load, you may recall.  Adding the additional write
load, even at 25:1, slowed the drives down enough that suddenly many of
the newsreader processes were blocking on disk I/O.  Hence the load shot
through the roof.

I tried to 'fix' the problem by saying "well, ok, so we won't flush dirty
pages immediately, we will give them another runaround in the inactive
queue before we flush them".  This worked for medium loads and I thought I
was done, so I wrote my first summary message to Rik and Linus describing
the problem and solution.

But the story continues.  It turned out that that had NOT fixed the
problem.  The number of dirty pages being flushed went down, but not
enough.  Newsreader machine loads still ran in the 50-100 range.  At this
point we really are talking about truly idle-but-dirty pages.  No matter,
the machines were still blowing up.

So, to make a long story even longer, after further experiments I
determined that it was the write load itself blowing up the machines.
Never mind what they were writing... the simple *act* of writing anything
made the HDs much less efficient than under a read-only load.  Even
limiting the number of pages flushed to a reasonable-sounding number like
64 didn't solve the problem... the load still hovered around 20.

The patch I currently have under test, which solves the problem, is a
combination of what I had in 4.2-release, which limited dirty page
flushing to 32 pages per pass, and what I have in 4.2-stable, which has no
limit.  The new patch basically does this (remember, pageout passes always
free/flush pages from the inactive queue, never the active queue!):

    * Run a pageout pass with a dirty page flushing limit of 32, plus give
      dirty inactive pages a second go-around in the inactive queue.

    * If the pass succeeds, we are done.

    * If the pass cannot free up enough pages (i.e. the machine happens to
      have a huge number of dirty pages sitting around, aka the Yahoo
      scenario), then take a second pass immediately and do not have any
      limit whatsoever on dirty page flushes in the second pass.
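Roughly, the two-pass logic looks like the toy user-space sketch below.
This is not the actual vm_pageout code: the names (struct page,
scan_inactive(), PASS1_LAUNDER_LIMIT) and the 48-page target are made up
for illustration, and "flushing" a dirty page here just means counting it
as reclaimed.

    /*
     * Toy model of the two-pass pageout idea described above.
     * Not the FreeBSD pageout daemon; all names are illustrative.
     */
    #include <stdio.h>
    #include <stdbool.h>

    #define NPAGES              64
    #define PASS1_LAUNDER_LIMIT 32  /* dirty-page flush cap, first pass */

    struct page {
        bool dirty;        /* must be written before it can be freed */
        bool second_round; /* dirty page already marked for another trip */
        bool freed;
    };

    /*
     * Scan the simulated inactive queue once.  Clean pages are freed
     * immediately.  A dirty page that has not yet had its second
     * go-around is only marked (unless this is the final pass); other
     * dirty pages are "laundered" up to launder_limit (< 0 = no limit).
     * Returns the number of pages freed.
     */
    static int
    scan_inactive(struct page q[], int n, int launder_limit, bool final_pass)
    {
        int freed = 0, laundered = 0;

        for (int i = 0; i < n; i++) {
            struct page *p = &q[i];
            if (p->freed)
                continue;
            if (!p->dirty) {
                p->freed = true;            /* clean: free with no I/O */
                freed++;
            } else if (!final_pass && !p->second_round) {
                p->second_round = true;     /* give it another go-around */
            } else if (launder_limit < 0 || laundered < launder_limit) {
                p->freed = true;            /* pretend to write + reclaim */
                freed++;
                laundered++;
            }
        }
        return freed;
    }

    int
    main(void)
    {
        struct page q[NPAGES] = {{ 0 }};
        int target = 48;            /* pages we need to reclaim */
        int freed;

        for (int i = 0; i < NPAGES; i++)
            q[i].dirty = (i % 2 == 0);   /* half the queue is dirty */

        /*
         * Pass 1: flush limit of PASS1_LAUNDER_LIMIT, and dirty pages
         * that have not had their second go-around are only marked.
         * (In this toy run, pass 1 frees only the clean pages.)
         */
        freed = scan_inactive(q, NPAGES, PASS1_LAUNDER_LIMIT, false);

        /*
         * Pass 2: pass 1 did not free enough (the "Yahoo" scenario), so
         * scan again immediately with no limit on dirty page flushes.
         */
        if (freed < target)
            freed += scan_inactive(q, NPAGES, -1 /* no limit */, true);

        printf("reclaimed %d of %d pages\n", freed, target);
        return 0;
    }

In the real daemon the second go-around plays out across successive
pageout wakeups rather than within one call, but the shape of the decision
is the same.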
*THIS* appears to work for both extremes.  It's what I'm going to be
committing to FreeBSD in the next few days.

BTW, years ago John Dyson theorized that disk writing could have this
effect on read efficiency, which is why FBsd originally had a 32 page
dirty flush limit per pass.  Now it all makes sense, and I've got proof
that it's still a problem on modern systems.

						-Matt

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see:
http://www.linux.eu.org/Linux-MM/