From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <3D7930D6.F658E5B9@zip.com.au> Date: Fri, 06 Sep 2002 15:48:54 -0700 From: Andrew Morton MIME-Version: 1.0 Subject: Re: inactive_dirty list References: <3D7929F7.7B19C9C@zip.com.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Rik van Riel Cc: "linux-mm@kvack.org" List-ID: Rik van Riel wrote: > > On Fri, 6 Sep 2002, Andrew Morton wrote: > > > > So basically pages should _only_ go into the inactive_dirty list > > > when they are under writeout. > > > > Or if they're just dirty. The thing I'm trying to achieve > > is to minimise the amount of scanning of unreclaimable pages. > > > > So park them elsewhere, and don't scan them. We know how many > > pages are there, so we can make decisions based on that. But let > > IO completion bring them back onto the inactive_reclaimable(?) > > list. > > I guess this means the dirty limit should be near 1% for the > VM. What is the thinking behind that? > Every time there is a noticable amount of dirty pages, kick > pdflush and have it write out a few of them, maybe the number > of pages needed to reach zone->pages_high ? Well we can certainly do that - the current wakeup_bdflush() is pretty crude: void wakeup_bdflush(void) { struct page_state ps; get_page_state(&ps); pdflush_operation(background_writeout, ps.nr_dirty); } We can pass background_writeout 42 pages if necessary. That's not aware of zones, of course. It will just write back the oldest 42 pages from the oldest dirty inode against the last-mounted superblock. I still have not got my head around: > We did this in early 2.4 kernels and it was a disaster. The > reason it was a disaster was that in many workloads we'd > always have some clean pages and we'd end up always reclaiming > those before even starting writeout on any of the dirty pages. Does this imply that we need to block on writeout *instead* of reclaiming clean pagecache? We could do something like: if (zone->nr_inactive_dirty > zone->nr_inactive_clean) { wakeup_bdflush(); /* Hope this writes the correct zone */ yield(); } which would get the IO underway promptly. But the caller would still go in and gobble remaining clean pagecache. The thing which happened (basically by accident) from my Wednesday hackery was a partitioning of the machine. 40% of memory is available to pagecache writeout, and that's clamped (ignoring MAP_SHARED for now..). And everyone else just walks around it. So a 1G box running dbench 1000 acts like a 600M box. Which is not a bad model, perhaps. If we can twiddle that 40% up and down based on criteria... But that separaton of the 40% of unusable memory from the 60% of usable memory is done by scanning at present, and it costs a bit of CPU. Not much, but a bit. (btw, is there any reason at all for having page reserves in ZONE_HIGHMEM? I have a suspicion that this is just wasted memory...) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/