From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3D76549B.3C53D0AC@zip.com.au>
Date: Wed, 04 Sep 2002 11:44:43 -0700
From: Andrew Morton
MIME-Version: 1.0
Subject: Re: nonblocking-vm.patch
References: <3D75E054.B341E067@zip.com.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel
Cc: "linux-mm@kvack.org"
List-ID:

Rik van Riel wrote:
>
> On Wed, 4 Sep 2002, Andrew Morton wrote:
>
> > - If the page is dirty, and mapped into pagetables then write the
> >   thing anyway (haven't tested this yet). This is to get around the
> >   problem of big dirty mmaps - everything stalls on request queues.
> >   Oh well.
>
> I don't think we need this. If the request queue is saturated, and
> free memory is low, the request queue is guaranteed to be full of
> writes, which will result in memory becoming freeable soon.
>

OK. But I've gone and removed just about all the VM throttling (with
some glee, I might add). We do need something in there to prevent
kswapd from going berserk.

I'm thinking something like this:

- My code only addresses write(2) pagecache. Need to handle the (IMO
  rare) situation of large amounts of dirty MAP_SHARED data. We do
  this by always writing it out and blocking on the request queue,
  and by waiting on PageWriteback pages. That's just the pre-me
  behaviour. Should be OK for a first pass.

- Similarly, always write out dirty pagecache, so we throttle on the
  swapdev's request queue.

Which I think just leaves us with the no-swap-available problem. In
this case we really do need to slow page allocators down (I think. I
haven't done _any_ swapless testing).

I have a new function in the block layer, `blk_congestion_wait()',
which will make the caller take a nap until some request queue comes
unblocked. That's probably appropriate.

There's a corner case where there's writeout underway, but no queues
are congested.
In that case we can probably add a wakeup to end_page_writeback(), and
kick it on every 32nd page or whatever. I'll play with that a bit.

Now, wrt the magical 40% thing. I'm thinking that we can change it in
this manner:

    maximum amount of dirty+writeback pagecache =
        min((total memory - mapped memory) / 2, 40% of memory)

(Need some more accurate logic to calculate "total memory".)

This means that half of the pool of unmapped memory is available to
heavy writers. So if the machine is busy with lots of mapped memory,
and a burst of writes happens, then they will initially be throttled
back fairly hard. But if the write activity continues, `mapped memory'
will shrink due to swapout and pageout, and the amount of memory which
is available to the heavy writer will climb until it hits the
(configurable) 40%.