* inactive_dirty list

From: Andrew Morton @ 2002-09-06 20:42 UTC
To: Rik van Riel; Cc: linux-mm

Rik, it seems that the time has come...

I was doing some testing overnight with mem=1024m. Page reclaim was
pretty inefficient at that level: kswapd consumed 6% of CPU on a
permanent basis (workload was heavy dbench plus looping make -j6
bzImage). kswapd was reclaiming only 3% of the pages which it was
looking at. This doesn't happen at mem=768m, and I'm sure it won't
happen at mem=1.5G.

What is happening here is that the logic which clamps dirty+writeback
pagecache at 40% of memory is working nicely, and the allocate-from-
highmem-first logic is ensuring that all of ZONE_HIGHMEM is dirty
or under writeback all the time. kswapd isn't allowed to block against
that pagecache, so it's scanning zillions of pages.

This is a fundamental problem when the size of the highmem zone is
approximately equal to 40% of total memory. We could fix it by
changing the page allocator to balance its allocations across zones,
but I don't think we want to do that.

I think it's best to split the inactive list into reclaimable and
unreclaimable (inactive_clean/inactive_dirty). I'll code that tonight;
please let me run some thoughts by you:

- inactive_dirty holds pages which are dirty or under writeback.

- end_page_writeback() will move the page onto inactive_clean.

- everywhere we add a page to the inactive list will now add it to
  either inactive_clean or inactive_dirty, based on its
  PageDirty || PageWriteback state.

- the inactive target logic will remain the same. So
  zone->nr_inactive_pages will be the sum of the pages on
  zone->inactive_clean and zone->inactive_dirty.

- swapcache pages don't go on inactive_dirty(!). They remain on
  inactive_clean, so if a page allocator or kswapd hits a swapcache
  page, they block on it (swapout throttling).

  A result of this is that we never need to scan inactive_dirty.
  Those pages will always be written out in balance_dirty_pages by
  the write(2) caller, or by pdflush. (Hence: we don't need
  inactive_dirty at all. We could just cut those pages off the LRU
  altogether. But let's not do that).

- Hence: the only pages which are written out from within the VM are
  swapcache.

- So the only real source of throttling for tasks which aren't running
  generic_file_write() is the call to blk_congestion_wait() in
  try_to_free_pages(). Which seems sane to me - this will wake up
  after 1/4 of a second, or after someone frees a write request
  against *any* queue. We know that the pages which were covered by
  that request were just placed onto inactive_clean, so off we go
  again. Should work (heh).

- with this scheme, we don't actually need
  zone->nr_inactive_dirty_pages and zone->nr_inactive_clean_pages,
  but I may as well maintain them - it's easy enough.

- MAP_SHARED pages will be on inactive_clean, but if we change the
  logic in there to give these pages a second round on the LRU then
  the pages will automatically be added to inactive_dirty on the way
  out of shrink_zone().

How does that all sound?

btw, it is approximately the case that the pages will come clean in
LRU order (oldest-first) because of the writeback logic:
fs-writeback.c walks the inodes in oldest-dirtied to newest-dirtied
order, and it walks each inode's pages in oldest-dirtied to
newest-dirtied order.
But I think that end_page_writeback() should still move cleaned pages
onto the far (hot) end of inactive_clean?

I don't think any of this will send the zone balancing logic into a
tailspin. I'm just a bit worried about corner cases when the number of
reclaimable pages in highmem is getting low - the classzone balancing
code may keep on trying to refill that zone's free memory pools too
much. We'll see...

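[Editor's note: a minimal sketch, in plain userspace C, of the
end_page_writeback() transition proposed in the message above. The
structures and list helpers here are invented for illustration;
locking, wait queues and the real struct page are all omitted.]

	#include <stdbool.h>

	struct page {
		struct page *prev, *next;	/* LRU links */
		bool dirty;			/* PageDirty */
		bool writeback;			/* PageWriteback */
	};

	struct zone {
		struct page inactive_dirty;	/* list-head sentinels */
		struct page inactive_clean;
		unsigned long nr_inactive_dirty;
		unsigned long nr_inactive_clean;
	};

	static void lru_del(struct page *p)
	{
		p->prev->next = p->next;
		p->next->prev = p->prev;
	}

	static void lru_add_hot(struct page *p, struct page *head)
	{
		p->next = head->next;
		p->prev = head;
		head->next->prev = p;
		head->next = p;
	}

	/* IO completion: the page is no longer under writeback; if it
	 * was not redirtied while in flight, move it from
	 * inactive_dirty to the hot end of inactive_clean, so reclaim
	 * never scans it while it is unreclaimable. */
	void end_page_writeback(struct zone *z, struct page *p)
	{
		p->writeback = false;
		if (p->dirty)
			return;	/* redirtied: stays on inactive_dirty */
		lru_del(p);
		z->nr_inactive_dirty--;
		lru_add_hot(p, &z->inactive_clean);
		z->nr_inactive_clean++;
	}
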
* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 21:03 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> What is happening here is that the logic which clamps dirty+writeback
> pagecache at 40% of memory is working nicely, and the allocate-from-
> highmem-first logic is ensuring that all of ZONE_HIGHMEM is dirty
> or under writeback all the time.

Does this mean that your 1024MB machine can degrade into the situation
where userspace has an effective 128MB of memory available for its
working set?

Or is balancing between the zones still happening?

> We could fix it by changing the page allocator to balance its
> allocations across zones, but I don't think we want to do that.

Some balancing is needed, otherwise you'll have 800 MB of old data
sitting in ZONE_NORMAL and userspace getting its hot data evicted from
ZONE_HIGHMEM all the time.

OTOH, you don't want to overdo things, of course ;)

> I think it's best to split the inactive list into reclaimable and
> unreclaimable (inactive_clean/inactive_dirty).
>
> I'll code that tonight; please let me run some thoughts by you:

Sounds like you're reinventing the whole 2.4.0 -> 2.4.7 -> 2.4.9-ac ->
2.4.13-rmap -> 2.4.19-rmap evolution ;)

> - inactive_dirty holds pages which are dirty or under writeback.
>
> - everywhere we add a page to the inactive list will now add it to
>   either inactive_clean or inactive_dirty, based on its
>   PageDirty || PageWriteback state.

If I had veto power I'd use it here ;)

We did this in early 2.4 kernels and it was a disaster. The reason it
was a disaster was that in many workloads we'd always have some clean
pages, and we'd end up always reclaiming those before even starting
writeout on any of the dirty pages.

It also meant we could have dirty (or formerly dirty) inactive pages
eating up memory and never being recycled for more active data.

What you need to do instead is:

- inactive_dirty contains pages of which we're not sure whether
  they're dirty or clean

- everywhere we add a page to the inactive list now, we add the page
  to the inactive_dirty list

This means we'll have a fairer scan and eviction rate between clean
and dirty pages.

> - swapcache pages don't go on inactive_dirty(!). They remain on
>   inactive_clean, so if a page allocator or kswapd hits a swapcache
>   page, they block on it (swapout throttling).

We can also get rid of this logic. There is no difference between swap
pages and mmap'd file pages. If blk_congestion_wait() works, we can
get rid of this special magic and just use it. If it doesn't work, we
need to fix blk_congestion_wait(), since otherwise the VM would fall
over under heavy mmap() usage.

> - So the only real source of throttling for tasks which aren't
>   running generic_file_write() is the call to blk_congestion_wait()
>   in try_to_free_pages(). Which seems sane to me - this will wake up
>   after 1/4 of a second, or after someone frees a write request
>   against *any* queue. We know that the pages which were covered by
>   that request were just placed onto inactive_clean, so off we go
>   again. Should work (heh).

With this scheme, we can restrict tasks to scanning only the
inactive_clean list.

Kswapd's scanning of the inactive_dirty list is always asynchronous,
so we don't need to worry about latency.
No need to waste CPU by having other tasks also scan this very same
list and submit IO.

> - with this scheme, we don't actually need
>   zone->nr_inactive_dirty_pages and zone->nr_inactive_clean_pages,
>   but I may as well maintain them - it's easy enough.

Agreed, good statistics are essential when you're trying to balance a
VM.

> How does that all sound?

Most of the plan sounds good, but your dirty/clean split is a tried
and tested recipe for disaster ;)

> order. But I think that end_page_writeback() should still move
> cleaned pages onto the far (hot) end of inactive_clean?

IMHO inactive_clean should just contain KNOWN FREEABLE pages, as an
area beyond the inactive_dirty list.

> I don't think any of this will send the zone balancing logic into a
> tailspin. I'm just a bit worried about corner cases when the number
> of reclaimable pages in highmem is getting low - the classzone
> balancing code may keep on trying to refill that zone's free memory
> pools too much. We'll see...

There's a simple trick we can use here. If we _know_ that all the
inactive_clean pages can be immediately reclaimed, we can count those
as free pages for balancing purposes. This should make life easier
when one of the zones is under heavy writeback pressure.

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/   http://distro.conectiva.com/
Spamtraps of the month: september@surriel.com trac@trac.org

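[Editor's note: a minimal sketch of Rik's count-known-freeable-pages-
as-free balancing trick, with invented names; the real classzone
watermark code is more involved.]

	struct zone_stats {
		unsigned long free_pages;
		unsigned long nr_inactive_clean;  /* known freeable */
		unsigned long pages_low;          /* refill watermark */
	};

	/* Pages on inactive_clean can be reclaimed immediately,
	 * without IO, so treat them as free when deciding whether a
	 * zone needs refilling.  A zone full of clean reclaimable
	 * cache then no longer looks starved to the balancing code. */
	int zone_needs_refill(const struct zone_stats *z)
	{
		return z->free_pages + z->nr_inactive_clean < z->pages_low;
	}
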
* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-06 21:40 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:

hm. Did that digeo.com address bounce? grr.

> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > What is happening here is that the logic which clamps
> > dirty+writeback pagecache at 40% of memory is working nicely, and
> > the allocate-from-highmem-first logic is ensuring that all of
> > ZONE_HIGHMEM is dirty or under writeback all the time.
>
> Does this mean that your 1024MB machine can degrade into the
> situation where userspace has an effective 128MB of memory available
> for its working set?
>
> Or is balancing between the zones still happening?

No, that's OK. This problem is a consequence of the per-zone LRU.
Whether it is kswapd or a direct-reclaimer, it always looks at highmem
first. But we allocate pages from highmem first, too.

With the non-blocking stuff, we blow a lot of CPU scanning past pages.
Prior to the nonblocking stuff, we would get stuck on request queues
trying to refill ZONE_HIGHMEM, probably needlessly, because there's
lots of reclaimable stuff in ZONE_NORMAL. Maybe.

> > We could fix it by changing the page allocator to balance its
> > allocations across zones, but I don't think we want to do that.
>
> Some balancing is needed, otherwise you'll have 800 MB of old data
> sitting in ZONE_NORMAL and userspace getting its hot data evicted
> from ZONE_HIGHMEM all the time.
>
> OTOH, you don't want to overdo things, of course ;)

Well, everyone still takes a pass across all zones, bringing them up
to pages_high. It's just that the ZONE_HIGHMEM pass is expensive,
because that is where all the dirty pagecache happens to be.

See, the zone balancing is out of whack wrt the page allocation: we
balance the zones nicely in reclaim, and we deliberately *unbalance*
them in the allocator.

> ...
>
> We did this in early 2.4 kernels and it was a disaster. The reason
> it was a disaster was that in many workloads we'd always have some
> clean pages, and we'd end up always reclaiming those before even
> starting writeout on any of the dirty pages.

OK.

> It also meant we could have dirty (or formerly dirty) inactive pages
> eating up memory and never being recycled for more active data.

The interrupt-time page motion should reduce that...

> What you need to do instead is:
>
> - inactive_dirty contains pages of which we're not sure whether
>   they're dirty or clean
>
> - everywhere we add a page to the inactive list now, we add the page
>   to the inactive_dirty list
>
> This means we'll have a fairer scan and eviction rate between clean
> and dirty pages.

And how do they get onto inactive_clean?

> > - swapcache pages don't go on inactive_dirty(!). They remain on
> >   inactive_clean, so if a page allocator or kswapd hits a swapcache
> >   page, they block on it (swapout throttling).
>
> We can also get rid of this logic. There is no difference between
> swap pages and mmap'd file pages. If blk_congestion_wait() works, we
> can get rid of this special magic and just use it. If it doesn't
> work, we need to fix blk_congestion_wait(), since otherwise the VM
> would fall over under heavy mmap() usage.

That would probably work. We'd need to do the pte_dirty->PageDirty
translation carefully.
> > - So the only real source of throttling for tasks which aren't
> >   running generic_file_write() is the call to blk_congestion_wait()
> >   in try_to_free_pages(). Which seems sane to me - this will wake
> >   up after 1/4 of a second, or after someone frees a write request
> >   against *any* queue. We know that the pages which were covered by
> >   that request were just placed onto inactive_clean, so off we go
> >   again. Should work (heh).
>
> With this scheme, we can restrict tasks to scanning only the
> inactive_clean list.
>
> Kswapd's scanning of the inactive_dirty list is always asynchronous,
> so we don't need to worry about latency. No need to waste CPU by
> having other tasks also scan this very same list and submit IO.

Why does kswapd need to scan that list?

> > - with this scheme, we don't actually need
> >   zone->nr_inactive_dirty_pages and zone->nr_inactive_clean_pages,
> >   but I may as well maintain them - it's easy enough.
>
> Agreed, good statistics are essential when you're trying to balance
> a VM.
>
> > How does that all sound?
>
> Most of the plan sounds good, but your dirty/clean split is a tried
> and tested recipe for disaster ;)

That's good to know, thanks.

> > order. But I think that end_page_writeback() should still move
> > cleaned pages onto the far (hot) end of inactive_clean?
>
> IMHO inactive_clean should just contain KNOWN FREEABLE pages, as an
> area beyond the inactive_dirty list.

Confused. So where do anon pages go?

> > I don't think any of this will send the zone balancing logic into
> > a tailspin. I'm just a bit worried about corner cases when the
> > number of reclaimable pages in highmem is getting low - the
> > classzone balancing code may keep on trying to refill that zone's
> > free memory pools too much. We'll see...
>
> There's a simple trick we can use here. If we _know_ that all the
> inactive_clean pages can be immediately reclaimed, we can count
> those as free pages for balancing purposes.

OK.

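[Editor's note: an illustrative sketch of the allocate-from-highmem-
first policy being discussed - not the real allocator. The zonelist is
ordered HIGHMEM, NORMAL, DMA; allocation takes the first zone above
its watermark, so new pagecache piles into ZONE_HIGHMEM first,
deliberately unbalancing what reclaim later rebalances.]

	#include <stddef.h>

	struct zone_model {
		unsigned long free_pages;
		unsigned long pages_low;	/* allocation watermark */
	};

	/* Walk the zonelist (index 0 == ZONE_HIGHMEM) and take the
	 * first zone with free pages above its watermark; NULL means
	 * the caller must fall back to reclaim. */
	struct zone_model *pick_zone(struct zone_model **zonelist, int n)
	{
		int i;

		for (i = 0; i < n; i++)
			if (zonelist[i]->free_pages > zonelist[i]->pages_low)
				return zonelist[i];
		return NULL;
	}
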
* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 21:49 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> > It also meant we could have dirty (or formerly dirty) inactive
> > pages eating up memory and never being recycled for more active
> > data.
>
> The interrupt-time page motion should reduce that...

Not if you won't scan the dirty list as long as there are "enough"
clean pages.

> > What you need to do instead is:
> >
> > - inactive_dirty contains pages of which we're not sure whether
> >   they're dirty or clean
> >
> > - everywhere we add a page to the inactive list now, we add the
> >   page to the inactive_dirty list
> >
> > This means we'll have a fairer scan and eviction rate between
> > clean and dirty pages.
>
> And how do they get onto inactive_clean?

Once IO completes they get moved onto the clean list.

> > We can also get rid of this logic. There is no difference between
> > swap pages and mmap'd file pages. If blk_congestion_wait() works,
> > we can get rid of this special magic and just use it. If it
> > doesn't work, we need to fix blk_congestion_wait(), since
> > otherwise the VM would fall over under heavy mmap() usage.
>
> That would probably work. We'd need to do the pte_dirty->PageDirty
> translation carefully.

Indeed. We probably want to give such pages a second chance on the
inactive_dirty list without starting the writeout, so we've unmapped
and PageDirtied all its friends for one big writeout.

> > With this scheme, we can restrict tasks to scanning only the
> > inactive_clean list.
> >
> > Kswapd's scanning of the inactive_dirty list is always
> > asynchronous, so we don't need to worry about latency. No need to
> > waste CPU by having other tasks also scan this very same list and
> > submit IO.
>
> Why does kswapd need to scan that list?

The list should preferably only be scanned by one thread. Scanning
with multiple threads is a waste of CPU.

It doesn't really matter which thread is scanning, but I think we want
some preferably simple way to prevent all CPUs in the system from
going wild over the nonfreeable lists.

> > > order. But I think that end_page_writeback() should still move
> > > cleaned pages onto the far (hot) end of inactive_clean?
> >
> > IMHO inactive_clean should just contain KNOWN FREEABLE pages, as
> > an area beyond the inactive_dirty list.
>
> Confused. So where do anon pages go?

All pages go onto the inactive_dirty list. When they reach the end of
the list, either we move them to the inactive_clean list, we submit
IO, or (in the case of a mapped page) we give them another go-around
on the list in order to build up a cluster from the other still-mapped
pages near it.

regards,

Rik

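[Editor's note: a small sketch of the tail-of-list policy Rik just
described, in illustrative C. A page reaching the cold end of
inactive_dirty is moved to inactive_clean, submitted for IO, or - if
still mapped - rotated for another trip so its neighbours can be
unmapped and written out as one cluster.]

	#include <stdbool.h>

	enum tail_action { MOVE_TO_CLEAN, SUBMIT_IO, SECOND_CHANCE };

	enum tail_action classify_tail_page(bool mapped, bool dirty,
					    bool writeback)
	{
		if (mapped)
			return SECOND_CHANCE;	/* let a writeout cluster form */
		if (writeback)
			return SECOND_CHANCE;	/* IO in flight, not freeable yet */
		if (dirty)
			return SUBMIT_IO;	/* start asynchronous writeout */
		return MOVE_TO_CLEAN;		/* known freeable */
	}
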
* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-06 21:58 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > > It also meant we could have dirty (or formerly dirty) inactive
> > > pages eating up memory and never being recycled for more active
> > > data.
> >
> > The interrupt-time page motion should reduce that...
>
> Not if you won't scan the dirty list as long as there are "enough"
> clean pages.

Well, something needs to start writeback of dirty pages. That is
either:

- The VM kicked pdflush, or
- We're over the dirty limit, so the write(2) callers do it, or
- The kupdate function syncs the data.

So why do we need to perform writeback from the VM? Just for
swapcache, which pdflush doesn't do.

> > > What you need to do instead is:
> > >
> > > - inactive_dirty contains pages of which we're not sure whether
> > >   they're dirty or clean
> > >
> > > - everywhere we add a page to the inactive list now, we add the
> > >   page to the inactive_dirty list
> > >
> > > This means we'll have a fairer scan and eviction rate between
> > > clean and dirty pages.
> >
> > And how do they get onto inactive_clean?
>
> Once IO completes they get moved onto the clean list.

But if there are clean pages accidentally on inactive_dirty, we need
to scan for them. If that list only contains dirty pagecache and
pagecache which is under writeback, then there should be no need to
scan it? Those pages will automatically come back onto inactive_clean
via pdflush/balance_dirty_pages writeout.

> > > We can also get rid of this logic. There is no difference between
> > > swap pages and mmap'd file pages. If blk_congestion_wait() works,
> > > we can get rid of this special magic and just use it. If it
> > > doesn't work, we need to fix blk_congestion_wait(), since
> > > otherwise the VM would fall over under heavy mmap() usage.
> >
> > That would probably work. We'd need to do the pte_dirty->PageDirty
> > translation carefully.
>
> Indeed. We probably want to give such pages a second chance on the
> inactive_dirty list without starting the writeout, so we've unmapped
> and PageDirtied all its friends for one big writeout.
>
> > > With this scheme, we can restrict tasks to scanning only the
> > > inactive_clean list.
> > >
> > > Kswapd's scanning of the inactive_dirty list is always
> > > asynchronous, so we don't need to worry about latency. No need
> > > to waste CPU by having other tasks also scan this very same list
> > > and submit IO.
> >
> > Why does kswapd need to scan that list?
>
> The list should preferably only be scanned by one thread. Scanning
> with multiple threads is a waste of CPU.
>
> It doesn't really matter which thread is scanning, but I think we
> want some preferably simple way to prevent all CPUs in the system
> from going wild over the nonfreeable lists.

Let me rephrase: why does *anybody* need to scan inactive_dirty?

> > > > order. But I think that end_page_writeback() should still move
> > > > cleaned pages onto the far (hot) end of inactive_clean?
> > >
> > > IMHO inactive_clean should just contain KNOWN FREEABLE pages, as
> > > an area beyond the inactive_dirty list.
> >
> > Confused. So where do anon pages go?
>
> All pages go onto the inactive_dirty list.
> When they reach the end of the list, either we move them to the
> inactive_clean list, we submit IO, or (in the case of a mapped page)
> we give them another go-around on the list in order to build up a
> cluster from the other still-mapped pages near it.

hum. I'm trying to find a model where the VM can just ignore
dirty|writeback pagecache. We know how many pages are out there, sure.
But we don't scan them. Possible?

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 22:04 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> hum. I'm trying to find a model where the VM can just ignore
> dirty|writeback pagecache. We know how many pages are out there,
> sure. But we don't scan them. Possible?

Owww duh, I see it now.

So basically pages should _only_ go into the inactive_dirty list when
they are under writeout.

Note that leaving dirty pages on the list can result in a waste of
memory. Imagine the dirty limit being 40% and 30% of memory being
dirty but not written out at the moment...

regards,

Rik

* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-06 22:19 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > hum. I'm trying to find a model where the VM can just ignore
> > dirty|writeback pagecache. We know how many pages are out there,
> > sure. But we don't scan them. Possible?
>
> Owww duh, I see it now.
>
> So basically pages should _only_ go into the inactive_dirty list
> when they are under writeout.

Or if they're just dirty. The thing I'm trying to achieve is to
minimise the amount of scanning of unreclaimable pages.

So park them elsewhere, and don't scan them. We know how many pages
are there, so we can make decisions based on that. But let IO
completion bring them back onto the inactive_reclaimable(?) list.

> Note that leaving dirty pages on the list can result in a waste of
> memory. Imagine the dirty limit being 40% and 30% of memory being
> dirty but not written out at the moment...

Right. So the VM needs to kick pdflush at the right time to get that
happening. The nonblocking pdflush is very effective - I think it can
keep a ton of queues saturated with just a single process.

swapcache is a wart, because pdflush doesn't write swapcache. It
certainly _could_, but you had reasons why that was the wrong thing
to do?

And something needs to be done with clean but unreclaimable pages.
These will be on inactive_clean - I guess we just continue to activate
these.

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 22:23 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> > So basically pages should _only_ go into the inactive_dirty list
> > when they are under writeout.
>
> Or if they're just dirty. The thing I'm trying to achieve is to
> minimise the amount of scanning of unreclaimable pages.
>
> So park them elsewhere, and don't scan them. We know how many pages
> are there, so we can make decisions based on that. But let IO
> completion bring them back onto the inactive_reclaimable(?) list.

I guess this means the dirty limit should be near 1% for the VM.

Every time there is a noticeable amount of dirty pages, kick pdflush
and have it write out a few of them, maybe the number of pages needed
to reach zone->pages_high?

regards,

Rik

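[Editor's note: a sketch of the kick Rik suggests, sized to the zone's
shortfall. pdflush_operation() and background_writeout() are the real
names quoted later in this thread; the zone-targeted sizing and the
struct layout are illustrative.]

	struct zone { unsigned long free_pages, pages_high; };

	int pdflush_operation(void (*fn)(unsigned long), unsigned long arg);
	void background_writeout(unsigned long nr_pages);

	/* Ask pdflush for just enough writeback to bring the zone back
	 * up to pages_high, instead of flushing everything dirty. */
	void kick_pdflush_for_zone(const struct zone *z)
	{
		if (z->free_pages >= z->pages_high)
			return;
		pdflush_operation(background_writeout,
				  z->pages_high - z->free_pages);
	}
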
* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-06 22:48 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > > So basically pages should _only_ go into the inactive_dirty list
> > > when they are under writeout.
> >
> > Or if they're just dirty. The thing I'm trying to achieve is to
> > minimise the amount of scanning of unreclaimable pages.
> >
> > So park them elsewhere, and don't scan them. We know how many pages
> > are there, so we can make decisions based on that. But let IO
> > completion bring them back onto the inactive_reclaimable(?) list.
>
> I guess this means the dirty limit should be near 1% for the VM.

What is the thinking behind that?

> Every time there is a noticeable amount of dirty pages, kick pdflush
> and have it write out a few of them, maybe the number of pages
> needed to reach zone->pages_high?

Well, we can certainly do that - the current wakeup_bdflush() is
pretty crude:

	void wakeup_bdflush(void)
	{
		struct page_state ps;

		get_page_state(&ps);
		pdflush_operation(background_writeout, ps.nr_dirty);
	}

We can pass background_writeout 42 pages if necessary.

That's not aware of zones, of course. It will just write back the
oldest 42 pages from the oldest dirty inode against the last-mounted
superblock.

I still have not got my head around:

> We did this in early 2.4 kernels and it was a disaster. The reason
> it was a disaster was that in many workloads we'd always have some
> clean pages, and we'd end up always reclaiming those before even
> starting writeout on any of the dirty pages.

Does this imply that we need to block on writeout *instead* of
reclaiming clean pagecache?

We could do something like:

	if (zone->nr_inactive_dirty > zone->nr_inactive_clean) {
		wakeup_bdflush();	/* Hope this writes the correct zone */
		yield();
	}

which would get the IO underway promptly. But the caller would still
go in and gobble remaining clean pagecache.

The thing which happened (basically by accident) from my Wednesday
hackery was a partitioning of the machine. 40% of memory is available
to pagecache writeout, and that's clamped (ignoring MAP_SHARED for
now..). And everyone else just walks around it.

So a 1G box running dbench 1000 acts like a 600M box. Which is not a
bad model, perhaps. If we can twiddle that 40% up and down based on
<mumble> criteria...

But that separation of the 40% of unusable memory from the 60% of
usable memory is done by scanning at present, and it costs a bit of
CPU. Not much, but a bit.

(btw, is there any reason at all for having page reserves in
ZONE_HIGHMEM? I have a suspicion that this is just wasted memory...)

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 23:03 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> Rik van Riel wrote:
> > On Fri, 6 Sep 2002, Andrew Morton wrote:
> >
> > I guess this means the dirty limit should be near 1% for the VM.
>
> What is the thinking behind that?

Dirty pages could sit on the list practically forever if there are
enough clean pages. This means we can have a significant amount of
memory "parked" on the dirty list, without it ever getting reclaimed,
even if we could use the memory for something better.

> I still have not got my head around:
>
> > We did this in early 2.4 kernels and it was a disaster. The reason
> > it was a disaster was that in many workloads we'd always have some
> > clean pages, and we'd end up always reclaiming those before even
> > starting writeout on any of the dirty pages.
>
> Does this imply that we need to block on writeout *instead* of
> reclaiming clean pagecache?

No, it means that whenever we reclaim clean pagecache pages, we should
also start the writeout of some dirty pages.

> We could do something like:
>
>	if (zone->nr_inactive_dirty > zone->nr_inactive_clean) {
>		wakeup_bdflush();	/* Hope this writes the correct zone */
>		yield();
>	}
>
> which would get the IO underway promptly. But the caller would still
> go in and gobble remaining clean pagecache.

This is nice, but it would still be possible to have oodles of pages
"parked" on the dirty list, which we definitely need to prevent.

> So a 1G box running dbench 1000 acts like a 600M box. Which is not a
> bad model, perhaps. If we can twiddle that 40% up and down based on
> <mumble> criteria...

Writing out dirty pages whenever we reclaim free pages could fix that
problem.

> But that separation of the 40% of unusable memory from the 60% of
> usable memory is done by scanning at present, and it costs a bit of
> CPU. Not much, but a bit.

There are other reasons we're wasting CPU in scanning:

1) the scanning isn't really rate limited yet (or is it?)

2) every thread in the system can fall into the scanning function, so
   if we have 50 page allocators they'll all happily scan the list,
   even though the first of these threads already found there wasn't
   anything freeable

> (btw, is there any reason at all for having page reserves in
> ZONE_HIGHMEM? I have a suspicion that this is just wasted memory...)

Dunno, but I guess it is to prevent a 4GB box from acting like a 900MB
box under corner conditions ;)

regards,

Rik

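[Editor's note: a sketch of the coupling Rik proposes above - every
batch of clean-page reclaim also queues some dirty-page writeout, so
dirty pages cannot sit parked forever behind a supply of clean ones.
All names and the 1:1 ratio are illustrative.]

	struct zone;

	unsigned long reclaim_clean_pages(struct zone *z, unsigned long nr);
	void start_async_writeout(struct zone *z, unsigned long nr);
	unsigned long nr_inactive_dirty(const struct zone *z);

	unsigned long shrink_clean_list(struct zone *z, unsigned long nr)
	{
		unsigned long reclaimed = reclaim_clean_pages(z, nr);

		/* Match every reclaimed clean page with one page of
		 * dirty writeout, keeping the dirty list draining at
		 * the same rate as the clean list. */
		if (reclaimed && nr_inactive_dirty(z))
			start_async_writeout(z, reclaimed);
		return reclaimed;
	}
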
* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-06 23:34 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
> > Rik van Riel wrote:
> > > On Fri, 6 Sep 2002, Andrew Morton wrote:
> > >
> > > I guess this means the dirty limit should be near 1% for the VM.
> >
> > What is the thinking behind that?
>
> Dirty pages could sit on the list practically forever if there are
> enough clean pages. This means we can have a significant amount of
> memory "parked" on the dirty list, without it ever getting reclaimed,
> even if we could use the memory for something better.

yes. We could have up to 10% (the default value of
dirty_background_ratio) of physical memory just sitting there for up
to 30 seconds (the default value of dirty_expire_centisecs).

(And that 10% may well go back to 30% or 40% - starting writeback
earlier will hurt some things, such as copying 100M of files on a 256M
machine).

You're proposing that we get that IO underway sooner if there is page
reclaim pressure, and that one way to do that is to write one page for
every reclaimed one. Guess that makes sense as much as anything
else ;)

> > I still have not got my head around:
> >
> > > We did this in early 2.4 kernels and it was a disaster. The
> > > reason it was a disaster was that in many workloads we'd always
> > > have some clean pages, and we'd end up always reclaiming those
> > > before even starting writeout on any of the dirty pages.
> >
> > Does this imply that we need to block on writeout *instead* of
> > reclaiming clean pagecache?
>
> No, it means that whenever we reclaim clean pagecache pages, we
> should also start the writeout of some dirty pages.
>
> > We could do something like:
> >
> >	if (zone->nr_inactive_dirty > zone->nr_inactive_clean) {
> >		wakeup_bdflush();	/* Hope this writes the correct zone */
> >		yield();
> >	}
> >
> > which would get the IO underway promptly. But the caller would
> > still go in and gobble remaining clean pagecache.
>
> This is nice, but it would still be possible to have oodles of pages
> "parked" on the dirty list, which we definitely need to prevent.
>
> > So a 1G box running dbench 1000 acts like a 600M box. Which is not
> > a bad model, perhaps. If we can twiddle that 40% up and down based
> > on <mumble> criteria...
>
> Writing out dirty pages whenever we reclaim free pages could fix
> that problem.

OK, I'll give that a whizz.

> > But that separation of the 40% of unusable memory from the 60% of
> > usable memory is done by scanning at present, and it costs a bit
> > of CPU. Not much, but a bit.
>
> There are other reasons we're wasting CPU in scanning:
> 1) the scanning isn't really rate limited yet (or is it?)

Not sure what you mean by this?

My current code wastes CPU in the situation where the zone is choked
with dirty pagecache. It works happily with mem=768M, because only 40%
of the pages in the zone are dirty - worst case, we get a 60% reclaim
success rate.

So I'm looking for ways to fix that. The proposal is to move those
known-to-be-unreclaimable pages elsewhere.

Another possibility might be to say "gee, all dirty. Try the next
zone".
> 2) every thread in the system can fall into the scanning function,
>    so if we have 50 page allocators they'll all happily scan the
>    list, even though the first of these threads already found there
>    wasn't anything freeable

hm. Well, if we push dirty pages onto a different list, and pinned
pages onto the active list, then a zone with no freeable memory should
have a short list to scan.

more hm. It's possible that, because of the per-zone LRU, we end up
putting way too much swap pressure onto anon pages in highmem. For the
1G boxes. This is an interaction between the per-zone LRU and the page
allocator's highmem-first policy.

Have you seen this in 2.4-rmap? It would happen there, I suspect.

> > (btw, is there any reason at all for having page reserves in
> > ZONE_HIGHMEM? I have a suspicion that this is just wasted
> > memory...)
>
> Dunno, but I guess it is to prevent a 4GB box from acting like a
> 900MB box under corner conditions ;)

But we have a meg or so of emergency reserve in ZONE_HIGHMEM which can
only be used by a __GFP_HIGH|__GFP_HIGHMEM allocator, and some more
memory reserved for PF_MEMALLOC|__GFP_HIGHMEM. I don't think anybody
actually does that. Bounce buffers can sometimes do
__GFP_HIGHMEM|__GFP_HIGH, I think.

Strikes me that we could just give that memory back.

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-07 0:00 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> My current code wastes CPU in the situation where the zone is choked
> with dirty pagecache. It works happily with mem=768M, because only
> 40% of the pages in the zone are dirty - worst case, we get a 60%
> reclaim success rate.

Which still doesn't deal with the situation where the dirty pages are
primarily anonymous or MAP_SHARED pages, which don't fall under your
dirty page accounting.

> So I'm looking for ways to fix that. The proposal is to move those
> known-to-be-unreclaimable pages elsewhere.

Basically, when scanning the zone we'll see "hmmm, all pages were
dirty and I scheduled a whole bunch for writeout" and we _know_ it
doesn't make sense for other threads to also scan this zone over and
over again, at least not until a significant amount of IO has
completed.

> Another possibility might be to say "gee, all dirty. Try the next
> zone".

Note that this also means we shouldn't submit ALL the dirty pages we
run into for IO. If we submit a GB worth of dirty pages from
ZONE_HIGHMEM for IO, it'll take _ages_ before the IO for ZONE_NORMAL
completes.

Worse, if we're keeping the IO queues busy with ZONE_HIGHMEM pages we
could create starvation of the other zones.

Another effect is that a GB of writes is sure to slow down any
subsequent reads, even if 100 MB of RAM has already been freed...

Because of this I want to make sure we only submit a sane amount of
pages for IO at once, maybe <pulls number out of hat>
max(zone->pages_high, 4 * (zone->pages_high - zone->free_pages))?

> more hm. It's possible that, because of the per-zone LRU, we end up
> putting way too much swap pressure onto anon pages in highmem. For
> the 1G boxes. This is an interaction between the per-zone LRU and
> the page allocator's highmem-first policy.
>
> Have you seen this in 2.4-rmap? It would happen there, I suspect.

Shouldn't happen in 2.4-rmap, I've been careful to avoid any kind of
worst-case scenarios like that by having a number of different
watermarks.

Basically kswapd won't free pages from a zone which isn't in severe
trouble if we don't have a global memory shortage, so we will have
allocated memory from each zone already before freeing the next batch
of highmem pages.

> I don't think anybody actually does that. Bounce buffers can
> sometimes do __GFP_HIGHMEM|__GFP_HIGH, I think.
>
> Strikes me that we could just give that memory back.

You're right, duh.

cheers,

Rik

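[Editor's note: Rik's out-of-a-hat clamp, written out as a sketch. It
bounds per-pass IO submission so ZONE_HIGHMEM writeback cannot
monopolise the queues: at least pages_high worth of pages, more when
the zone's free-page deficit is large.]

	struct zone { unsigned long free_pages, pages_high; };

	unsigned long max_writeout(const struct zone *z)
	{
		unsigned long deficit = 0;

		if (z->pages_high > z->free_pages)
			deficit = z->pages_high - z->free_pages;

		/* max(pages_high, 4 * (pages_high - free_pages)) */
		return 4 * deficit > z->pages_high ? 4 * deficit
						   : z->pages_high;
	}
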
* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-07 0:29 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > My current code wastes CPU in the situation where the zone is
> > choked with dirty pagecache. It works happily with mem=768M,
> > because only 40% of the pages in the zone are dirty - worst case,
> > we get a 60% reclaim success rate.
>
> Which still doesn't deal with the situation where the dirty pages
> are primarily anonymous or MAP_SHARED pages, which don't fall under
> your dirty page accounting.

That's right - we're writing those things out as soon as we scan them
at present. If we move them over to the dirty page list when their
dirtiness is discovered then the normal writeback stuff would kick in.
But it's laggy, of course.

> > So I'm looking for ways to fix that. The proposal is to move those
> > known-to-be-unreclaimable pages elsewhere.
>
> Basically, when scanning the zone we'll see "hmmm, all pages were
> dirty and I scheduled a whole bunch for writeout" and we _know_ it
> doesn't make sense for other threads to also scan this zone over and
> over again, at least not until a significant amount of IO has
> completed.

Yup. But with this proposal it's "hmm, the inactive_clean list has
zero pages, and the inactive_dirty list has 100,000 pages". The VM
knows exactly what is going on, without any scanning. The appropriate
action would be to kick pdflush, advance to the next zone, and if that
fails, take a nap.

> > Another possibility might be to say "gee, all dirty. Try the next
> > zone".
>
> Note that this also means we shouldn't submit ALL the dirty pages we
> run into for IO. If we submit a GB worth of dirty pages from
> ZONE_HIGHMEM for IO, it'll take _ages_ before the IO for ZONE_NORMAL
> completes.

The mapping->dirty_pages-based writeback doesn't know about zones...
Which is good in a way, because we can schedule IO in
filesystem-friendly patterns.

> Worse, if we're keeping the IO queues busy with ZONE_HIGHMEM pages
> we could create starvation of the other zones.

Right. So for a really big high:low ratio, that could be a problem.

For these systems, in practice, we know where the cleanable
ZONE_NORMAL pagecache lives:
blockdev_superblock->inodes->mapping->dirty_pages. So we could easily
schedule IO specifically targeted at the normal zone if needed. But it
will be slow whatever we do, because dirty blockdev pagecache is
splattered all over the platter.

> Another effect is that a GB of writes is sure to slow down any
> subsequent reads, even if 100 MB of RAM has already been freed...
>
> Because of this I want to make sure we only submit a sane amount of
> pages for IO at once, maybe <pulls number out of hat>
> max(zone->pages_high, 4 * (zone->pages_high - zone->free_pages))?

And what, may I ask, was wrong with 42? ;)

Point taken on the IO starvation thing. But you know my opinion of the
read-vs-write policy in the IO scheduler...

> > more hm. It's possible that, because of the per-zone LRU, we end
> > up putting way too much swap pressure onto anon pages in highmem.
> > For the 1G boxes. This is an interaction between the per-zone LRU
> > and the page allocator's highmem-first policy.
> >
> > Have you seen this in 2.4-rmap? It would happen there, I suspect.
> Shouldn't happen in 2.4-rmap, I've been careful to avoid any kind of
> worst-case scenarios like that by having a number of different
> watermarks.
>
> Basically kswapd won't free pages from a zone which isn't in severe
> trouble if we don't have a global memory shortage, so we will have
> allocated memory from each zone already before freeing the next
> batch of highmem pages.

I'm not sure that works... If the machine has 800M normal and 200M
highmem and is cruising along with 190M of dirty pagecache (steady
state, via balance_dirty_state) then surely the poor little 10M of
anon pages which are in the highmem zone will be swapped out quite
quickly?

Probably it doesn't matter much - chances are they'll get swapped back
into ZONE_NORMAL and then live a happy life.

* Re: inactive_dirty list

From: Daniel Phillips @ 2002-09-08 21:21 UTC
To: Andrew Morton, Rik van Riel; Cc: linux-mm

On Saturday 07 September 2002 01:34, Andrew Morton wrote:

> You're proposing that we get that IO underway sooner if there is
> page reclaim pressure, and that one way to do that is to write one
> page for every reclaimed one. Guess that makes sense as much as
> anything else ;)

Not really. The correct formula will incorporate the allocation rate
and the inactive dirty/clean balance. The reclaim rate is not
relevant; it is a time-delayed consequence of the above. Relying on it
in a control loop is simply asking for oscillation.

--
Daniel

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-06 22:22 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Rik van Riel wrote:
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > hum. I'm trying to find a model where the VM can just ignore
> > dirty|writeback pagecache. We know how many pages are out there,
> > sure. But we don't scan them. Possible?
>
> Owww duh, I see it now.
>
> So basically pages should _only_ go into the inactive_dirty list
> when they are under writeout.

As an aside, we might want to clamp the amount of in-flight data to a
sane limit and just go to sleep for a bit if the VM has far too much
data in flight already.

If we need 2 MB of extra free memory, it doesn't make sense to
monopolise the whole IO subsystem by writing out 100 MB at once ;)

regards,

Rik

* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-07 2:14 UTC
To: Rik van Riel, linux-mm

Andrew Morton wrote:
>
> Rik, it seems that the time has come...
>
> I was doing some testing overnight with mem=1024m. Page reclaim was
> pretty inefficient at that level: kswapd consumed 6% of CPU on a
> permanent basis (workload was heavy dbench plus looping make -j6
> bzImage). kswapd was reclaiming only 3% of the pages which it was
> looking at.

I have a silly feeling that setting DEF_PRIORITY to "12" will simply
fix this.

Duh.

* Re: inactive_dirty list

From: Rik van Riel @ 2002-09-07 2:10 UTC
To: Andrew Morton; Cc: linux-mm

On Fri, 6 Sep 2002, Andrew Morton wrote:

> I have a silly feeling that setting DEF_PRIORITY to "12" will simply
> fix this.
>
> Duh.

Ideally we'd get rid of DEF_PRIORITY altogether and would just scan
each zone once.

Rik

* Re: inactive_dirty list

From: Andrew Morton @ 2002-09-07 5:28 UTC
To: Rik van Riel; Cc: linux-mm

Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > I have a silly feeling that setting DEF_PRIORITY to "12" will
> > simply fix this.
> >
> > Duh.
>
> Ideally we'd get rid of DEF_PRIORITY altogether and would just scan
> each zone once.

What I'm doing now is:

	#define DEF_PRIORITY 12		/* puke */

	for (priority = DEF_PRIORITY; priority; priority--) {
		int total_scanned = 0;

		shrink_caches(priority, &total_scanned);
		if (that didn't work) {
			wakeup_bdflush(total_scanned);
			blk_congestion_wait(WRITE, HZ/4);
		}
	}

and in shrink_caches():

	max_scan = zone->nr_inactive >> priority;
	if (max_scan < nr_pages * 2)
		max_scan = nr_pages * 2;
	nr_pages = shrink_zone(zone, max_scan, gfp_mask, nr_pages);

So in effect, for a 32-page reclaim attempt we'll scan 64 pages of
ZONE_HIGHMEM, then 128 pages of ZONE_NORMAL/DMA. If that doesn't yield
32 pages, we ask pdflush to write 3*64 pages. Then take a nap.

Then do it again: scan 64 pages of ZONE_HIGHMEM, then 128 of
ZONE_NORMAL/DMA, then write back 192 pages, then nap.

Then do it again: scan 128 pages of ZONE_HIGHMEM, then 256 of
ZONE_NORMAL/DMA, then write back 384 pages, then nap. etc.

Plus there are the actual pages which we started IO against during the
LRU scan - there can be up to 32 of those.

BTW, it turns out that the main reason why kswapd was going silly was
that the VM is *not* treating the `priority' as a logarithmic thing at
all:

	int max_scan = nr_inactive_pages / priority;

so the claims about scanning 1/64th of the list are crap. That thing
scans 1/6th of the queue on the first pass. In the mem=1G case, that's
30,000 damn pages. Maybe someone should take a look at Marcelo's
kernel?

There are a few warts: pdflush_operation will fail if all pdflush
threads are out doing something (pretty unlikely with the nonblocking
stuff; might happen if writeback has to run get_block()). But we'll be
writing back stuff anyway.

I changed blk_congestion_wait a bit too. The first version would
return immediately if no queues were congested (> 75% full). Now, it
will sleep even if no queues are congested. It will return as soon as
someone puts back a write request. If someone is silly enough to call
blk_congestion_wait() when there are no write requests in flight at
all, they get to take the full 1/4 second sleep.
The mem=1G corner case is fixed, and page reclaim just doesn't figure:

	c012c034   288  0.317709  do_wp_page
	c0144ae0   316  0.348597  __block_commit_write
	c012c910   342  0.377279  do_anonymous_page
	c0143efc   353  0.389414  __find_get_block
	c012f7e0   356  0.392724  find_lock_page
	c012f9f0   356  0.392724  do_generic_file_read
	c01832bc   367  0.404858  ext2_free_branches
	c0136e70   371  0.409271  __free_pages_ok
	c010e7b4   386  0.425818  timer_interrupt
	c01e3cfc   414  0.456707  radix_tree_lookup
	c0141894   434  0.47877   vfs_write
	c012f580   474  0.522896  unlock_page
	c0134348   500  0.551578  kmem_cache_alloc
	c01347d0   531  0.585776  kmem_cache_free
	c013712c   574  0.633212  rmqueue
	c0141320   605  0.667409  generic_file_llseek
	c0156924   616  0.679544  count_list
	c0142c04   617  0.680647  fget
	c01091e0   793  0.874803  system_call
	c0155914   860  0.948714  __d_lookup
	c0144674  1076  1.187     __block_prepare_write
	c014c63c  1184  1.30614   link_path_walk
	c012fcd4 10932 12.0597    file_read_actor
	c0130674 16443 18.1392    generic_file_write_nolock
	c0107048 31293 34.5211    poll_idle

The balancing of the zones looks OK at first glance, and of course the
change in system behaviour under heavy writeout loads is profound.

Let's do the MAP_SHARED-pages-get-a-second-round thing, and it'd be
good if we could come up with some algorithm for setting the current
dirty pagecache clamping level, rather than relying on the dopey
/proc/sys/vm/dirty_async_ratio magic number.

I'm thinking that dirty_async_ratio becomes a maximum ratio, and that
we dynamically lower it when large amounts of dirty pagecache would be
embarrassing. Or maybe there's just no need for this. Dunno.

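[Editor's note: a speculative sketch of the "dynamically lower it"
idea that closes the thread - the <mumble> criteria were never
specified, so the reclaim-pressure input and the floor of one quarter
are pure invention.]

	/* Treat dirty_async_ratio as a ceiling and scale the effective
	 * ratio down linearly as reclaim pressure (0..100) rises,
	 * never dropping below a quarter of the configured value. */
	int effective_dirty_ratio(int dirty_async_ratio, int reclaim_pressure)
	{
		int floor = dirty_async_ratio / 4;

		return dirty_async_ratio -
		       (dirty_async_ratio - floor) * reclaim_pressure / 100;
	}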