* Re: 2.1.130 mem usage.
  [not found] <199812021749.RAA04575@dax.scot.redhat.com>
@ 1998-12-11  0:38 ` Andrea Arcangeli
  1998-12-11 14:05   ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-11  0:38 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Wed, 2 Dec 1998, Stephen C. Tweedie wrote:

>>> +	/*
>>> +	 * If the page we looked at was recyclable but we didn't
>>> +	 * reclaim it (presumably due to PG_referenced), don't
>>> +	 * count it as scanned.  This way, the more referenced
>>> +	 * page cache pages we encounter, the more rapidly we
>>> +	 * will age them.
>>> +	 */
>>> +	if (atomic_read(&page->count) != 1 ||
>>> +	    (!page->inode && !page->buffers))
>>> 		count_min--;
>
>> I don't think count_min should count the number of tries on pages we have
>> no chance to free. It should be the opposite, in my opinion.
>
> No, the objective is not to swap unnecessarily, but still to start
> swapping if there is too much pressure on the cache.

My idea is that your patch works well for a subtle reason. The effect of the
patch is that we only try a few freeable pages, so we remove only a few
reference bits and thus we don't throw away aging (just the opposite of what
you wrote in the comment :). The reason it works is that there are many more
non-freeable pages than orphaned, unused ones.

shrink_mmap 30628, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30644, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0

The two numbers are count_max and count_min just before returning from
shrink_mmap() with your patch applied (with the mm subsystem stressed heavily
by my leak program). Basically your patch causes shrink_mmap() to touch only
a very small portion of memory each time.
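[To make the effect concrete, here is a toy model of the accounting being discussed (hypothetical names and numbers, not the literal 2.1.x shrink_mmap()): with the patch, only non-recyclable pages consume the count_min budget, so a pool dominated by busy pages makes the scan give up after touching only a small slice of memory.]

```c
struct page { int count; int has_inode; int has_buffers; int referenced; };

/* One simplified shrink_mmap() pass with the patch applied: returns the
 * index of the page that got reclaimed, or -1 if count_min ran out first.
 * Recyclable-but-referenced pages are aged without consuming the budget. */
static int scan(struct page *p, int n, int count_min)
{
    for (int i = 0; i < n; i++) {
        int candidate = p[i].count == 1 && (p[i].has_inode || p[i].has_buffers);
        if (!candidate) {
            if (--count_min <= 0)
                return -1;          /* too many busy pages seen: give up */
            continue;
        }
        if (p[i].referenced) {
            p[i].referenced = 0;    /* age it; does NOT consume count_min */
            continue;
        }
        return i;                    /* orphaned and unreferenced: reclaim */
    }
    return -1;
}

/* n_busy busy pages followed by one orphaned page-cache page. */
static int demo(int n_busy, int count_min)
{
    struct page pool[1024] = {0};
    for (int i = 0; i < n_busy; i++) {
        pool[i].count = 2;           /* extra reference held: not freeable */
        pool[i].has_inode = 1;
    }
    pool[n_busy].count = 1;          /* the lone reclaim candidate */
    pool[n_busy].has_inode = 1;
    return scan(pool, n_busy + 1, count_min);
}
```

[With a generous budget the orphan is found (demo(100, 200) reclaims page 100); when busy pages drain the budget first, the pass fails early (demo(100, 50) returns -1), which is the behaviour the count_max/count_min numbers above show.]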
This gives you a way to reference the page again, and to reset the referenced
flag on it again, avoiding the kernel really dropping the page and then having
to do IO to page it in again. So basically it's the same as setting count_min
to 100/200 pages (instead of 10000/20000) and decrementing count_min in the
cases where your patch does not. That's the only reason you can switch between
two virtual desktops without IO. The old shrink_mmap used to throw out even
our minimal cached working set. With the patch applied we instead fail much
more easily in shrink_mmap() and our working set is preserved (cool!).

Basically, without the patch, on all older kernels do_try_to_free_pages exits
from state == 0 (because shrink_mmap failed) only when we are then just forced
to do IO to regain pages from disk.

There are still two mm cycles:

top:
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	(last time I checked, swapout was not able to fail, but since we are
	 over pg_borrow, state has now been set to 0 by me)
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	(here with the old shrink_mmap pressure we used to lose our working
	 set and so everything went bad... with your patch the working set
	 is preserved because you have the time to reference the pages)
	shrink_mmap() failed so state == 1
	goto top

But as you can see, at the end of the mmap cycle with your patch the cached
working set is preserved. I think the natural way to do that is to decrease
the pressure, but decreasing count_min very fast has the same effect.
Practically, we could also drop count_max, since it never happens (at least
here) that we stop because it's 0.
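[The cycle sketched above can be modelled in a few lines (a toy simulation with made-up thresholds, not kernel code): swapping grows the cache until it crosses the borrow line, which flips the state to shrinking; shrinking then fails once only the referenced working set is left, flipping the state back.]

```c
static int cache = 10;               /* page/buffer cache level (toy scale) */
#define BORROW       16              /* pretend borrow threshold */
#define WORKING_SET   4              /* referenced pages shrink_mmap won't free */

static int shrink_mmap_model(void)   /* cache-- while above the working set */
{
    if (cache > WORKING_SET) {
        cache--;
        return 1;
    }
    return 0;                        /* only referenced pages left: fail */
}

static int swap_out_model(void)      /* swapout grows the swap cache */
{
    cache++;
    return 1;
}

static int try_to_free_page_model(int *state)
{
    if (*state == 1 && cache > BORROW)
        *state = 0;                  /* over borrow: my "state = 0" kicks in */
    if (*state == 0) {
        if (shrink_mmap_model())
            return 1;
        *state = 1;                  /* shrink_mmap failed: back to swapout */
    }
    return swap_out_model();
}

/* Run the model and record the cache level's excursion. */
static void run_model(int iters, int *lo, int *hi)
{
    int state = 1;
    *lo = *hi = cache;
    while (iters-- > 0) {
        try_to_free_page_model(&state);
        if (cache < *lo) *lo = cache;
        if (cache > *hi) *hi = cache;
    }
}
```

[The cache level saws between the working-set floor and just past the borrow line; the working set itself survives every cycle, which is the improvement described above.]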
I am very tired :( so right now my mind refuses to think about whether it
would be better to set count_min to something like (limit >> 2) >>
(priority >> 1) and reverse the check.

For the s/free_page_and_swap_cache/free_page/ change, I agree with it
completely. I only want to be sure that the other mm parts are well balanced
with the change.

I guess that by joining the filemap patch with the s/free.../free.../ patch,
we make do_try_to_free_pages switch more easily from one state to the next,
and the system is probably better balanced than 2.1.130 that way.

It would also be nice not to have two separate mm cycles (one that grows the
cache up to the borrow percentage, and another that shrinks it very close to
the limit of the working set). We should always have the same level of cache
in the system if the mm stress is constant. This could easily be done with a
state++ inside do_try_to_free_pages() after some number (how many??) of
successful returns. We should also take care not to decrease i (priority) if
we switched due to a balancing factor (and not because we failed). I'll try
that in my next little bit of spare time...

Comments? (Today I am really very tired, so my mind may fail right now...)

Andrea Arcangeli

--
This is a majordomo managed list.  To unsubscribe, send a message with the
body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
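[For reference, the count_min scaling suggested above, (limit >> 2) >> (priority >> 1), behaves like this for a hypothetical limit of 16384 pages (64MB of 4KB pages). The assumption here is that priority counts down (from about 6 in the 2.1.x loop) as do_try_to_free_pages() gets more desperate, so the scan budget grows as priority drops:]

```c
/* Andrea's proposed count_min: a quarter of the page limit, halved
 * for every two steps of remaining priority. */
static unsigned long count_min_for(unsigned long limit, int priority)
{
    return (limit >> 2) >> (priority >> 1);
}
```

[So an unstressed pass would look at up to 512 of the 16384 pages, while a last-ditch pass at priority 0 would cover 4096.]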
* Re: 2.1.130 mem usage.
  1998-12-11  0:38 ` 2.1.130 mem usage Andrea Arcangeli
@ 1998-12-11 14:05   ` Stephen C. Tweedie
  1998-12-11 18:08     ` Andrea Arcangeli
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 1998-12-11 14:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

Hi,

On Fri, 11 Dec 1998 01:38:47 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:

>>>> +	if (atomic_read(&page->count) != 1 ||
>>>> +	    (!page->inode && !page->buffers))
>>>> 		count_min--;

> My idea is that your patch works well for a subtle reason. The effect of
> the patch is that we only try a few freeable pages, so we remove only a few
> reference bits and thus we don't throw away aging (just the opposite of
> what you wrote in the comment :). The reason it works is that there are
> many more non-freeable pages than orphaned, unused ones.

> So basically it's the same as setting count_min to 100/200 pages (instead
> of 10000/20000) and decrementing count_min in the cases where your patch
> does not.

No, no, not at all. The whole point is that this patch does indeed behave as
you describe if the cache is small or moderately sized, but if you have
something like a "cat /usr/bin/* > /dev/null" going on, the large fraction of
cached but referenced pages will cause the new code to become more aggressive
in its scanning (because the pages which contribute to the loop exit
condition become more dilute). This is exactly what you want for
self-balancing behaviour.

> For the s/free_page_and_swap_cache/free_page/ change, I agree with it
> completely. I only want to be sure that the other mm parts are well
> balanced with the change.

Please try 2.1.131-ac8, then, as it not only includes the patches we're
talking about here, but it also adds Rik's swap readahead stuff extended to
do aligned block readahead for both swap and normal mmap paging.
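[The dilution effect described above can be demonstrated numerically (a toy model, not the actual code): the scan budget is consumed only by busy pages, so the more referenced cache pages are mixed in, the deeper a single pass reaches before giving up, and every cache page it touches gets its reference bit cleared.]

```c
/* How many pages one pass visits before count_min busy pages have been
 * seen, when every busy_every-th page is busy and the rest are
 * referenced cache pages (which are aged but don't touch the budget). */
static int pages_visited(int pool_size, int busy_every, int count_min)
{
    int visited = 0;
    for (int i = 0; i < pool_size && count_min > 0; i++) {
        visited++;
        if (i % busy_every == 0)     /* busy/anonymous page */
            count_min--;
        /* else: referenced cache page, reference bit cleared for free */
    }
    return visited;
}
```

[With half the pool busy, a budget of 100 stops the pass after 199 pages; with a `cat /usr/bin/*`-style cache where only every tenth page is busy, the same budget carries the pass across 991 pages.]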
> It would also be nice not to have two separate mm cycles (one that
> grows the cache up to the borrow percentage, and another that shrinks
> it very close to the limit of the working set). We should always have
> the same level of cache in the system if the mm stress is constant.
> This could easily be done with a state++ inside do_try_to_free_pages()
> after some number (how many??) of successful returns.

I'm seeing pretty stable cache behaviour here, on everything from 4MB to
64MB systems.

--Stephen
* Re: 2.1.130 mem usage.
  1998-12-11 14:05 ` Stephen C. Tweedie
@ 1998-12-11 18:08   ` Andrea Arcangeli
  1998-12-12 15:14     ` Andrea Arcangeli
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-11 18:08 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Fri, 11 Dec 1998, Stephen C. Tweedie wrote:

> the large fraction of cached but referenced pages will cause the new
> code to become more aggressive in its scanning (because the pages
> which contribute to the loop exit condition become more dilute). This
> is exactly what you want for self-balancing behaviour.

Yes, that is what I want. With my previous email I only wanted to point out
that, if I remember well, you published the patch as a fix for excessive
swapout (see the reports from people pointing at the swpd field of
`vmstat 1`). Your patch will instead still cause more swapout; note that I am
not talking about I/O. This is the reason I didn't agree with your patch at
first: I thought you would get the opposite effect (and I couldn't understand
why it improved things). The point is that your patch causes less IO (cool),
since the cache working set is preserved fine.

I agree with the patch insofar as I agree with decreasing the pressure on
shrink_mmap(). However, your comment is misleading, since it says that the
new check will cause the cache to be aged faster, while it actually reduces
the pressure of shrink_mmap() _radically_, so the cache is aged more slowly
than with the previous code. The improvement comes not because we age faster,
but because we age slower and we don't throw away the cache of our working
set (thereby avoiding a lot of unneeded slow IO). As always, correct me if I
am wrong or misunderstanding something.

>> For the s/free_page_and_swap_cache/free_page/ change, I agree with it
>> completely. I only want to be sure that the other mm parts are well
>> balanced with the change.
> Please try 2.1.131-ac8, then, as it not only includes the patches

I am just running with the ac6 mm (except for kswapd, but that makes no
difference for what we are discussing here, since do_try_to_free_pages() is
the same). ac6 seems good to me (for the reasons above) and now it makes
sense to me (too ;).

> we're talking about here, but it also adds Rik's swap readahead stuff
> extended to do aligned block readahead for both swap and normal mmap
> paging.

Downloading ac8 from here is a pain (I used to get patches from
linux-kernel-patches). A guy sent me ac7 by email, but since I want to sync
with ac8 I'll wait a bit for it...

>> It would also be nice not to have two separate mm cycles (one that
>> grows the cache up to the borrow percentage, and another that shrinks
>> it very close to the limit of the working set). We should always have
>> the same level of cache in the system if the mm stress is constant.
>> This could easily be done with a state++ inside do_try_to_free_pages()
>> after some number (how many??) of successful returns.
>
> I'm seeing pretty stable cache behaviour here, on everything from
> 4MB to 64MB systems.

It works fine, but it's not stable at all. The cache here goes from 40MB to
10MB in cycles (the only local changes I have here are in the kswapd
implementation; do_try_to_free_pages() and all the other functions it uses
are untouched). The good thing is that now, when the cache reaches the low
bound, the working set is preserved (this is achieved by decreasing, not
increasing as it seemed to me when reading the comment some days ago, the
pressure of shrink_mmap()). Now I'll try removing my state = 0 to see what
happens... My state = 0 is the reason for the mm cycle I am seeing here, but
it is also the reason the mm subsystem doesn't swap out too much. I'll
experiment now...
* Re: 2.1.130 mem usage.
  1998-12-11 18:08 ` Andrea Arcangeli
@ 1998-12-12 15:14   ` Andrea Arcangeli
  0 siblings, 0 replies; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-12 15:14 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Fri, 11 Dec 1998, Andrea Arcangeli wrote:

>>> It would also be nice not to have two separate mm cycles (one that
>>> grows the cache up to the borrow percentage, and another that shrinks
>>> it very close to the limit of the working set). We should always have
>>> the same level of cache in the system if the mm stress is constant.
>>> This could easily be done with a state++ inside do_try_to_free_pages()
>>> after some number (how many??) of successful returns.
>>
>> I'm seeing pretty stable cache behaviour here, on everything from
>> 4MB to 64MB systems.
>
> It works fine, but it's not stable at all. The cache here goes from

This patch should rebalance the swapping/mmap-shrinking (and seems to work
here, even though my kswapd really starts when buf/cache are over max and
stops when they are under borrow; I don't remember, without looking at the
code, what the stock kswapd does):

Index: vmscan.c
===================================================================
RCS file: /var/cvs/linux/mm/vmscan.c,v
retrieving revision 1.1.1.1.2.16
diff -u -r1.1.1.1.2.16 vmscan.c
--- vmscan.c	1998/12/12 12:31:57	1.1.1.1.2.16
+++ linux/mm/vmscan.c	1998/12/12 14:27:55
@@ -439,7 +439,8 @@
 	kmem_cache_reap(gfp_mask);
 
 	if (buffer_over_borrow() || pgcache_over_borrow())
-		state = 0;
+		if (shrink_mmap(i, gfp_mask))
+			return 1;
 
 	if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster / 2)
 		shrink_mmap(i, gfp_mask);

The patch basically avoids clobbering state, so the mm always remains in the
`swapout' state but the cache stays close to the borrow percentage. I should
have done that from time 0 instead of using state = 0...

Andrea Arcangeli
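[The effect of the patch can be seen with the same kind of toy model as before (made-up thresholds, not kernel code): because an over-borrow cache is now trimmed inline instead of flipping the whole state machine into shrink mode, the cache level pins itself at the borrow threshold rather than cycling down toward the working set.]

```c
static int cache = 10;               /* toy cache level */
#define BORROW 16                    /* pretend borrow threshold */

/* One free-a-page round with the patch applied: over borrow, trim the
 * cache and report success immediately; otherwise stay in the swapout
 * state, which grows the swap cache. */
static int free_one_page_patched(void)
{
    if (cache > BORROW) {
        cache--;
        return 1;
    }
    cache++;
    return 1;
}

/* Run the model and record the cache level's excursion. */
static void run_patched(int iters, int *lo, int *hi)
{
    *lo = *hi = cache;
    while (iters-- > 0) {
        free_one_page_patched();
        if (cache < *lo) *lo = cache;
        if (cache > *hi) *hi = cache;
    }
}
```

[Starting from 10 toy pages, the level climbs to the borrow line and then just hovers there, instead of sawing between the working set and the borrow line as in the unpatched cycle.]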
Thread overview: 4+ messages
[not found] <199812021749.RAA04575@dax.scot.redhat.com>
1998-12-11 0:38 ` 2.1.130 mem usage Andrea Arcangeli
1998-12-11 14:05 ` Stephen C. Tweedie
1998-12-11 18:08 ` Andrea Arcangeli
1998-12-12 15:14 ` Andrea Arcangeli