* [DATAPOINT] pre7-6 will not swap
@ 2000-05-05  8:07 Benjamin Redelings I
  0 siblings, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-05  8:07 UTC (permalink / raw)
  To: linux-mm

Hi,

I just compiled pre7-6. It seems more useable than pre7-5. However, it basically does not swap. The first time there is any memory pressure, it swaps 32 pages (128k), and it never swaps again.

In similar circumstances, pre7-4 has gotten up to 30Mb swapped. There are many unused daemons running in my 64Mb RAM.

I also reverted to
	count = nr_threads / (priority +1)
though I didn't check carefully what this did. Anyway, it doesn't seem to make a difference.
</datapoint>

-BenRI
UP PPro, IDE, 64MB RAM
--
"I want to be in the light, as He is in the Light,
I want to shine like the stars in the heavens." - DC Talk, "In the Light"
Benjamin Redelings I <>< http://www.bol.ucla.edu/~bredelin/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
[parent not found: <8evk0f$7jote$1@fido.engr.sgi.com>]
* Re: [DATAPOINT] pre7-6 will not swap
       [not found] <8evk0f$7jote$1@fido.engr.sgi.com>
@ 2000-05-06 17:12 ` Rajagopal Ananthanarayanan
  2000-05-06  4:25   ` Benjamin Redelings I
  2000-05-06 19:35   ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-06 17:12 UTC (permalink / raw)
  To: Benjamin Redelings I, torvalds; +Cc: linux-mm

Benjamin Redelings I wrote:
>
> Hi,
> I just compiled pre7-6. It seems more useable than pre7-5. However,
> it basically does not swap. The first time there is any memory
> pressure, it swaps 32 pages (128k), and it never swaps again.
> In similar circumstances, pre7-4 has gotten up to 30Mb swapped. There
> are many unused daemons running in my 64Mb RAM.
>
> I also reverted to
> count = nr_threads / (priority +1)
> though I didn't check carefully what this did. Anyway, it doesn't
> seem to make a difference.
>

Yes, your observation is a good summarization of 7-6 behaviour. I'm also not seeing good results. The writes from dbench start failing; I guess the grab_page_cache in generic_file_write is returning ENOMEM. Again, as you say, the system doesn't want to swap after an initial flurry of activity.

Linus has taken in the fix to "old" vs. "young" in shrink_mmap, and taken out the aggressive counter change (also in shrink_mmap). But apparently another change in try_to_swap_out is causing problems. I haven't an analytical evaluation, but empirically, if I remove this in try_to_swap_out (mm/vmscan.c), dbench runs ok.

--------------- mm/vmscan.c around line 113 --------------
	/*
	 * Don't do any of the expensive stuff if
	 * we're not really interested in this zone.
	 */
	if (!page->zone->zone_wake_kswapd)
		goto out_unlock;
----------------------------------------------------------

Benjamin, can you comment this line out and see if it improves things?

Linus, one thing crossed my mind. With the above change swap_out() will "count" as having tried this process, although the zone may never need balancing. Aren't the initial system threads at the beginning of the task_list? If so, do you think their zones may never need balancing? ... and hence swap_out in essence gives up early?

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 17:12 ` Rajagopal Ananthanarayanan
@ 2000-05-06  4:25   ` Benjamin Redelings I
  2000-05-06 19:35   ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06  4:25 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: torvalds, linux-mm

> --------------- mm/vmscan.c around line 113 --------------
> 	/*
> 	 * Don't do any of the expensive stuff if
> 	 * we're not really interested in this zone.
> 	 */
> 	if (!page->zone->zone_wake_kswapd)
> 		goto out_unlock;
> ----------------------------------------------------------
>
> Benjamin, can you comment this line out and see if it improves things?

OK, reverted this. I also reverted to "count = nr_threads / (priority + 1)", I hope that doesn't cause a problem.

With the above patch reverted, the system swaps amazingly well, as opposed to almost never. It swaps out tasks in the correct order. It is also a bit more aggressive than pre7-4, swapping out unused daemons even when there is lots of cache that presumably could be freed (e.g. BEFORE I run netscape). But this seems to be the right decision, given that that stuff isn't swapped back in later.

After running lots of processes, I can also say that this kernel does not have a permanent cache size of 30Mb/64Mb. It actually decreases eventually instead of swapping out foreground programs like before.

Does this mean that the zone_wake_kswapd essentially has the wrong value, so that we don't even balance the zone for which we were called?

-benRI
UP PPro, 64MB RAM, IDE
--
"I want to be in the light, as He is in the Light,
I want to shine like the stars in the heavens." - DC Talk, "In the Light"
Benjamin Redelings I <>< http://www.bol.ucla.edu/~bredelin/
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 17:12 ` Rajagopal Ananthanarayanan
  2000-05-06  4:25   ` Benjamin Redelings I
@ 2000-05-06 19:35   ` Linus Torvalds
  2000-05-06  5:35     ` Benjamin Redelings I
  2000-05-09  1:52     ` Quintela Carreira Juan J.
  1 sibling, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2000-05-06 19:35 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: Benjamin Redelings I, linux-mm

On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:
>
> Linus has taken in the fix to "old" vs. "young" in shrink_mmap,
> and taken out the aggressive counter change (also in shrink_mmap).
> But apparently another change in try_to_swap_out is causing problems.
> I haven't an analytical evaluation, but empirically, if I remove this
> in try_to_swap_out (mm/vmscan.c), dbench runs ok.

Yes. I was thinking some more about it, and it is using the wrong test.

It must use the same test as the one in page_alloc.c to determine whether a zone is "interesting" or not - otherwise you get into a situation where page_alloc.c doesn't want to allocate from a zone because it's not quite empty enough, but at the same time vmscan doesn't want to free pages from the zone because it's not quite full enough.

No wonder that if you get to that situation, the allocator starts getting unhappy and says "no free pages".

> --------------- mm/vmscan.c around line 113 --------------
> 	/*
> 	 * Don't do any of the expensive stuff if
> 	 * we're not really interested in this zone.
> 	 */
> 	if (!page->zone->zone_wake_kswapd)
> 		goto out_unlock;

Make this test be the same as in "__alloc_pages()" in mm/page_alloc.c, and it should be ok. The test there is:

	/* Are we supposed to free memory? Don't make it worse.. */
	if (!z->zone_wake_kswapd && z->free_pages > z->pages_low) {

and I suspect that we might actually make the vmscan.c test more eager to swap stuff out: my private source tree says

	/*
	 * Don't do any of the expensive stuff if
	 * we're not really interested in this zone.
	 */
	if (z->free_pages > z->pages_high)
		goto out_unlock;

in vmscan.c, and that seems to be quite well-behaved too (but if somebody has the energy to test the two different versions, I'd absolutely love to hear results..)

		Linus
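The mismatch described in the message above can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not kernel code: `struct zone_model` and every function name below are hypothetical, and only the three tests themselves are taken from the thread. Whether a real zone actually reaches the "stuck" state depends on kernel code that the thread does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's zone structure; only the
 * fields mentioned in the thread are modelled. */
struct zone_model {
    unsigned long free_pages;
    unsigned long pages_low;
    unsigned long pages_high;
    bool zone_wake_kswapd;
};

/* Allocator side: the test quoted from __alloc_pages(). */
static bool allocator_will_use(const struct zone_model *z)
{
    assert(z->pages_low <= z->pages_high);
    return !z->zone_wake_kswapd && z->free_pages > z->pages_low;
}

/* vmscan side as in pre7-6: skip the page unless the wake flag is set. */
static bool vmscan_skips_pre7_6(const struct zone_model *z)
{
    return !z->zone_wake_kswapd;
}

/* vmscan side as in the variant from Linus's private tree. */
static bool vmscan_skips_pages_high(const struct zone_model *z)
{
    return z->free_pages > z->pages_high;
}

/* A zone is "stuck" when the allocator refuses to take pages from it
 * and the scanner also refuses to free pages in it. */
static bool zone_is_stuck(const struct zone_model *z,
                          bool (*skips)(const struct zone_model *))
{
    return !allocator_will_use(z) && skips(z);
}
```

In this model a zone with `free_pages` at or below `pages_low` and the wake flag clear is refused by both sides under the pre7-6 test, while the `pages_high` variant still lets the scanner free pages there.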
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 19:35   ` Linus Torvalds
@ 2000-05-06  5:35     ` Benjamin Redelings I
  2000-05-06 21:46       ` Rik van Riel
  2000-05-09  1:52     ` Quintela Carreira Juan J.
  1 sibling, 1 reply; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06  5:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rajagopal Ananthanarayanan, linux-mm

> and I suspect that we might actually make the vmscan.c test more eager to
> swap stuff out: my private source tree says
>
> 	/*
> 	 * Don't do any of the expensive stuff if
> 	 * we're not really interested in this zone.
> 	 */
> 	if (z->free_pages > z->pages_high)
> 		goto out_unlock;
>
> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
> has the energy to test the two different versions, I'd absolutely love to
> hear results..)

Although I would have thought that putting this test in would have no effect on performance, it actually kills performance. Since the test appears very reasonable, I think this means we have a bug elsewhere, and that removing this reasonable test cures a symptom, but not the bug.

OK, details. With Linus's test, the kernel does not want to swap much. It is a little better than the previous version of the test, but much lower than if the test was removed. One result is that the cache shrinks to low sizes like 14Mb/64Mb, when there are several unused daemons that could be swapped out.

Also, the WRONG PROCESSES are swapped out. Several large daemons that were swapped out w/o the test, are now left in core. Instead, RUNNING programs are swapped out, like netscape. Even worse, running xquake and 'tar -xf linux.tar' makes the system non-responsive - the VM continues paging the quake ENGINE in and out and in and out :P

It looks like some processes (my unused daemons) are scanned only once, and then get stuck at the end of some list? Is that a possible explanation? <guessing> Perhaps Rik's moving list-head idea is needed? </guessing>.

carry on,
-benRI
--
"I want to be in the light, as He is in the Light,
I want to shine like the stars in the heavens." - DC Talk, "In the Light"
Benjamin Redelings I <>< http://www.bol.ucla.edu/~bredelin/
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06  5:35     ` Benjamin Redelings I
@ 2000-05-06 21:46       ` Rik van Riel
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  0 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2000-05-06 21:46 UTC (permalink / raw)
  To: Benjamin Redelings I; +Cc: Linus Torvalds, Rajagopal Ananthanarayanan, linux-mm

On Fri, 5 May 2000, Benjamin Redelings I wrote:

> It looks like some processes (my unused daemons) are
> scanned only once, and then get stuck at the end of some list?
> Is that a possible explanation? <guessing> Perhaps Rik's moving
> list-head idea is needed? </guessing>.

I'm busy implementing Davem's active/inactive list proposal to replace the current page/swapcache. I don't know if it'll work really well though, so research into other directions is very much welcome ;)

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 21:46       ` Rik van Riel
@ 2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-06 22:24 UTC (permalink / raw)
  To: riel; +Cc: Benjamin Redelings I, Linus Torvalds, linux-mm

Rik van Riel wrote:
>
> On Fri, 5 May 2000, Benjamin Redelings I wrote:
>
> > It looks like some processes (my unused daemons) are
> > scanned only once, and then get stuck at the end of some list?
> > Is that a possible explanation? <guessing> Perhaps Rik's moving
> > list-head idea is needed? </guessing>.
>
> I'm busy implementing Davem's active/inactive list proposal
> to replace the current page/swapcache. I don't know if it'll
> work really well though, so research into other directions
> is very much welcome ;)
>

Again my experience, with skipping pages whose zones have (free_pages > pages_high) in try_to_swap_out, is similar to Benjamin's ... the system behaves better than 7-4, but isn't as good as without any zone skipping.

Once again, I'm back to asking, should we be swapping at all? Shouldn't shrink_mmap() be finding pages to throw out?

I have a hunch. Follow this argument closely. In shrink_mmap we have:

------------
	if (p_zone->free_pages > p_zone->pages_high)
		goto dispose_continue;
------

This page doesn't count against a valid try in shrink_mmap(). Soon, we run out of pages to look at, but "count" in shrink_mmap is still high. So, we go back to scanning the lru list all over again. If some pages' reference count was flipped in the first loop, good. If it wasn't, and all that remained was unreferenced pages whose zones have reached the high water mark, then they won't be victimized, because the same test above will skip the page again! Still on the second loop, shrink_mmap will look at other pages, for instance because an I/O is in flight, and _those_ pages do tally against "count" ... so, in essence, we have skipped unreferenced pages belonging to zones with high water mark, forever. This is wrong.

My solution is simple. Have a variable, "second_scan" initialized to zero, at the top of shrink_mmap(). Set "second_scan = 1" at the bottom of the loop in shrink_mmap:

---------------
	/* wrong zone? not looped too often? roll again... */
	if (page->zone != zone && count) {
		second_scan = 1;
		goto again;
	}
-------------

Now the pages_high test will be changed to:

-----------
	if (p_zone->free_pages > p_zone->pages_high && !second_scan)
		goto dispose_continue;
-----------

That is, victimize pages in zones with lots of free_pages if having scanned once we didn't find anything. If you are worried about unreferenced pages not being looked at in the second_scan, we can change it to a third_scan.

Now, the final argument: since this page was skipped by shrink_mmap(), the test in try_to_swap_out that Benjamin, I and Linus have been playing around becomes important. Without it, pages in zones with lots of free memory neither get "shrunk" nor get swapped.

What do you guys think?

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
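The second_scan argument in the message above can be illustrated with a toy scan loop. This is a loose user-space sketch and an assumption throughout: the real shrink_mmap() walks an LRU list and does far more, and `struct page_model`, `struct zone_model`, `pick_victim()` and the two-pass limit are hypothetical names and simplifications introduced only for illustration. The one thing taken from the thread is the `free_pages > pages_high && !second_scan` test and the claim that skipped pages are not charged to "count".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified models of a zone and a page. */
struct zone_model { unsigned long free_pages; unsigned long pages_high; };
struct page_model { int zone; bool referenced; bool io_in_flight; };

/*
 * Returns the index of the first page chosen as a victim, or -1 if the
 * scan budget ("count") runs out or two passes find nothing.
 */
static int pick_victim(struct page_model *pages, size_t npages,
                       const struct zone_model *zones,
                       int count, bool use_second_scan)
{
    assert(pages != NULL && zones != NULL);

    for (int pass = 0; pass < 2; pass++) {
        /* Only the proposed variant relaxes the zone test on pass two. */
        bool second_scan = use_second_scan && pass > 0;

        for (size_t i = 0; i < npages && count > 0; i++) {
            struct page_model *p = &pages[i];
            const struct zone_model *z = &zones[p->zone];

            /* Zone has plenty of free pages: skip, not charged to count. */
            if (z->free_pages > z->pages_high && !second_scan)
                continue;

            count--;                      /* this page is a "valid try" */
            if (p->referenced) {          /* age the page, keep it */
                p->referenced = false;
                continue;
            }
            if (p->io_in_flight)          /* cannot free it right now */
                continue;
            return (int)i;
        }
    }
    return -1;
}
```

With one unreferenced page in a zone above its high water mark and one page with I/O in flight elsewhere, the model without second_scan finds no victim, while the second_scan variant picks the unreferenced page on its second pass.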
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
@ 2000-05-06 14:03           ` Benjamin Redelings I
  2000-05-07  0:22           ` Rik van Riel
  2000-05-07  2:23           ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06 14:03 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Linus Torvalds, linux-mm

> Once again, I'm back to asking, should we be swapping at all?
> Shouldn't shrink_mmap() be finding pages to throw out?
>

That's a good question. However, it also misses part of the point. The reason for the bad performance is not mainly that there is too little swapout. The WRONG PAGES are swapped out! The system spends most of its I/O bandwidth doing page-in's.

Remember, on my system, the VM swapped out the quake ENGINE, which was running 100% of the time, in order to keep unused daemons blocking on select in core. That is just wrong.

Right?

-benRI
--
"I want to be in the light, as He is in the Light,
I want to shine like the stars in the heavens." - DC Talk, "In the Light"
Benjamin Redelings I <>< http://www.bol.ucla.edu/~bredelin/
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
@ 2000-05-07  0:22           ` Rik van Riel
  2000-05-07  2:23           ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Rik van Riel @ 2000-05-07  0:22 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: Benjamin Redelings I, Linus Torvalds, linux-mm

On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:

> What do you guys think?

I think you may want to take a look at page_alloc.c::__alloc_pages(), where the kernel balances between different zones...

- kswapd is woken up when zone->free_pages < zone->pages_low
- kswapd goes to sleep when it has freed enough pages in the current zone
- if another zone has a lower memory load, we'll free some "extra" pages in that other zone, up to zone->pages_high

This should provide enough balancing between zones...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
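The three watermark rules in the message above can be read as a classification of a zone by its free page count. The following is a minimal sketch under assumptions, not the logic of __alloc_pages(): `struct zone_model`, `enum zone_state` and `classify_zone()` are hypothetical names, and treating `free_pages == pages_high` as still eligible for "extra" freeing is an assumption chosen to match the `free_pages > pages_high` skip test quoted elsewhere in the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the zone fields named in the thread. */
struct zone_model {
    unsigned long free_pages;
    unsigned long pages_low;
    unsigned long pages_high;
};

enum zone_state {
    ZONE_NEEDS_KSWAPD,    /* below pages_low: kswapd is woken up */
    ZONE_MAY_FREE_EXTRA,  /* between the marks: extra pages may be freed */
    ZONE_BALANCED         /* above pages_high: leave the zone alone */
};

static enum zone_state classify_zone(const struct zone_model *z)
{
    assert(z->pages_low <= z->pages_high);

    if (z->free_pages < z->pages_low)
        return ZONE_NEEDS_KSWAPD;
    if (z->free_pages <= z->pages_high)
        return ZONE_MAY_FREE_EXTRA;
    return ZONE_BALANCED;
}
```

The gap between the two marks gives hysteresis: a zone that has just been pushed above `pages_low` is not immediately treated as balanced, so kswapd does not have to be woken again after every allocation.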
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
  2000-05-07  0:22           ` Rik van Riel
@ 2000-05-07  2:23           ` Linus Torvalds
  2000-05-07 17:40             ` Rik van Riel
  2 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07  2:23 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Benjamin Redelings I, linux-mm

On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:
>
> I have a hunch. Follow this argument closely. In shrink_mmap we have:
>
> ------------
> 	if (p_zone->free_pages > p_zone->pages_high)
> 		goto dispose_continue;
> ------
>
> This page doesn't count against a valid try in shrink_mmap().

[ second-scan logic ]

Ugh. This may be right, but it also gets my hackles up for being "too contrived". It shouldn't be this complex. Either "shrink_mmap()" should care about the zone or it shouldn't. If it should, then it should just check the particular zone that it was passed in (ie basically per-zone LRU again). If it shouldn't, then it probably should just take the LRU as-is.

Also, one thing that keeps me wondering is whether the current "try_to_free_pages()" is right at all. Remember: the fundamental operation isn't really "try_to_free_pages()". Nobody really ever calls that directly. The fundamental operation we want to have is really just "balance_zones()", and it may be that by isolating the "zone" we're aiming for early in balance_zones() we've done a mistake.

My personal inclination is along the lines of

- we never really care about any particular zone. We should make sure that all zones get balanced, and that is what running kswapd will eventually cause.
- things like "shrink_mmap" and "vmscan" should both free any page from any zone that is (a) a good candidate and (b) the zone is not yet well-balanced.
- looking at "shrink_mmap()", my reaction would not be to add more complexity to it, but to remove the _one_ special case that looks at one specific zone:

	/* wrong zone? not looped too often? roll again... */
	if (page->zone != zone && count)
		goto again;

  I would suggest just removing that test altogether. The page wasn't from a "wrong zone". It was just a different zone that also needed balancing.

That single test stands out as being zone-specific instead of geared towards the bigger goal of "let's balance the zones". It would also cause "shrink_mmap()" to _return_ failure, even if shrink_mmap() actually ended up doing real work. Which just seems wrong.

So instead of making that test more complicated and adding a "phase" counter, why not just remove it? Then "shrink_mmap()" will start failing only when it _truly_ fails - ie when it no longer can find any pages really worth freeing.

		Linus "gut instinct" Torvalds
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07  2:23           ` Linus Torvalds
@ 2000-05-07 17:40             ` Rik van Riel
  2000-05-07 17:53               ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2000-05-07 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rajagopal Ananthanarayanan, Benjamin Redelings I, linux-mm

On Sat, 6 May 2000, Linus Torvalds wrote:

> My personal inclination is along the lines of
> - we never really care about any particular zone. We should make sure
>   that all zones get balanced, and that is what running kswapd will
>   eventually cause.
> - things like "shrink_mmap" and "vmscan" should both free any page from
>   any zone that is (a) a good candidate and (b) the zone is not yet
>   well-balanced.

double-nod

> - looking at "shrink_mmap()", my reaction would not be to add more
>   complexity to it, but to remove the _one_ special case that looks at
>   one specific zone:
>
> 	/* wrong zone? not looped too often? roll again... */
> 	if (page->zone != zone && count)
> 		goto again;
>
>   I would suggest just removing that test altogether. The page wasn't
>   from a "wrong zone". It was just a different zone that also needed
>   balancing.

The danger in this is that we could "use up" the remaining ticks on the count variable in do_try_to_free_pages() and end up with a failed rmqueue for the request...

Oh, and the return value for shrink_mmap() will still indicate success, even if we failed to free a page for the zone we intended ... we've already decided for that before we get into the loop or not.

But I agree that this test is wrong; it makes shrink_mmap() loop too often compared to swap_out(), leading to worse page aging in the swap cache and increased cpu use.

The solution could be to let do_try_to_free_pages() loop more often than it does now ... increasing our chances of freeing from the right zone while at the same time not increasing the amount of work to be done (we need to do it anyway, so why not do it now and have that memory allocation succeed?)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 17:40             ` Rik van Riel
@ 2000-05-07 17:53               ` Linus Torvalds
  2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07 17:53 UTC (permalink / raw)
  To: riel; +Cc: Rajagopal Ananthanarayanan, Benjamin Redelings I, linux-mm

On Sun, 7 May 2000, Rik van Riel wrote:
> On Sat, 6 May 2000, Linus Torvalds wrote:
>
> > - looking at "shrink_mmap()", my reaction would not be to add more
> >   complexity to it, but to remove the _one_ special case that looks at
> >   one specific zone:
> >
> > 	/* wrong zone? not looped too often? roll again... */
> > 	if (page->zone != zone && count)
> > 		goto again;
> >
> >   I would suggest just removing that test altogether. The page wasn't
> >   from a "wrong zone". It was just a different zone that also needed
> >   balancing.
>
> The danger in this is that we could "use up" the remaining
> ticks on the count variable in do_try_to_free_pages() and
> end up with a failed rmqueue for the request...

I agree. However, I think the logic should be

- kswapd tries to keep all zones reasonably well balanced
- but kswapd obviously cannot do a perfect job, especially with bursty allocations, so:
- we should at some point start synchronously helping kswapd
- if somebody has special requirements, it may not always be possible to meet them under all circumstances.

Basically, it boils down to: we should try to do our best, but we cannot do wonders and we should realize that too.

> Oh, and the return value for shrink_mmap() will still
> indicate success, even if we failed to free a page for
> the zone we intended ... we've already decided for that
> before we get into the loop or not.

You're right. The only downside to the extra test is that it unbalances the page freeing, and can lead to (for example) not using swap very efficiently because we're looping too much in shrink_mmap. Which actually seems to be one of the symptoms right now, but it may of course be due to something else too.

It can also make the aging less efficient.

But my real reason for disliking it is that I prefer conceptually simple approaches, and that one test just doesn't fit conceptually ;)

		Linus
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 17:53               ` Linus Torvalds
@ 2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
  2000-05-07 19:30                   ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-07 19:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: riel, Benjamin Redelings I, linux-mm

Linus Torvalds wrote:
>
>
> It can also make the aging less efficient.
>
> But my real reason for disliking it is that I prefer conceptually simple
> approaches, and that one test just doesn't fit conceptually ;)

Linus & Rik, agreed that the second_scan logic I proposed earlier was not perfect. And, I agree that we should make things simpler.

One question about what shrink_mmap is trying to accomplish, conceptually:

In the presence of unreferenced pages in zones with free_pages > pages_high, should shrink_mmap ever fail? Current shrink_mmap will always skip over the pages of such zones. This in turn can lead to swapping.

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
@ 2000-05-07 19:30                   ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07 19:30 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Benjamin Redelings I, linux-mm

On Sun, 7 May 2000, Rajagopal Ananthanarayanan wrote:
>
> In the presence of unreferenced pages in zones with free_pages > pages_high,
> should shrink_mmap ever fail? Current shrink_mmap will
> always skip over the pages of such zones. This in turn
> can lead to swapping.

I think shrink_mmap() should fail for that case: it tells the logic that calls it that it's time to stop calling shrink_mmap(), and go to vmscan instead (so that next time we call shrink_mmap, we may in fact find some pages to free).

If there really are tons of pages with free_pages > pages_high, then we must have called shrink_mmap() for some other reason, so we're probably interested in another zone altogether that isn't even a subset of the "tons of memory" case (because if we had been interested in any class that has the "lots of free memory" zone as a subset, then the logic in __alloc_pages() would just have allocated it directly without worrying about zone balancing at all).

		Linus
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 19:35   ` Linus Torvalds
  2000-05-06  5:35     ` Benjamin Redelings I
@ 2000-05-09  1:52     ` Quintela Carreira Juan J.
  2000-05-09  2:28       ` Rajagopal Ananthanarayanan
  2000-05-09  2:33       ` Linus Torvalds
  1 sibling, 2 replies; 18+ messages in thread
From: Quintela Carreira Juan J. @ 2000-05-09  1:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rajagopal Ananthanarayanan, Andrea Arcangeli, Benjamin Redelings I, linux-mm

>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:

linus> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
linus> has the energy to test the two different versions, I'd absolutely love to
linus> hear results..)

Hi Linus,

I have tested two versions of the patch (against vanilla pre7-6). The first was to remove the test altogether (I think this is from Rajagopal):

--- pre7-6/mm/vmscan.c	Fri May  5 23:58:56 2000
+++ testing/mm/vmscan.c	Mon May  8 23:30:52 2000
@@ -114,8 +114,9 @@
 	 * Don't do any of the expensive stuff if
 	 * we're not really interested in this zone.
 	 */
-	if (!page->zone->zone_wake_kswapd)
+/*	if (!page->zone->zone_wake_kswapd)
 		goto out_unlock;
+*/
 
 	/*
 	 * Ok, it's really dirty. That means that

The second one is the Linus suggestion, changing the test to:

diff -u -urN --exclude=CVS --exclude=*~ --exclude=.#* --exclude=TAGS pre7-6/mm/vmscan.c testing2/mm/vmscan.c
--- pre7-6/mm/vmscan.c	Fri May  5 23:58:56 2000
+++ testing2/mm/vmscan.c	Tue May  9 01:46:08 2000
@@ -114,7 +114,7 @@
 	 * Don't do any of the expensive stuff if
 	 * we're not really interested in this zone.
 	 */
-	if (!page->zone->zone_wake_kswapd)
+	if (page->zone->free_pages > page->zone->pages_high)
 		goto out_unlock;
 
 	/*

and the third one was the classzone-25 patch from Andrea.

The test is one of my tests:

	while (true); do time ./mmap002; done

with the size parameter adjusted to the size of the memory of the system.

The results are:

vanilla pre7-6: kills *all* my processes after 2 minutes and a half
pre7-6 + Rajagopal: Works quite well, times are stable between 2m20 and 3m10 (didn't kill any processes)
pre7-6 + Linus: Kills all the processes after 3m and a few seconds.
pre7-6 + classzone25: between 2m8 seconds and 2m23.
2.2.15: between 1m50 and 2m15 (the time is quite stable around 1m50)
        It has killed one process in 7 so far.

If you need more information, let me know. As always comments, suggestions are welcome.

Later, Juan.

--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  1:52 ` Quintela Carreira Juan J.
@ 2000-05-09  2:28   ` Rajagopal Ananthanarayanan
  2000-05-09  2:33   ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-09  2:28 UTC
To: Quintela Carreira Juan J.
Cc: Linus Torvalds, Andrea Arcangeli, Benjamin Redelings I, linux-mm

"Quintela Carreira Juan J." wrote:
>
> >>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
>
> linus> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
> linus> has the energy to test the two different versions, I'd absolutely love to
> linus> hear results..)
>
> Hi Linus,
>    I have tested two versions of the patch (against vanilla
> pre7-6), the first was to remove the test altogether (I think this is
> from Rajagopal):
>
> --- pre7-6/mm/vmscan.c  Fri May  5 23:58:56 2000
> +++ testing/mm/vmscan.c Mon May  8 23:30:52 2000
> @@ -114,8 +114,9 @@
>                  * Don't do any of the expensive stuff if
>                  * we're not really interested in this zone.
>                  */
> -               if (!page->zone->zone_wake_kswapd)
> +/*             if (!page->zone->zone_wake_kswapd)
>                         goto out_unlock;
> +*/
>

I'm having the same experience too. The one thing that makes things
better is not to look at the zone at all in try_to_swap_out (as Juan
points out above).

I'm also trying to see if we can do better in shrink_mmap(). Although
my gprof statistics say that we can end up spending 91% of the time
skipping pages, I'm not able to come up with anything simple to make
shrink_mmap behave better ... except one change which makes swapping a
lot less and shrink_mmap a lot more aggressive: don't skip pages based
on the zone's high water mark if we are trying hard to free pages (my
heuristic was to stop skipping pages if priority in shrink_mmap was 3;
YMMV). I'm not entirely convinced that this is the right thing to do.

In all, I do think that try_to_swap_out shouldn't skip pages based on
zones. We now have evidence from 3 different "workloads" in this
direction: my own dbench test, Juan's test above, and Benjamin's
"gaming" workload.

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
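The priority heuristic described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the code that was tested: should_skip_page() and DESPERATE_PRIORITY are hypothetical names, the free_pages and pages_high parameters are modeled on the per-zone watermark values the message refers to, and reading "priority was 3" as "priority at or below 3" (with lower values meaning more urgency) is a guess.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: the message reports using priority 3. */
#define DESPERATE_PRIORITY 3

/*
 * Decide whether a page scan should skip a page because its zone
 * already has more free pages than its high water mark.
 *
 * Assumption: lower priority values mean the scan is trying harder,
 * so at or below DESPERATE_PRIORITY the zone check is ignored and
 * no page is skipped on zone grounds.
 */
bool should_skip_page(unsigned long free_pages,
                      unsigned long pages_high,
                      int priority)
{
    assert(priority >= 0);

    if (priority <= DESPERATE_PRIORITY)
        return false;               /* trying hard: consider every page */

    return free_pages > pages_high; /* zone is comfortable: skip the page */
}
```

With this shape, a relaxed scan (say priority 6) leaves well-stocked zones alone, while a desperate scan considers every page regardless of zone. That matches the reported trade-off: shrink_mmap does more of the work and swapping does less.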
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  1:52 ` Quintela Carreira Juan J.
  2000-05-09  2:28   ` Rajagopal Ananthanarayanan
@ 2000-05-09  2:33   ` Linus Torvalds
  2000-05-09  3:31     ` Rajagopal Ananthanarayanan
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-09  2:33 UTC
To: Quintela Carreira Juan J.
Cc: Rajagopal Ananthanarayanan, Andrea Arcangeli, Benjamin Redelings I, linux-mm

On 9 May 2000, Quintela Carreira Juan J. wrote:
> Hi Linus,
>    I have tested two versions of the patch (against vanilla
> pre7-6), the first was to remove the test altogether (I think this is
> from Rajagopal):

I'll make my current pre7-7 available right away, to head off the
discussion. I found out the real reason for the problem, and it was quite
a lot more subtle than I originally thought.

The "don't page out pages from zones that don't need it" test is a good
test, but it turns out that it triggers a rather serious problem: the way
the buffer cache dirty page handling is done is by having shrink_mmap() do
a "try_to_free_buffers()" on the pages it encounters that have
"page->buffer" set.

And doing that is quite important, because without that logic the buffers
don't get written to disk in a timely manner, nor do already-written
buffers get refiled to their proper lists. So you end up being "out of
memory" - not because the machine is really out of memory, but because
those buffers have a tendency to stick around if they aren't constantly
looked after by "try_to_free_buffers()".

So the real fix ended up being to re-order the tests in shrink_mmap() a
bit, so that try_to_free_buffers() is called even for pages that are on
a good zone that doesn't need any real balancing..

[ time passes ]

pre7-7 is there now.

		Linus
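The re-ordering described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the pre7-7 code: struct zone_model, struct page_model, tend_buffers() and both scan functions are hypothetical stand-ins, with tend_buffers() standing in for try_to_free_buffers() and needs_balancing standing in for the zone test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct zone_model {
    bool needs_balancing;     /* models "this zone needs pages freed" */
};

struct page_model {
    struct zone_model *zone;
    bool has_buffers;         /* page has buffers attached */
    bool buffers_tended;      /* set once the buffer pass has run */
    bool reclaimed;           /* set once the page has been freed */
};

/* Stand-in for try_to_free_buffers(): just records that it ran. */
void tend_buffers(struct page_model *page)
{
    page->buffers_tended = true;
}

/* Zone test first: pages in balanced zones never get their buffers tended. */
int scan_zone_test_first(struct page_model *pages, size_t n)
{
    int reclaimed = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct page_model *p = &pages[i];

        if (!p->zone->needs_balancing)
            continue;                 /* skipped before the buffer pass */
        if (p->has_buffers)
            tend_buffers(p);
        p->reclaimed = true;
        reclaimed++;
    }
    return reclaimed;
}

/* Re-ordered: buffers are tended for every page, then the zone test applies. */
int scan_buffers_first(struct page_model *pages, size_t n)
{
    int reclaimed = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct page_model *p = &pages[i];

        if (p->has_buffers)
            tend_buffers(p);          /* runs even for balanced zones */
        if (!p->zone->needs_balancing)
            continue;                 /* the page itself is still skipped */
        p->reclaimed = true;
        reclaimed++;
    }
    return reclaimed;
}
```

In this model both orderings reclaim the same pages. The only difference is that the re-ordered scan also runs the buffer pass on pages in balanced zones, which is the behaviour the message says keeps buffers from sticking around.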
* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  2:33 ` Linus Torvalds
@ 2000-05-09  3:31   ` Rajagopal Ananthanarayanan
  0 siblings, 0 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-09  3:31 UTC
To: Linus Torvalds
Cc: Quintela Carreira Juan J., Andrea Arcangeli, Benjamin Redelings I, linux-mm

Linus Torvalds wrote:
> [ ... ]
>
> The "don't page out pages from zones that don't need it" test is a good
> test, but it turns out that it triggers a rather serious problem: the way
> the buffer cache dirty page handling is done is by having shrink_mmap() do
> a "try_to_free_buffers()" on the pages it encounters that have
> "page->buffer" set.
>
> And doing that is quite important, because without that logic the buffers
> don't get written to disk in a timely manner, nor do already-written
> buffers get refiled to their proper lists. So you end up being "out of
> memory" - not because the machine is really out of memory, but because
> those buffers have a tendency to stick around if they aren't constantly
> looked after by "try_to_free_buffers()".
>
> So the real fix ended up being to re-order the tests in shrink_mmap() a
> bit, so that try_to_free_buffers() is called even for pages that are on
> a good zone that doesn't need any real balancing..

I'm not entirely sure what effect this has, except for freeing the
underlying buffer_heads. The page itself is still skipped. Anyway, a
brief examination shows that you've changed several things here
(in 7-7), so I'll have to spend some more time on it to get a full
picture.

>
> [ time passes ]
>
> pre7-7 is there now.
>
>                 Linus

Unfortunately my dbench test runs really badly with pre7-7.
Quantitatively, the amount of memory in the "cache" column of vmstat
is higher than before. write() calls start failing.

More later,

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
Thread overview: 18+ messages
2000-05-05 8:07 [DATAPOINT] pre7-6 will not swap Benjamin Redelings I
[not found] <8evk0f$7jote$1@fido.engr.sgi.com>
2000-05-06 17:12 ` Rajagopal Ananthanarayanan
2000-05-06 4:25 ` Benjamin Redelings I
2000-05-06 19:35 ` Linus Torvalds
2000-05-06 5:35 ` Benjamin Redelings I
2000-05-06 21:46 ` Rik van Riel
2000-05-06 22:24 ` Rajagopal Ananthanarayanan
2000-05-06 14:03 ` Benjamin Redelings I
2000-05-07 0:22 ` Rik van Riel
2000-05-07 2:23 ` Linus Torvalds
2000-05-07 17:40 ` Rik van Riel
2000-05-07 17:53 ` Linus Torvalds
2000-05-07 19:13 ` Rajagopal Ananthanarayanan
2000-05-07 19:30 ` Linus Torvalds
2000-05-09 1:52 ` Quintela Carreira Juan J.
2000-05-09 2:28 ` Rajagopal Ananthanarayanan
2000-05-09 2:33 ` Linus Torvalds
2000-05-09 3:31 ` Rajagopal Ananthanarayanan