From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 15 Jan 2001 10:24:19 -0800 (PST)
From: Linus Torvalds
Subject: Re: swapout selection change in pre1
In-Reply-To: <20010115102445.B18014@pcep-jamie.cern.ch>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Jamie Lokier
Cc: Ed Tomlinson, Marcelo Tosatti, linux-mm@kvack.org
List-ID:

On Mon, 15 Jan 2001, Jamie Lokier wrote:
>
> Freeing pages aggressively from a process that's paging lots will make
> that process page more, meaning more aggressive freeing etc. etc.
> Either it works and reduces overall paging fairly (great), it spirals
> out of control, which will be obvious, or it'll simply be stable at many
> different rates which is undesirable but not so obvious in testing.

I doubt that it gets to any of the bad cases.

See - when the VM layer frees pages from a virtual mapping, it doesn't
throw them away. The pages are still there, and there won't be any "spiral
of death". If the faulter faults them in quickly, a soft fault will happen
without any new memory allocation, and you won't see any more vma
scanning. It doesn't get "worse" if the working set actually fits in
memory.

So the only case that actually triggers a "meltdown" is when the working
set does _not_ fit in memory, in which case not only will the pages be
unmapped, but they'll also get freed aggressively by the page_launder()
logic. At that point, the big process will actually end up waiting for the
pages, and will end up penalizing itself, which is exactly what we want.

So it should never "spiral out of control", simply because if we fit in
memory it has no impact other than initially doing more soft page faults
while it finds the right balancing point. It only really kicks in when
people are continually trying to free memory: which is only true when we
really have a working set bigger than available memory, and which is
exactly the case where we _want_ to penalize the people who seem to be the
worst offenders.

So I doubt you get any "subtle cases".

Note that this ties in to the thread issue too: if you have a single VM
and 50 threads that all fault in, that single VM _will_ be penalized. Not
because it has 50 threads (like the old code did), but because it has very
active paging behaviour. Which again is exactly what we want: we don't
want to penalize threads per se, because threads are often used for user
interfaces etc, and can often be largely dormant. What we really want to
penalize is bad VM behaviour, and that's exactly the information we get
from heavy page faulting.

NOTE! I'm not saying that tuning isn't necessary. Of course it is. And I
suspect that we actually want to add a page allocation flag (__GFP_VM)
that says "this allocation is for growing our VM", and perhaps make the VM
shrinking conditional on that - so that the VM shrinking really only kicks
in for the big VM offenders, not for people who just read files into the
page cache.

So yes, we'll have VM tuning, the same as 2.2.x had and probably still
has. But I think our algorithms are a lot more "fundamentally stable" than
they were before. Which is not to say that the tuning is obvious - I just
claim that we will probably have a much easier time doing it, and that we
have more tools in our tool-chest.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
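
A minimal user-space sketch of the behaviour described above - the names
here (swap_cache, pte, unmap_page, launder, fault_in) are invented for
illustration and are not the 2.4 VM's actual interfaces. Unmapping a page
without freeing it makes the next access a cheap soft fault with no new
allocation; only the launder step (the page_launder() case) actually frees
pages, and that is when a heavy faulter has to wait for swap-in again:

/*
 * Illustrative sketch only: invented names, not kernel code.
 * "Unmapping" clears the process mapping but keeps the page cached,
 * so a re-fault is soft.  Only laundering (real memory pressure)
 * frees pages and turns the next re-fault into a hard one.
 */
#include <stdio.h>

#define NPAGES 8

struct page { int in_cache; };

static struct page swap_cache[NPAGES];   /* unmapped but not yet freed pages */
static struct page *pte[NPAGES];         /* the process's "page table"       */
static int soft_faults, hard_faults;

/* VM scan: drop the mapping, but keep the page in the cache. */
static void unmap_page(int i) { pte[i] = NULL; }

/* page_launder()-like step: under real pressure, actually free cached pages. */
static void launder(void)
{
    for (int i = 0; i < NPAGES; i++)
        swap_cache[i].in_cache = 0;
}

/* Fault handler: a cached page is a soft fault, a freed one is a hard fault. */
static void fault_in(int i)
{
    if (pte[i])
        return;                          /* still mapped, no fault at all */
    if (swap_cache[i].in_cache) {
        soft_faults++;                   /* no new allocation, no I/O     */
    } else {
        hard_faults++;                   /* would block on swap-in here   */
        swap_cache[i].in_cache = 1;
    }
    pte[i] = &swap_cache[i];
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++) {
        swap_cache[i].in_cache = 1;
        pte[i] = &swap_cache[i];
    }

    /* Working set fits: unmap + re-fault costs only soft faults. */
    for (int i = 0; i < NPAGES; i++) unmap_page(i);
    for (int i = 0; i < NPAGES; i++) fault_in(i);

    /* Working set doesn't fit: launder frees pages, re-faults are hard. */
    for (int i = 0; i < NPAGES; i++) unmap_page(i);
    launder();
    for (int i = 0; i < NPAGES; i++) fault_in(i);

    printf("soft faults: %d, hard faults: %d\n", soft_faults, hard_faults);
    return 0;
}

Run as written, the fits-in-memory pass costs only the 8 soft faults, and
only the laundered pass produces the 8 hard faults that make the process
wait.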