From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 22 Dec 2003 00:55:42 +0100
From: Roger Luethi
Subject: Re: load control demotion/promotion policy
Message-ID: <20031221235541.GA22896@k3.hellgate.ch>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel
Cc: William Lee Irwin III, linux-mm@kvack.org, Andrew Morton
List-ID:

On Sat, 20 Dec 2003 21:33:34 -0500, Rik van Riel wrote:
> I've got an idea for a load control / memory scheduling policy that
> is inspired by the following requirements and data points:

It is my understanding that wli is interested in load control because
he knows this Russian guy who puts an insane load on his box. Do you
have friends in Russia as well?

Isn't there _anybody_ interested in the fact that 2.6 performance
completely breaks down under a light overload where 2.4 doesn't, and
where load control would be more of a problem than a solution? Heck,
I even showed that you don't have to give up physical scanning to get
most of the pageout performance back!

Oh, and btw: did I overlook this problem on akpm's should/must-fix
lists, or is it missing for a reason?

I can't help but think of the man who looks for his keys not where he
lost them but near the lamp post, where the light is. While I agree
that working on load control is a lot more fun, it is _pageout_ that
has been completely borked in 2.6, and there is no way in hell load
control can fix that. Load control trades latency for throughput; it
makes sense in some situations once pageout tuning has been
exhausted, which is not at all the case for Linux 2.6.

I hate to be a pest, but I am still entirely unconvinced that load
control is what 2.6 needs at this point. Maybe I should make that
ceterum censeo a sig.

That said, here's my take:

> 1) wli pointed out that one of the better performing load control
>    mechanisms is one that swaps out the SMALLEST process (easy to
>    swap out, removes one process worth of IO load from the system)

According to wli this strategy was 15% better than random selection
in terms of throughput / CPU usage. Those 15% may well be quite solid
for transaction-based systems, but typical Linux systems and
workloads are different animals, and it doesn't seem safe to rely on
those numbers here. Also, on modern servers/workstations with load
control, latency will become a much bigger problem than +/- 15%
throughput could ever be.

Bottom line: we would have to benchmark various criteria anyway, and
choosing the smallest process is arguably quite arbitrary. The best I
could say about it is that for all we know it's as good as any other
policy.

> 2) small processes, like root shells, should not be swapped out for
>    a long time, but should be swapped back in relatively quickly
>
> 3) because swapping big processes in or out is a lot of work, we
>    should do that infrequently
>
> 4) however, once a big process is swapped out, it should stay out
>    for a long time because it greatly reduces the amount of memory
>    the system needs
>
> The swapout selection loop would be as follows:
>  - calculate (rss / resident time) for every process
>  - swap out the process where this value is lowest
>  - remember the rss and swapout time in the task struct
>
> At swapin time we can do the opposite, looking at every process in
> the swapped out queue and waking up the process where
> (swap_rss / (now - swap_time)) is the smallest.
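For concreteness, I read your selection loop as roughly the sketch
below. This is only a paraphrase of your description, not real code:
resident_since, swap_rss and swap_time are made-up fields, nothing
that exists in the 2.6 task struct.

#include <linux/sched.h>
#include <linux/jiffies.h>

/*
 * Sketch of the proposed swapout selection: pick the task with the
 * lowest rss / resident time. Invented fields are marked as such.
 */
static struct task_struct *pick_swapout_victim(void)
{
        struct task_struct *p, *victim = NULL;
        unsigned long best = ~0UL;

        read_lock(&tasklist_lock);
        for_each_process(p) {
                unsigned long resident, score;

                if (!p->mm)
                        continue;       /* skip kernel threads */
                resident = jiffies - p->resident_since; /* invented field */
                if (!resident)
                        continue;
                score = p->mm->rss / resident;  /* rss / resident time */
                if (score < best) {
                        best = score;
                        victim = p;
                }
        }
        read_unlock(&tasklist_lock);

        /*
         * Swapin would do the opposite over the stunned queue: wake
         * the task with the smallest swap_rss / (jiffies - swap_time).
         */
        return victim;
}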
If I understand your description correctly, you'll probably stun sshd
early on, because it will have accrued an impressive resident time.
If the user starts a fat GUI administration tool to study/fix the
load problem, it will likely hit the sack as well and stay there for
a long time. IOW, you will help some users and quite possibly make
things worse for others.

Of course I don't claim your selection algorithm is any worse than
mine, but I doubt it is much better. It is hard to get right -- looks
like the OOM killer all over again.

As for the implementation: an overload situation grave enough to make
load control worthwhile should be a rare event. I didn't think I
could justify growing the task struct even further for that. So when
I wanted to save some state (like RSS at stunning time), I kept it in
local variables where the processes hit the wait queue. I didn't use
it for global comparisons like the ones you are suggesting, but even
that is possible with some extra effort. And by the time load control
kicks in, we've got plenty of CPU cycles to spend on extra effort.

Roger
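P.S. To illustrate what I mean by keeping the state in local
variables where the processes hit the wait queue, a simplified
sketch -- not the actual patch; the wait queue name and the details
around it are made up:

#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/jiffies.h>

static DECLARE_WAIT_QUEUE_HEAD(stun_wq);        /* made-up name */

static void stun_current_task(void)
{
        /*
         * The state lives on the stack for the duration of the stun,
         * so the task struct does not have to grow.
         */
        unsigned long stun_rss = current->mm ? current->mm->rss : 0;
        unsigned long stun_start = jiffies;
        DEFINE_WAIT(wait);

        prepare_to_wait(&stun_wq, &wait, TASK_INTERRUPTIBLE);
        schedule();     /* sleep until load control wakes us again */
        finish_wait(&stun_wq, &wait);

        /*
         * stun_rss and (jiffies - stun_start) are available here for
         * local decisions, but not for global comparisons across all
         * stunned tasks -- that would take some extra effort.
         */
}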