From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark_H_Johnson.RTS@raytheon.com Message-ID: <852568E0.0056F0BB.00@raylex-gh01.eo.ray.com> Date: Mon, 15 May 2000 09:50:38 -0500 Subject: Re: pre8: where has the anti-hog code gone? Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org Return-Path: To: "Juan J. Quintela" Cc: linux-mm@kvack.org, riel@conectiva.com.br, torvalds@transmeta.com List-ID: I guess I have a "philosophy question" - one where I can't quite understand the situation that we are in. What is the problem that killing processes is curing? I understand that the code that [has been/still is?] killing processes is doing so because there is no "free physical memory" - right now. Yet we have had code to do a schedule() instead of killing the job, and gave the system the chance to "fix" the lack of free physical memory problem (e.g., by writing dirty pages to a mapped file or swap space on disk). From what I read from Juan's message below, I guess this code has been lost or replaced by something more hostile to user applications. The problem as I see it is that we are seeing a situation where the system can "generate" dirty pages far faster than the dirty pages make it to disk. The relationship of [extremely fast CPU] --- is much faster than -> [relatively slow disk] -- results in --> [no free physical memory] -- system kills job--> [killed process] is causing the system to trigger the process killing code. The alternative I'm suggesting is --> [no free memory] -- system does reschedule --> [dirty pages written & free physical memory] -- resume suspended job --> [job runs to completion, and no jobs are killed] to give the system time to act on the situation. If you are truly out of memory [physical memory and ALL swap space], then SOME job has to free up memory. I think we would all agree with this premise. I suggest we remove automatic job killing as a solution. If it must remain as a solution, there must be several other attempts tried first. If this is an interactive system, the user should be able to close a window or otherwise kill a job [preferably the rogue job] to make some space available. If this is a standalone system (say a server), the long term solution is likely "get more memory or swap space". However, that doesn't fix the problem "right now". In this case, give the developer or operator of that system an opportunity to make the choice on which job to kill. Perhaps reserving a small amount of memory [just like the disk reserve] for privileged users is a solution. Making the job killing choice at a low level of the kernel, based on "what's currently running" does not appear to be the "right" answer. Making this choice in the kernel and killing "init" (as Juan notes below) is almost certainly the "wrong" answer. I see a choice in alternatives... [1] replace the raise SIGKILL code with schedule(). I've tried this in older kernels (2.2.14) & it helps preserve system operation with mapped files, but doesn't help when the swap file is full. [this may fix Juan's symptoms] [2] return "out of memory" when swap is full - let application code handle it. If no action in "X" time, then kill a job. Add to kswapd? [3] long term - add a "reserve" to physical memory for root (or privileged code). [not sure how to implement] [4] Protect init (could be as simple as if pid==1, then schedule() & kill something else) [5] Long term - reduce resident set sizes to slow the generation of dirty pages. Let me use a "file copy" as an example. I can use "cp A B" to do this. I can also write a program that maps files "A" and "B" into memory & copy the contents of "A" into "B" and then unmap the two files [Multics used to do something like this for all file accesses]. These two methods SHOULD have similar characteristics in terms of CPU time, memory used, elapsed time, etc. The current VM system in Linux handles the "cp" example much better than the memory mapped example - I think this is due to overhead in memory management with large resident set sizes. [make code active to enforce RSSLIM]. As a system administrator and user of Linux, I am concerned about jobs getting "killed" - please make this a "last resort". Do it only after giving me and my users time to "fix" the problem. Thanks. --Mark H Johnson |--------+-----------------------> | | "Juan J. | | | Quintela" | | | | | | | | | 05/13/00 | | | 01:14 PM | | | | |--------+-----------------------> >----------------------------------------------------------------------------| | | | To: Linus Torvalds | | cc: Rik van Riel , linux-mm@kvack.org, | | (bcc: Mark H Johnson/RTS/Raytheon/US) | | Subject: Re: pre8: where has the anti-hog code gone? | >----------------------------------------------------------------------------| >>>>> "linus" == Linus Torvalds writes: Hi linus> So pre-8 with your suggested for for kswapd() looks pretty good, actually, linus> but still has this issue that try_to_free_pages() seems to give up too linus> easily and return failure when it shouldn't. I'll happily apply patches linus> that make for nicer behaviour once this is clearly fixed, but not before linus> (unless the "nicer behaviour" patch _also_ fixes the "pathological linus> behaviour" case ;) Here pre8, pre8 with any of the Rik patchs and pre9-1 looks bad. If I ran mmap002 in that machines it will be killed allways, now a lot of times in around 30 seconds (in previous kernels the tests lasts around 3 min before being killed). The system continues doing kills until init dies, then all the system freezes, no net, no ping answer, no keyboard answer (sysrq didn't work). No information in logs, except that some processes have been killed, no messages in the console either. If you need to reproduce the efect is easy, here in less than 5 min mmap002 test, the system is frozen. If you need more information, let me know. Later, Juan. -- In theory, practice and theory are the same, but in practice they are different -- Larry McVoy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/