From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20001010002520.B8709@athlon.random> Date: Tue, 10 Oct 2000 08:59:23 +1000 (EST) From: Peter Waltenberg Subject: Re: [PATCH] VM fix for 2.4.0-test9 & OOM handler Sender: owner-linux-mm@kvack.org Return-Path: To: Andrea Arcangeli Cc: Linus Torvalds , Rik van Riel , Byron Stanoszek , MM mailing list , Ingo Molnar , Peter Waltenberg List-ID: On 09-Oct-2000 Andrea Arcangeli wrote: > On Tue, Oct 10, 2000 at 07:10:13AM +1000, Peter Waltenberg wrote: >> People seem to be forgetting (again), that Rik's code is *REALLY* an > > Please explain why you think "people" is forgetting that. At least from my > part > I wasn't forgetting that and so far I didn't read any email that made me to > think others are forgetting that. I didn't mail the whole kernel list originally, maybe I should have. This discussion has happened before. The OOM code can never be perfect, I beleive this can be proven mathematically. In that case, it should at least be simple. We've seen suggestion after suggestion recently for making the heuristics more and more complex to cope with corner cases. That isn't going to help, it just makes it's behaviour less predictable. THAT is what I was commenting on. Without some last resort "kill user processes" code, the kernel hangs under memory pressure. It'd be nicer if it didn't, but eventually thats what happens. Having some last resort kernel process which will attempt to keep the kernel usable is a good idea, and it seems to work, at least on my testing. Frankly, when it gets to the point where my machine will crash anyway, I don't really care if the OOM killer gets it wrong now and then. It's still better than it not being there. I realize that the MM people are making efforts to ensure that the kernel will keep running under insane pressure, and maybe you'll produce a kernel now and then that doesn't die, BUT, I don't think you can ensure that's the case with every kernel produced. Something will slip through, and again we'll have the possibility of hangs. Having a SIMPLE OOM handler in the kernel is a very usefull fallback, it's a last resort, and if it gets it right 9 times out of 10, it's added another "9" to the reliability figures. >> It's probably reasonable to not kill init, but the rest just don't matter. > > Killing init is a kernel bug. > >> Without the OOM killer the machine would have locked up and you'd lose that >> 3 > > Grab 2.2.18pre15aa1 and try to lockup the machine if you can. > >> At least with Rik's code you end up with a usable machine afterwards which >> is >> a major improvement. > > If current 2.4.x lockups during OOM that's because of bugs introduced during > 2.[34].x. The oom killer is completly irrelevant to the stability of the > kernel, But not the the stability of the system. I agree, it's better if the OOM killer never gets used, but the majority of kernels released ARE killable with memory pressure. > the oom killer only deals with the _selection_ of the task to kill. OOM > detection is a completly orthogonal issue. > > If something the oom killer can introduce a lockup condition if there isn't > a mechamism to fallback killing the current task (all the other tasks > may be sleeping on a down-nfs-server in interruptible mode). That probably doesn't matter, the machine would be dead otherwise anyway. WITH the OOM killer it has some chance of recovering, without it none. It'd be nicer if that didn't occur, but OOM handling is still an improvement. > Andrea Peter -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/