From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 7 Jan 2008 19:37:55 -0800 (PST)
From: David Rientjes
Subject: Re: [PATCH 11 of 11] not-wait-memdie
In-Reply-To: <200801081425.31515.nickpiggin@yahoo.com.au>
Message-ID:
References: <504e981185254a12282d.1199326157@v2.random> <200801081425.31515.nickpiggin@yahoo.com.au>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Nick Piggin
Cc: Christoph Lameter, Andrea Arcangeli, linux-mm@kvack.org, Andrew Morton
List-ID:

On Tue, 8 Jan 2008, Nick Piggin wrote:

> The problem is the global reserve. Once you have a kernel that doesn't
> need this handwavy global reserve for forward progress, a lot of little
> problems go away.
>

I'm specifically talking about TIF_MEMDIE here which gives access to that
global reserve. In OOM situations there is no easy way to guarantee that
a task will have enough memory to exit, but that is exactly what is needed
to alleviate the condition.

Additionally, it is not guaranteed that a task that has been OOM killed
and given access to the global reserve will exit after it has exhausted
that reserve in its entirety. That's when the system deadlocks.

So giving access to the global reserve to multiple tasks that share memory
in at least one of their zones for simultaneous OOM killings is not a
complete solution. There should be a timeout on tasks when they are OOM
killed; if they cannot exit for the duration of that period, they lose
access to the reserves and only then is another task selected.

> > It should be given to a single
> > OOM-killed task that will alleviate the OOM condition for the task that
> > called out_of_memory().
>
> It should be, but that task you OOM may be blocking on another one that
> is waiting for memory, for example.
>

And after the timeout that I'm proposing it, or another suitable candidate,
will be killed instead. The dependencies are beyond the scope of the OOM
killer badness scoring but without giving tasks a short but reasonable
period to exit and then opting to kill another task there will always
exist the potential for deadlock.

> > That's only possible with my proposal of adding
> >
> > 	unsigned long oom_kill_jiffies;
> >
> > to struct task_struct. We can't get away with a system-wide jiffies
> > variable, nor can we get away with per-cgroup, per-cpuset, or
> > per-mempolicy variable. The only way to clear such a variable is in the
> > exit path (by checking test_thread_flag(tsk, TIF_MEMDIE) in do_exit()) and
> > fails miserably if there are simultaneous but zone-disjoint OOMs
> > occurring.
>
> Why not just have a global frequency limit on OOM events. Then the panic
> has this delay factored in...
>

Because OOM killing is going to become more and more frequent with the
introduction of the memory controller which uses it as a mechanism to
enforce its policy. And a global frequency limit does not work well for
parallel cpuset, mempolicy, or memory controller OOM events. That is why
it is currently serialized by the triggering task's zonelist and not
globally.

		David
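
For illustration only, the per-task timeout discussed in this message could be modelled in userspace roughly as below. This is a sketch under stated assumptions, not kernel code and not the patch from this thread. Only `oom_kill_jiffies` and the TIF_MEMDIE flag come from the discussion. `struct task`, `OOM_KILL_TIMEOUT`, `tick_after()`, `oom_kill()`, `oom_select()`, the `timed_out` flag and the `badness` field are hypothetical names, and the timeout value is arbitrary because the message does not give one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical grace period in ticks; the thread does not name a value. */
#define OOM_KILL_TIMEOUT 100UL

/* Userspace stand-in for the few task_struct fields this sketch needs. */
struct task {
	unsigned long badness;          /* placeholder score, higher = killed first */
	bool exited;                    /* task has finished exiting */
	bool memdie;                    /* models TIF_MEMDIE (reserve access) */
	bool timed_out;                 /* assumption: skip victims that failed to exit */
	unsigned long oom_kill_jiffies; /* tick at which the task was OOM killed */
};

/* Wraparound-safe "a is later than b", in the style of time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mark a task as OOM killed: grant reserve access and start its clock. */
static void oom_kill(struct task *t, unsigned long now)
{
	t->memdie = true;
	t->oom_kill_jiffies = now;
}

/*
 * Returns the task newly selected for killing, or NULL if an earlier
 * victim is still inside its grace period (or no candidate is left).
 */
static struct task *oom_select(struct task *tasks, size_t n, unsigned long now)
{
	struct task *best = NULL;
	size_t i;

	/* Pass 1: wait for a victim still in its grace period, else revoke. */
	for (i = 0; i < n; i++) {
		struct task *t = &tasks[i];

		if (t->exited || !t->memdie)
			continue;
		if (!tick_after(now, t->oom_kill_jiffies + OOM_KILL_TIMEOUT))
			return NULL;
		t->memdie = false;      /* loses access to the reserves */
		t->timed_out = true;
	}

	/* Pass 2: only now pick another candidate by score. */
	for (i = 0; i < n; i++) {
		struct task *t = &tasks[i];

		if (t->exited || t->timed_out)
			continue;
		if (!best || t->badness > best->badness)
			best = t;
	}
	if (best)
		oom_kill(best, now);
	return best;
}
```

Calling `oom_select()` repeatedly with an advancing tick count first kills the highest-scoring task, then returns NULL while that victim is within its grace period, and selects the next candidate only after the period has elapsed. Keeping the timestamp in each task, rather than in one global variable, is what would let zone-disjoint OOM events each run their own clock.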