From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nick Piggin Subject: Re: [PATCH 11 of 11] not-wait-memdie Date: Tue, 8 Jan 2008 18:42:33 +1100 References: <504e981185254a12282d.1199326157@v2.random> <200801081425.31515.nickpiggin@yahoo.com.au> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801081842.33482.nickpiggin@yahoo.com.au> Sender: owner-linux-mm@kvack.org Return-Path: To: David Rientjes Cc: Christoph Lameter , Andrea Arcangeli , linux-mm@kvack.org, Andrew Morton List-ID: On Tuesday 08 January 2008 14:37, David Rientjes wrote: > On Tue, 8 Jan 2008, Nick Piggin wrote: > > The problem is the global reserve. Once you have a kernel that doesn't > > need this handwavy global reserve for forward progress, a lot of little > > problems go away. > > I'm specifically talking about TIF_MEMDIE here which gives access to that > global reserve. And I'm specifically talking about PF_MEMALLOC, which does the same. > In OOM situations there is no easy way to guarantee that > a task will have enough memory to exit, but that is exactly what is needed > to alleviate the condition. Additionally, it is not guaranteed that a > task that has been OOM killed and given access to the global reserve will > exit after it has exhausted that reserve in its entirety. That's when the > system deadlocks. I know all that ;) Your second point is the reason to have more than 1 MEMDIE process... > So giving access to the global reserve to multiple tasks that share memory > in at least one of their zones for simultaneous OOM killings is not a > complete solution. There should be a timeout on tasks when they are OOM > killed; if they cannot exit for the duration of that period, they lose > access to the reserves and only then is another task selected. Hmm, OK I didn't realise you'd proposed that as an alternative. Maybe. I don't know if the complexity would be worthwhile, given that there is no sort of reentrancy limit on the global reserve pool anyway. > > > That's only possible with my proposal of adding > > > > > > unsigned long oom_kill_jiffies; > > > > > > to struct task_struct. We can't get away with a system-wide jiffies > > > variable, nor can we get away with per-cgroup, per-cpuset, or > > > per-mempolicy variable. The only way to clear such a variable is in > > > the exit path (by checking test_thread_flag(tsk, TIF_MEMDIE) in > > > do_exit()) and fails miserably if there are simultaneous but > > > zone-disjoint OOMs occurring. > > > > Why not just have a global frequency limit on OOM events. Then the panic > > has this delay factored in... > > Because OOM killing is going to become more and more frequent with the > introduction of the memory controller which uses it as a mechanism to > enforce its policy. And a global frequency limit does not work well for > parallel cpuset, mempolicy, or memory controller OOM events. That is why > it is currently serialized by the triggering task's zonelist and not > globally. I don't think that's a very good reason for the complexity. If your system is OOM-throughput-limited, then something's very wrong with your wokload management. (and I don't buy the DoS security argument either because the memory controller doesn't provide security last time I looked). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org