Date: Thu, 3 Jan 2008 12:49:33 -0800 (PST)
From: David Rientjes
Subject: Re: [PATCH 04 of 11] avoid selecting already killed tasks
In-Reply-To: <20080103195433.GW30939@v2.random>
References: <4cf8805b5695a8a3fb7c.1199326150@v2.random> <20080103134137.GT30939@v2.random> <20080103195433.GW30939@v2.random>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: Andrea Arcangeli
Cc: linux-mm@kvack.org, Andrew Morton

On Thu, 3 Jan 2008, Andrea Arcangeli wrote:

> In theory no memory allocation should be required in do_exit.... in
> practice sometime it can happen, but the PF_MEMALLOC pool is available
> and can be emptied way before the first task has been killed, and the
> potential eaters of the PF_MEMALLOC pool are much heavier users than
> the do_exit path, so I doubt worrying about the memory reserves by the
> time TIF_MEMDIE has been set is a valid concern.
>

Ok.

> > the best alternative is to then take TIF_MEMDIE away from that task,
> > reduce its timeslice, and never select it again for OOM kill.
>
> The TIF_MEMDIE undoing isn't a big deal. Sigkilling undoing is more
> interesting.
>

Well, that doesn't matter either if the task is stuck in D state
forever. I was thinking, however, that reducing its timeslice to 1
would be beneficial for the remainder of the system's uptime, since the
task will have been given a timeslice of HZ when the OOM killer killed
it.

> I tried to prioritize and reduce and simplify the amount of stuff to
> push to the minimum to be stable, but certainly I'd like to take the
> more complex approach too, yet I'd keep it at the end to keep the
> priority high on preventing the crash with small changes. I was being
> more complex originally with a global timeout, still simpler than your
> per-task timeout, and yet it wasn't merged as style changes
> to such code bitrotten the patchset I guess.
>

Ok.

The global timeout would require jiffies to be stored when the SIGKILL
is issued and cleared in the exit path with a
test_tsk_thread_flag(p, TIF_MEMDIE) check. Unfortunately, that doesn't
work because, as you said, it is possible for more than one thread to
have TIF_MEMDIE. So there would be no way to detect OOM-killed tasks
that are stuck in D state, and therefore no way to keep them from
turning the entire OOM killer into a no-op.

> > It was made on a per-zone level instead of a global level, as your
> > approach did, to support cpusets and memory policy OOM killings. With a
> > global approach these OOM kills would have taken longer because you were
> > serializing globally and the OOM killer was dealing with a zonelist that
> > wouldn't necessarily have alleviated OOM conditions in other zones.
>
> I know, scaling oom killing in parallel in numa is nicer but in
> practice oom is rare and should never happen... so my global approach
> wasn't that different ;)
>

OOM killing is becoming much more common, since the cgroup-based memory
controller work uses it, in part, as a mechanism for enforcing its
policy.

		David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org