From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id 205128D003A for ; Sat, 12 Mar 2011 14:40:17 -0500 (EST) Received: from kpbe16.cbf.corp.google.com (kpbe16.cbf.corp.google.com [172.25.105.80]) by smtp-out.google.com with ESMTP id p2CJeCA1015976 for ; Sat, 12 Mar 2011 11:40:14 -0800 Received: from vxd2 (vxd2.prod.google.com [10.241.33.194]) by kpbe16.cbf.corp.google.com with ESMTP id p2CJe1DV019194 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Sat, 12 Mar 2011 11:40:11 -0800 Received: by vxd2 with SMTP id 2so3905354vxd.8 for ; Sat, 12 Mar 2011 11:40:11 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20110312134341.GA27275@redhat.com> References: <20110303100030.B936.A69D9226@jp.fujitsu.com> <20110308134233.GA26884@redhat.com> <20110309151946.dea51cde.akpm@linux-foundation.org> <20110312123413.GA18351@redhat.com> <20110312134341.GA27275@redhat.com> Date: Sat, 12 Mar 2011 11:40:11 -0800 Message-ID: Subject: Re: [PATCH 0/3] oom: TIF_MEMDIE/PF_EXITING fixes From: Hugh Dickins Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Oleg Nesterov Cc: Andrew Morton , KOSAKI Motohiro , KAMEZAWA Hiroyuki , linux-mm@kvack.org, Andrey Vagin , David Rientjes Hi Oleg, On Sat, Mar 12, 2011 at 5:43 AM, Oleg Nesterov wrote: > On 03/12, Oleg Nesterov wrote: >> >> On 03/11, David Rientjes wrote: >> > >> > On Wed, 9 Mar 2011, Andrew Morton wrote: >> > >> > > If Oleg's test program cause a hang with >> > > oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch and doesn't >> > > cause a hang without >> > > oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch then that's= a >> > > big problem for >> > > oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch, no? >> > > >> > >> > It's a problem, but not because of >> > oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch. >> >> It is, afaics. oom-killer can't ussume that a single PF_EXITING && p->mm >> thread is going to free the memory. >> >> > If we don't >> > have this patch, then we have a trivial panic when an oom kill occurs = in a >> > cpuset with no other eligible processes, the oom killed thread group >> > leader exits >> >> It is not clear what "leader exits" actually mean. OK, perhaps you mean >> its ->mm =3D=3D NULL. >> >> > but its other threads do not and they trigger oom kills >> > themselves. =C2=A0for_each_process() does not iterate over these threa= ds and so >> > it finds no eligible threads to kill and then panics >> >> Could you explain what do you mean? No need to kill these threads, they >> are already killed, we should wait until they all exit. >> >> > I'll look at Oleg's test case >> > and see what can be done to fix that condition, but the answer isn't t= o >> > ignore eligible threads that can be killed. >> >> Once again, they are already killed. Or I do not understand what you mea= nt. >> >> Could you please explain the problem in more details? >> >> >> Also. Could you please look at the patches I sent? >> >> =C2=A0 =C2=A0 =C2=A0 [PATCH 1/1] oom_kill_task: mark every thread as TIF= _MEMDIE >> =C2=A0 =C2=A0 =C2=A0 [PATCH v2 1/1] select_bad_process: improve the PF_E= XITING check > > Cough. And both were not right, while_each_thread(p, t) needs the properl= y > initialized "t". At least I warned they were not tested ;) > >> Note also the note about "p =3D=3D current" check. it should be fixed to= o. > > I am resending the fixes above plus the new one. I've spent much of the week building up to join in, but the more I look around, the more I find to say or investigate, and therefore never quite get to write the mail. Let this be a placeholder, that I probably disagree (in an amicable way!) with all of you, and maybe I'll finally manage to collect my thoughts into mail later today. I guess my main point will be that TIF_MEMDIE serves a number of slightly different, even conflicting, purposes; and one of those purposes, which present company seems to ignore repeatedly, is to serialize access to final reserves of memory - as a comment by Nick in select_bad_process() makes clear. (This is distinct from the serialization to avoid OOM-killing rampage.) We _might_ choose to abandon that, but if so, it should be a decision, not an oversight. So I cannot blindly agree with just setting TIF_MEMDIE on more and more tasks, even if they share the same mm. I wonder if use of your find_lock_task_mm() in select_bad_process() might bring together my wish to continue serialization, David's wish to avoid stupid panics, and your wish to avoid deadlocks. Though any serialization has a risk of deadlock: we probably need to weigh up how realistic different cases are. Which brings me neatly to your little pthread_create, ptrace proggy... I dare say you and Kosaki and David know exactly what it's doing and why it's a problem, but even after repeated skims of the ptrace manpage, I'll admit to not having a clue, nor the inclination to run and then debug it to find out the answer. Please, Oleg, would you mind very much explaining it to me? I don't even know if the double pthread_create is a vital part of the scheme, or just a typo. I see it doesn't even allocate any special memory, so I assume it leaves a PF_EXITING around forever, but I couldn't quite see how (with PF_EXITING being set after the tracehook_report_exit). And I wonder if a similar case can be constructed to deadlock the for_each_process version of select_bad_process(). I worry more about someone holding a reference to the mm via /proc (I see memory allocations after getting the mm). Thanks; until later, Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org