From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by kanga.kvack.org (Postfix) with ESMTP id 0B2B16B00F0 for ; Wed, 6 Nov 2013 11:59:08 -0500 (EST) Received: by mail-pa0-f49.google.com with SMTP id lj1so10738646pab.36 for ; Wed, 06 Nov 2013 08:59:08 -0800 (PST) Received: from psmtp.com ([74.125.245.175]) by mx.google.com with SMTP id sd2si17573624pbb.319.2013.11.06.08.59.06 for ; Wed, 06 Nov 2013 08:59:07 -0800 (PST) Received: by mail-wg0-f53.google.com with SMTP id y10so5302624wgg.32 for ; Wed, 06 Nov 2013 08:59:04 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1383693987-14171-1-git-send-email-snanda@chromium.org> From: Sameer Nanda Date: Wed, 6 Nov 2013 08:58:44 -0800 Message-ID: Subject: Re: [PATCH] mm, oom: Fix race when selecting process to kill Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Luigi Semenzato Cc: msb@facebook.com, David Rientjes , Andrew Morton , mhocko@suse.cz, Johannes Weiner , Rusty Russell , linux-mm@kvack.org, linux-kernel@vger.kernel.org (adding back context from thread history) On Tue, Nov 5, 2013 at 5:18 PM, David Rientjes wrote: > On Tue, 5 Nov 2013, Sameer Nanda wrote: > >> The selection of the process to be killed happens in two spots -- first >> in select_bad_process and then a further refinement by looking for >> child processes in oom_kill_process. Since this is a two step process, >> it is possible that the process selected by select_bad_process may get a >> SIGKILL just before oom_kill_process executes. If this were to happen, >> __unhash_process deletes this process from the thread_group list. This >> then results in oom_kill_process getting stuck in an infinite loop when >> traversing the thread_group list of the selected process. >> >> Fix this race by holding the tasklist_lock across the calls to both >> select_bad_process and oom_kill_process. >> >> Change-Id: I8f96b106b3257b5c103d6497bac7f04f4dff4e60 >> Signed-off-by: Sameer Nanda > > Nack, we had to avoid taking tasklist_lock for this duration since it > stalls out forks and exits on other cpus trying to take the writeside with > irqs disabled to avoid watchdog problems. > David -- I think we can make the duration that the tasklist_lock is held smaller by consolidating the process selection logic that is currently split across select_bad_process and oom_kill_process into one place in select_bad_process. The tasklist_lock would then need to be held only when the thread lists are being traversed. Would you be ok with that? I can re-spin the patch if that sounds like a workable option. > What kernel version are you patching? If you check the latest Linus tree, > we hold a reference to the task_struct of the chosen process before > calling oom_kill_process() so the hypothesis would seem incorrect. On Tue, Nov 5, 2013 at 11:17 PM, Luigi Semenzato wrote: > Regarding other fixes: would it be possible to have the thread > iterator insert a dummy marker element in the thread list before the > scan? There would be one such dummy element per CPU, so that multiple > CPUs can scan the list in parallel. The loop would skip such > elements, and each dummy element would be removed at the end of each > scan. > > I think this would work, i.e. it would have all the right properties, > but I don't have a sense of whether the performance impact is > acceptable. Probably not, or it would have been proposed earlier. > > > > On Tue, Nov 5, 2013 at 8:45 PM, Luigi Semenzato wrote: >> It's interesting that this was known for 3+ years, but nobody bothered >> adding a small warning to the code. >> >> We noticed this because it's actually happening on Chromebooks in the >> field. We try to minimize OOM kills, but we can deal with them. Of >> course, a hung kernel we cannot deal with. >> >> On Tue, Nov 5, 2013 at 7:04 PM, Sameer Nanda wrote: >>> >>> >>> >>> On Tue, Nov 5, 2013 at 5:27 PM, David Rientjes wrote: >>>> >>>> On Tue, 5 Nov 2013, Luigi Semenzato wrote: >>>> >>>> > It's not enough to hold a reference to the task struct, because it can >>>> > still be taken out of the circular list of threads. The RCU >>>> > assumptions don't hold in that case. >>>> > >>>> >>>> Could you please post a proper bug report that isolates this at the cause? >>> >>> >>> We've been running into this issue on Chrome OS. crbug.com/256326 has >>> additional >>> details. The issue manifests itself as a soft lockup. >>> >>> The kernel we've been seeing this on is 3.8. >>> >>> We have a pretty consistent repro currently. Happy to try out other >>> suggestions >>> for a fix. >>> >>>> >>>> >>>> Thanks. >>> >>> >>> >>> >>> -- >>> Sameer -- Sameer -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org