From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id 086A56B0005 for ; Tue, 31 May 2016 18:25:07 -0400 (EDT) Received: by mail-pf0-f200.google.com with SMTP id s73so1154725pfs.0 for ; Tue, 31 May 2016 15:25:07 -0700 (PDT) Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [202.181.97.72]) by mx.google.com with ESMTPS id s15si46321328pfj.65.2016.05.31.15.25.05 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 31 May 2016 15:25:05 -0700 (PDT) Subject: Re: [PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group. From: Tetsuo Handa References: <1464432784-6058-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <1464452714-5126-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <20160531131159.GL26128@dhcp22.suse.cz> In-Reply-To: <20160531131159.GL26128@dhcp22.suse.cz> Message-Id: <201606010635.HJI86975.JOFOFFMQHLVtSO@I-love.SAKURA.ne.jp> Date: Wed, 1 Jun 2016 06:35:30 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: mhocko@suse.cz Cc: akpm@linux-foundation.org, linux-mm@kvack.org, rientjes@google.com Michal Hocko wrote: > On Sun 29-05-16 01:25:14, Tetsuo Handa wrote: > > There has been three problems about SysRq-f (manual invocation of the OOM > > killer) case. To make description simple, this patch assumes situation > > where the OOM reaper is not called (because the OOM victim's mm is shared > > by unkillable threads) or not available (due to kthread_run() failure or > > CONFIG_MMU=n). > > > > First is that moom_callback() is not called by moom_work under OOM > > livelock situation because it does not have a dedicated WQ like vmstat_wq. > > This problem is not fixed yet. > > Why do you mention it in the changelog when it is not related to the > patch then? Just we won't forget about it. > > Btw. you can (ab)use oom_reaper for that purpose. The patch would be > quite trivial. How do you handle CONFIG_MMU=n case? Are we going to provide oom_reaper for CONFIG_MMU=n case? > > > Second is that select_bad_process() chooses a thread group which already > > has a TIF_MEMDIE thread. Since commit f44666b04605d1c7 ("mm,oom: speed up > > select_bad_process() loop") changed oom_scan_process_group() to use > > task->signal->oom_victims, non SysRq-f case will no longer select a > > thread group which already has a TIF_MEMDIE thread. > > I am not sure the reference to the commit is really helpful. The > behavior you are describing below was there before this commit, the only > thing that has changed is the scope of the TIF_MEMDIE check. Indeed. Traversing on all threads can always find a thread group which already has a TIF_MEMDIE thread when there is already a TIF_MEMDIE thread. > > > But SysRq-f case will > > select such thread group due to returning OOM_SCAN_OK. This patch makes > > sure that oom_badness() is skipped by making oom_scan_process_group() to > > return OOM_SCAN_CONTINUE for SysRq-f case. > > I am OK with this part. I was suggesting something similar except I > wanted to skip over tasks which have fatal_signal_pending and that part > got nacked by David AFAIR. Could you make this a separate patch, please? I think it is better to change both part with this patch. > > > Third is that oom_kill_process() chooses a thread group which already > > has a TIF_MEMDIE thread when the candidate select_bad_process() chose > > has children because oom_badness() does not take TIF_MEMDIE into account. > > This patch checks child->signal->oom_victims before calling oom_badness() > > if oom_kill_process() was called by SysRq-f case. This resembles making > > sure that oom_badness() is skipped by returning OOM_SCAN_CONTINUE. > > This makes sense to me as well but why should be limit this to sysrq case? > Does it make any sense to select a child which already got killed for > normal OOM killer? Anyway I think it would be better to split this into > its own patch as well. The reason is described in next paragraph. Do we prefer immediately killing all children of the allocating task? If yes, I think it should be a separate patch on top of this patch because somebody might complain such behavior as a regression. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org