From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, oleg@redhat.com, vdavydov@virtuozzo.com,
rientjes@google.com
Subject: Re: [PATCH v2] mm, oom: don't set TIF_MEMDIE on a mm-less thread.
Date: Sat, 25 Jun 2016 01:19:12 +0900
Message-ID: <201606250119.IIJ30735.FMSHQFVtOLOJOF@I-love.SAKURA.ne.jp>
In-Reply-To: <20160624120454.GB20203@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 24-06-16 19:56:43, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 24-06-16 01:24:46, Tetsuo Handa wrote:
> > > > I missed that victim != p case needs to use get_task_struct(). Patch updated.
> > > > ----------------------------------------
> > > > From 1819ec63b27df2d544f66482439e754d084cebed Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Date: Fri, 24 Jun 2016 01:16:02 +0900
> > > > Subject: [PATCH v2] mm, oom: don't set TIF_MEMDIE on a mm-less thread.
> > > >
> > > > Patch "mm, oom: fortify task_will_free_mem" removed p->mm != NULL test for
> > > > shortcut path in oom_kill_process(). But since commit f44666b04605d1c7
> > > > ("mm,oom: speed up select_bad_process() loop") changed to iterate using
> > > > thread group leaders, the possibility of p->mm == NULL has increased
> > > > compared to when commit 83363b917a2982dd ("oom: make sure that TIF_MEMDIE
> > > > is set under task_lock") was proposed. On CONFIG_MMU=n kernels, nothing
> > > > will clear TIF_MEMDIE and the system can OOM livelock if TIF_MEMDIE was
> > > > by error set to a mm-less thread group leader.
> > > >
> > > > Let's do steps for regular path except printing OOM killer messages and
> > > > sending SIGKILL.
> > >
> > > I fully agree with Oleg. It would be much better to encapsulate this
> > > into mark_oom_victim and guard it by ifdef NOMMU as this is nommu
> > > specific with a big fat warning why we need it.
> >
> > OK. But before doing so, which one ((A) or (B) shown below) do you prefer?
> >
> >
> > (A) Don't use task_will_free_mem(p) shortcut in oom_kill_process() if CONFIG_MMU=n.
> >
> > Since task_will_free_mem(p) == true where p is the largest memory consumer
> > (with oom_score_adj taken into account) is not exiting smoothly, as with
> > commit 6a618957ad17d8f4 ("mm: oom_kill: don't ignore oom score on exiting
> > tasks") thought, it can be a sign of something bad (possibly OOM livelock) is
> > happening. Thus, print the OOM killer messages anyway although all tasks
> > which will be OOM killed are already killed/exiting (unless p has OOM killable
> > children). This will help giving administrator a hint when the kernel hit
> > OOM livelock.
> [...]
> > (B) Check mm in mark_oom_victim() if CONFIG_MMU=n.
> >
> > Since mark_oom_victim() is also called from current->mm && task_will_free_mem(current)
> > shortcut in out_of_memory(), mark_oom_victim(current) needs to set TIF_MEMDIE on current
> > if current->mm != NULL.
>
> I think you are overcomplicating this. Why cannot we simply do the
> following?
> ---
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4c21f744daa6..97be9324a58b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -671,6 +671,22 @@ void mark_oom_victim(struct task_struct *tsk)
>  	/* OOM killer might race with memcg OOM */
>  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
> +#ifndef CONFIG_MMU
> +	/*
> +	 * we shouldn't risk setting TIF_MEMDIE on a task which has passed its
> +	 * exit_mm task->mm = NULL and exit_oom_victim otherwise it could
> +	 * theoretically keep its TIF_MEMDIE for ever while waiting for a parent
> +	 * to get it out of zombie state. MMU doesn't have this problem because
> +	 * it has the oom_reaper to clear the flag asynchronously.
> +	 */
> +	task_lock(tsk);
> +	if (!tsk->mm) {
> +		clear_tsk_thread_flag(tsk, TIF_MEMDIE);
> +		task_unlock(tsk);
> +		return;
> +	}
> +	task_unlock(tsk);

This makes mark_oom_victim(tsk) a no-op for tsk->mm == NULL unless tsk is
currently doing memory allocation. And isn't it possible that tsk is blocked
waiting for somebody else's memory allocation after returning from exit_mm()
in do_exit()? Then, how is this better than the current code (i.e. setting
TIF_MEMDIE on a mm-less thread group leader)?

> +#endif
>  	atomic_inc(&tsk->signal->oom_victims);
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep
> --
> Michal Hocko
> SUSE Labs
>
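
To make the no-op concern above concrete, here is a minimal user-space
sketch (not kernel code) of mark_oom_victim() with the proposed
#ifndef CONFIG_MMU hunk applied. Everything named model_* is a hypothetical
stand-in invented for illustration: struct model_task keeps only the two
fields the hunk touches, a plain int replaces signal->oom_victims, and
task_lock()/task_unlock() are left out because the model is single-threaded.
It only shows the flag bookkeeping; it does not model memory allocation or
the wakeup that follows in the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct (illustration only). */
struct model_task {
	void *mm;        /* NULL once the task has passed exit_mm() */
	bool tif_memdie; /* models the TIF_MEMDIE thread flag */
};

/* Models atomic_inc(&tsk->signal->oom_victims) with a plain counter. */
static int model_oom_victims;

/* Models test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE): returns old value. */
static bool model_test_and_set_memdie(struct model_task *tsk)
{
	bool old = tsk->tif_memdie;

	tsk->tif_memdie = true;
	return old;
}

/*
 * Models mark_oom_victim() with the proposed CONFIG_MMU=n check.
 * Returns true if the task holds TIF_MEMDIE when the function returns.
 */
static bool model_mark_oom_victim(struct model_task *tsk)
{
	if (model_test_and_set_memdie(tsk))
		return true;             /* flag was already set earlier */
	if (!tsk->mm) {
		tsk->tif_memdie = false; /* proposed check: undo and bail out */
		return false;
	}
	model_oom_victims++;
	return true;
}
```

Calling model_mark_oom_victim() on a task whose mm is NULL leaves both the
flag and the counter untouched, which is the "no-op" case questioned above;
a task that still has an mm is marked as before.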