From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, oleg@redhat.com, vdavydov@virtuozzo.com,
rientjes@google.com
Subject: Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE
Date: Sat, 25 Jun 2016 00:54:09 +0900 [thread overview]
Message-ID: <201606250054.AIF67056.OOSLVtMOJFFFQH@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20160624123953.GC20203@dhcp22.suse.cz>
Michal Hocko wrote:
> On Fri 24-06-16 20:02:01, Tetsuo Handa wrote:
> > Currently, the OOM reaper calls exit_oom_victim() on remote TIF_MEMDIE
> > thread after an OOM reap attempt was made. This behavior is intended
> > for allowing oom_scan_process_thread() to select next OOM victim by
> > making atomic_read(&task->signal->oom_victims) == 0.
> >
> > But since threads can be blocked for unbounded period at __mmput() from
> > mmput() from exit_mm() from do_exit(), we can't risk the OOM reaper
> > being blocked for unbounded period waiting for TIF_MEMDIE threads.
> > Therefore, when we hit a situation that a TIF_MEMDIE thread which is
> > the only thread of that thread group reached tsk->mm = NULL line in
> > exit_mm() from do_exit() before __oom_reap_task() finds a mm via
> > find_lock_task_mm(), oom_reap_task() does not wait for the TIF_MEMDIE
> > thread to return from __mmput() and instead calls exit_oom_victim().
> >
> > Patch "mm, oom: hide mm which is shared with kthread or global init"
> > tried to avoid OOM livelock by setting MMF_OOM_REAPED, but it is racy
> > because setting MMF_OOM_REAPED will not help when find_lock_task_mm()
> > in oom_scan_process_thread() failed.
>
> I haven't thought that through yet (I will wait for the monday fresh
> brain) but wouldn't the following be sufficient?
> ---
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4c21f744daa6..72360d7284a6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -295,7 +295,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
> ret = OOM_SCAN_CONTINUE;
> task_unlock(p);
> - }
> + } else if (task->state == EXIT_ZOMBIE)
> + ret = OOM_SCAN_CONTINUE;
I think EXIT_ZOMBIE is too late, for it is exit_notify() stage from do_exit()
which sets EXIT_ZOMBIE state.
The stage my patch is trying to rescue is __mmput() from mmput() from
exit_mm() from do_exit(). It is something like doing
----------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4c21f74..f1f892e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -289,11 +289,11 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
*/
if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims)) {
struct task_struct *p = find_lock_task_mm(task);
- enum oom_scan_t ret = OOM_SCAN_ABORT;
+ enum oom_scan_t ret = OOM_SCAN_CONTINUE;
if (p) {
- if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
- ret = OOM_SCAN_CONTINUE;
+ if (!test_bit(MMF_OOM_REAPED, &p->mm->flags))
+ ret = OOM_SCAN_ABORT;
task_unlock(p);
}
----------
which is effectively nearly equivalent with doing
----------
diff --git a/kernel/exit.c b/kernel/exit.c
index 84ae830..ea188a7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -509,9 +509,9 @@ static void exit_mm(struct task_struct *tsk)
enter_lazy_tlb(mm, current);
task_unlock(tsk);
mm_update_next_owner(mm);
- mmput(mm);
if (test_thread_flag(TIF_MEMDIE))
exit_oom_victim(tsk);
+ mmput(mm);
}
static struct task_struct *find_alive_thread(struct task_struct *p)
----------
because we don't want to risk the OOM killer wait forever when __mmput() from
mmput() from exit_mm() from do_exit() is blocked at memory allocation.
If we had some timer or timeout, we can call
if (test_thread_flag(TIF_MEMDIE))
exit_oom_victim(tsk);
when __mmput() from mmput() from exit_mm() from do_exit() is blocked for
too long. But since we don't put !can_oom_reap TIF_MEMDIE thread under the
OOM reaper's supervision, we need to do something equivalent to calling
exit_oom_victim(tsk) early, at the cost of increasing possibility of
needlessly selecting next OOM victim.
>
> return ret;
> }
> @@ -592,14 +593,7 @@ static void oom_reap_task(struct task_struct *tsk)
> debug_show_all_locks();
> }
>
> - /*
> - * Clear TIF_MEMDIE because the task shouldn't be sitting on a
> - * reasonably reclaimable memory anymore or it is not a good candidate
> - * for the oom victim right now because it cannot release its memory
> - * itself nor by the oom reaper.
> - */
> tsk->oom_reaper_list = NULL;
> - exit_oom_victim(tsk);
>
> /* Drop a reference taken by wake_oom_reaper */
> put_task_struct(tsk);
> --
> Michal Hocko
> SUSE Labs
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-06-24 15:54 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-24 11:02 Tetsuo Handa
2016-06-24 12:39 ` Michal Hocko
2016-06-24 15:54 ` Tetsuo Handa [this message]
2016-06-24 22:42 ` Oleg Nesterov
2016-06-24 21:56 ` Oleg Nesterov
2016-06-25 5:44 ` Tetsuo Handa
2016-06-27 9:23 ` Michal Hocko
2016-06-27 10:36 ` Michal Hocko
2016-06-27 15:51 ` Oleg Nesterov
2016-06-27 16:06 ` Michal Hocko
2016-06-27 17:55 ` Oleg Nesterov
2016-06-28 10:19 ` Michal Hocko
2016-06-29 0:13 ` Oleg Nesterov
2016-06-29 8:33 ` Michal Hocko
2016-06-29 14:19 ` Michal Hocko
2016-07-01 10:15 ` Tetsuo Handa
2016-06-29 20:01 ` Oleg Nesterov
2016-06-30 7:59 ` Michal Hocko
2016-06-30 10:51 ` Tetsuo Handa
2016-06-30 11:21 ` Michal Hocko
2016-07-03 13:32 ` Oleg Nesterov
2016-07-03 13:21 ` Oleg Nesterov
2016-07-07 11:51 ` Michal Hocko
2016-07-07 16:42 ` Oleg Nesterov
2016-06-29 20:14 ` Oleg Nesterov
2016-06-30 8:07 ` Michal Hocko
2016-07-03 13:24 ` Oleg Nesterov
2016-06-27 21:09 ` Oleg Nesterov
2016-06-28 10:26 ` Michal Hocko
2016-06-29 19:34 ` Oleg Nesterov
2016-06-27 20:40 ` Oleg Nesterov
2016-06-28 10:29 ` Michal Hocko
2016-06-29 20:24 ` Oleg Nesterov
2016-06-30 8:16 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201606250054.AIF67056.OOSLVtMOJFFFQH@I-love.SAKURA.ne.jp \
--to=penguin-kernel@i-love.sakura.ne.jp \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=oleg@redhat.com \
--cc=rientjes@google.com \
--cc=vdavydov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox