From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, rientjes@google.com, akpm@linux-foundation.org
Subject: Re: [PATCH] mm,oom: Re-enable OOM killer using timeout.
Date: Tue, 19 Apr 2016 16:07:54 -0400 [thread overview]
Message-ID: <20160419200752.GA10437@dhcp22.suse.cz> (raw)
In-Reply-To: <201604200006.FBG45192.SOHFQJFOOLFMtV@I-love.SAKURA.ne.jp>
On Wed 20-04-16 00:06:05, Tetsuo Handa wrote:
> >From dbe7dcb4ff757d5b1865e935f840d0418af804d3 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 19 Apr 2016 23:11:31 +0900
> Subject: [PATCH] mm,oom: Re-enable OOM killer using timeout.
>
> We are trying to reduce the possibility of hitting OOM livelock by
> introducing the OOM reaper. But the OOM reaper cannot reap the victim's
> memory if the victim's mmap_sem is held for write. It is possible that
> the thread which got TIF_MEMDIE while holding mmap_sem for write gets
> stuck at unkillable wait waiting for other thread's memory allocation.
> This problem cannot be avoided even after we convert
> down_write(&mm->mmap_sem) to down_write_killable(&mm->mmap_sem).
> Since we cannot afford converting all waits killable, we should prepare
> for such situation.
>
> The simplest way is to mark the victim's thread group as no longer
> OOM-killable by updating victim's signal->oom_score_adj to
> OOM_SCORE_ADJ_MIN.
>
> But doing so is not sufficient for !oom_kill_allocating_task case
> because oom_scan_process_thread() will find TIF_MEMDIE thread and
> continue waiting. We will need to revoke TIF_MEMDIE from all victim
> threads but TIF_MEMDIE will be automatically granted to potentially all
> victim threads due to fatal_signal_pending() or task_will_free_mem() in
> out_of_memory(). We don't want to walk the process list so many times
> in order to revoke TIF_MEMDIE from all victim threads from racy loop.
>
> Also, doing so breaks oom_kill_allocating_task case because we will not
> wait for existing TIF_MEMDIE threads because oom_scan_process_thread()
> is not called. As a result, all children of the calling process will be
> needlessly OOM-killed.
>
> Therefore, we should not play with victim's signal->oom_score_adj value
> and/or victim's TIF_MEMDIE flag.
>
> This patch adds a timeout for handling corner cases where a TIF_MEMDIE
> thread got stuck. Since the timeout is checked at oom_unkillable_task(),
> oom_scan_process_thread() will not find TIF_MEMDIE thread
> (for !oom_kill_allocating_task case) and oom_badness() will return 0
> (for oom_kill_allocating_task case).
>
> By applying this patch, the kernel will automatically press SysRq-f if
> the OOM reaper cannot reap the victim's memory, and we will never OOM
> livelock forever as long as the OOM killer is called.
Which will not guarantee anything as already pointed out several times
before. So I think this is not really that useful. I have said it
earlier and will repeat it again. Any timeout based solution which
doesn't guarantee that the system will be in a consistent state (reboot,
panic or kill all existing tasks) after the specified timeout is
pointless.
I believe that the chances of the lockup are much less likely with the
oom reaper and that we are not really urged to provide a new knob with a
random semantic. If we really want to have a timeout based thing better
make it behave reliably.
> For those who prefer existing behavior (i.e. let the kernel OOM livelock
> forever if the OOM reaper cannot reap the victim's memory), the timeout
> is set to very large value (effectively no timeout) by default.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
> include/linux/oom.h | 2 ++
> include/linux/sched.h | 1 +
> kernel/sysctl.c | 12 ++++++++++++
> mm/oom_kill.c | 18 ++++++++++++++++++
> 4 files changed, 33 insertions(+)
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index abaab8e..4d2a97e 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -112,4 +112,6 @@ static inline bool task_will_free_mem(struct task_struct *task)
> extern int sysctl_oom_dump_tasks;
> extern int sysctl_oom_kill_allocating_task;
> extern int sysctl_panic_on_oom;
> +extern unsigned long sysctl_oom_victim_wait_timeout;
> +
> #endif /* _INCLUDE_LINUX_OOM_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d8f366c..6e701b3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -790,6 +790,7 @@ struct signal_struct {
> short oom_score_adj; /* OOM kill score adjustment */
> short oom_score_adj_min; /* OOM kill score adjustment min value.
> * Only settable by CAP_SYS_RESOURCE. */
> + unsigned long oom_start; /* If not 0, timestamp of being OOM-killed. */
>
> struct mutex cred_guard_mutex; /* guard against foreign influences on
> * credential calculations
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index d11c22d..3092ec2 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -147,6 +147,9 @@ static const int cap_last_cap = CAP_LAST_CAP;
> static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
> #endif
>
> +static unsigned long oom_victim_wait_timeout_min = 1;
> +static unsigned long oom_victim_wait_timeout_max = (LONG_MAX / HZ);
> +
> #ifdef CONFIG_INOTIFY_USER
> #include <linux/inotify.h>
> #endif
> @@ -1222,6 +1225,15 @@ static struct ctl_table vm_table[] = {
> .proc_handler = proc_dointvec,
> },
> {
> + .procname = "oom_victim_wait_timeout",
> + .data = &sysctl_oom_victim_wait_timeout,
> + .maxlen = sizeof(sysctl_oom_victim_wait_timeout),
> + .mode = 0644,
> + .proc_handler = proc_doulongvec_minmax,
> + .extra1 = &oom_victim_wait_timeout_min,
> + .extra2 = &oom_victim_wait_timeout_max,
> + },
> + {
> .procname = "overcommit_ratio",
> .data = &sysctl_overcommit_ratio,
> .maxlen = sizeof(sysctl_overcommit_ratio),
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7098104..a2e1543a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -47,6 +47,7 @@
> int sysctl_panic_on_oom;
> int sysctl_oom_kill_allocating_task;
> int sysctl_oom_dump_tasks = 1;
> +unsigned long sysctl_oom_victim_wait_timeout = (LONG_MAX / HZ);
>
> DEFINE_MUTEX(oom_lock);
>
> @@ -149,6 +150,12 @@ static bool oom_unkillable_task(struct task_struct *p,
> if (!has_intersects_mems_allowed(p, nodemask))
> return true;
>
> + /* Already OOM-killed p might get stuck at unkillable wait */
> + if (p->signal->oom_start &&
> + time_after(jiffies, p->signal->oom_start
> + + sysctl_oom_victim_wait_timeout * HZ))
> + return true;
> +
> return false;
> }
>
> @@ -668,6 +675,17 @@ void mark_oom_victim(struct task_struct *tsk)
> if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> return;
> /*
> + * The task might get stuck at unkillable wait with mmap_sem held for
> + * write. In that case, even the OOM reaper will not help.
> + */
> + if (!tsk->signal->oom_start) {
> + unsigned long oom_start = jiffies;
> +
> + if (!oom_start)
> + oom_start--;
> + tsk->signal->oom_start = oom_start;
> + }
> + /*
> * Make sure that the task is woken up from uninterruptible sleep
> * if it is frozen because OOM killer wouldn't be able to free
> * any memory and livelock. freezing_slow_path will tell the freezer
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-04-19 20:07 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-19 15:06 Tetsuo Handa
2016-04-19 20:07 ` Michal Hocko [this message]
2016-04-19 21:55 ` Tetsuo Handa
2016-04-20 10:37 ` [PATCH v2] " Tetsuo Handa
2016-04-25 11:47 ` Michal Hocko
2016-04-26 14:00 ` Tetsuo Handa
2016-04-26 14:31 ` Michal Hocko
2016-04-27 10:43 ` Tetsuo Handa
2016-04-20 14:47 ` [PATCH] " Michal Hocko
2016-04-21 11:49 ` Tetsuo Handa
2016-04-21 13:07 ` Michal Hocko
2016-04-24 14:19 ` Tetsuo Handa
2016-04-25 9:55 ` Michal Hocko
2016-04-26 13:54 ` Michal Hocko
2016-04-27 10:43 ` Tetsuo Handa
2016-04-27 11:11 ` Michal Hocko
2016-05-14 0:39 ` Tetsuo Handa
2016-05-16 14:18 ` Michal Hocko
2016-05-17 11:08 ` Tetsuo Handa
2016-05-17 12:51 ` Michal Hocko
2016-04-26 14:00 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160419200752.GA10437@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox