From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org, akpm@linux-foundation.org
Cc: rientjes@google.com, mgorman@suse.de, oleg@redhat.com,
torvalds@linux-foundation.org, hughd@google.com,
andrea@kernel.org, riel@redhat.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, mhocko@suse.com
Subject: Re: [PATCH 5/5] mm, oom_reaper: implement OOM victims queuing
Date: Thu, 4 Feb 2016 19:49:29 +0900 [thread overview]
Message-ID: <201602041949.BIG30715.QVFLFOOOHMtSFJ@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <1454505240-23446-6-git-send-email-mhocko@kernel.org>
Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> wake_oom_reaper has allowed only 1 oom victim to be queued. The main
> reason for that was the simplicity as other solutions would require
> some way of queuing. The current approach is racy and that was deemed
> sufficient as the oom_reaper is considered a best effort approach
> to help with oom handling when the OOM victim cannot terminate in a
> reasonable time. The race could lead to missing an oom victim which can
> get stuck
>
> out_of_memory
> wake_oom_reaper
> cmpxchg // OK
> oom_reaper
> oom_reap_task
> __oom_reap_task
> oom_victim terminates
> atomic_inc_not_zero // fail
> out_of_memory
> wake_oom_reaper
> cmpxchg // fails
> task_to_reap = NULL
>
> This race requires 2 OOM invocations in a short time period which is not
> very likely but certainly not impossible. E.g. the original victim might
> have not released a lot of memory for some reason.
>
> The situation would improve considerably if wake_oom_reaper used a more
> robust queuing. This is what this patch implements. This means adding
> oom_reaper_list list_head into task_struct (eat a hole before embeded
> thread_struct for that purpose) and a oom_reaper_lock spinlock for
> queuing synchronization. wake_oom_reaper will then add the task on the
> queue and oom_reaper will dequeue it.
>
I think we want to rewrite this patch's description from a different point
of view.
As of "[PATCH 1/5] mm, oom: introduce oom reaper", we assumed that we try to
manage OOM livelock caused by system-wide OOM events using the OOM reaper.
Therefore, the OOM reaper had high scheduling priority and we considered side
effect of the OOM reaper as a reasonable constraint.
But as the discussion went by, we started to try to manage OOM livelock
caused by non system-wide OOM events (e.g. memcg OOM) using the OOM reaper.
Therefore, the OOM reaper now has normal scheduling priority. For non
system-wide OOM events, side effect of the OOM reaper might not be a
reasonable constraint. Some administrator might expect that the OOM reaper
does not break coredumping unless the system is under system-wide OOM events.
The race described in this patch's description sounds as if 2 OOM invocations
are by system-wide OOM events. If we consider only system-wide OOM events,
there is no need to keep task_to_reap != NULL after the OOM reaper found
a task to reap (shown below) because existing victim will prevent the OOM
killer from calling wake_oom_reaper().
----------------------------------------
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 012dd6f..c919ddb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1835,9 +1835,6 @@ struct task_struct {
unsigned long task_state_change;
#endif
int pagefault_disabled;
-#ifdef CONFIG_MMU
- struct list_head oom_reaper_list;
-#endif
/* CPU-specific state of this task */
struct thread_struct thread;
/*
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b42c6bc..fa6a302 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -422,10 +422,8 @@ bool oom_killer_disabled __read_mostly;
* victim (if that is possible) to help the OOM killer to move on.
*/
static struct task_struct *oom_reaper_th;
+static struct task_struct *task_to_reap;
static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
-static LIST_HEAD(oom_reaper_list);
-static DEFINE_SPINLOCK(oom_reaper_lock);
-
static bool __oom_reap_task(struct task_struct *tsk)
{
@@ -526,20 +524,11 @@ static void oom_reap_task(struct task_struct *tsk)
static int oom_reaper(void *unused)
{
while (true) {
- struct task_struct *tsk = NULL;
+ struct task_struct *tsk;
wait_event_freezable(oom_reaper_wait,
- (!list_empty(&oom_reaper_list)));
- spin_lock(&oom_reaper_lock);
- if (!list_empty(&oom_reaper_list)) {
- tsk = list_first_entry(&oom_reaper_list,
- struct task_struct, oom_reaper_list);
- list_del(&tsk->oom_reaper_list);
- }
- spin_unlock(&oom_reaper_lock);
-
- if (tsk)
- oom_reap_task(tsk);
+ (tsk = xchg(&task_to_reap, NULL)));
+ oom_reap_task(tsk);
}
return 0;
@@ -551,11 +540,11 @@ static void wake_oom_reaper(struct task_struct *tsk)
return;
get_task_struct(tsk);
-
- spin_lock(&oom_reaper_lock);
- list_add(&tsk->oom_reaper_list, &oom_reaper_list);
- spin_unlock(&oom_reaper_lock);
- wake_up(&oom_reaper_wait);
+ tsk = xchg(&task_to_reap, tsk);
+ if (!tsk)
+ wake_up(&oom_reaper_wait);
+ else
+ put_task_struct(tsk);
}
static int __init oom_init(void)
----------------------------------------
But if we consider non system-wide OOM events, it is not very unlikely to hit
this race. This queue is useful for situations where memcg1 and memcg2 hit
memcg OOM at the same time and victim1 in memcg1 cannot terminate immediately.
I expect parallel reaping (shown below) because there is no need to serialize
victim tasks (e.g. wait for reaping victim1 in memcg1 which can take up to
1 second to complete before start reaping victim2 in memcg2) if we implement
this queue.
----------------------------------------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b42c6bc..c2d6472 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -427,7 +427,7 @@ static LIST_HEAD(oom_reaper_list);
static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task(struct task_struct *tsk)
+static bool oom_reap_task(struct task_struct *tsk)
{
struct mmu_gather tlb;
struct vm_area_struct *vma;
@@ -504,42 +504,42 @@ out:
return ret;
}
-#define MAX_OOM_REAP_RETRIES 10
-static void oom_reap_task(struct task_struct *tsk)
-{
- int attempts = 0;
-
- /* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task(tsk))
- schedule_timeout_idle(HZ/10);
-
- if (attempts > MAX_OOM_REAP_RETRIES) {
- pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
- task_pid_nr(tsk), tsk->comm);
- debug_show_all_locks();
- }
-
- /* Drop a reference taken by wake_oom_reaper */
- put_task_struct(tsk);
-}
-
static int oom_reaper(void *unused)
{
while (true) {
- struct task_struct *tsk = NULL;
-
+ struct task_struct *tsk;
+ struct task_struct *t;
+ LIST_HEAD(list);
+ int i;
+
wait_event_freezable(oom_reaper_wait,
(!list_empty(&oom_reaper_list)));
spin_lock(&oom_reaper_lock);
- if (!list_empty(&oom_reaper_list)) {
- tsk = list_first_entry(&oom_reaper_list,
- struct task_struct, oom_reaper_list);
- list_del(&tsk->oom_reaper_list);
- }
+ list_splice(&oom_reaper_list, &list);
+ INIT_LIST_HEAD(&oom_reaper_list);
spin_unlock(&oom_reaper_lock);
-
- if (tsk)
- oom_reap_task(tsk);
+ /* Retry the down_read_trylock(mmap_sem) a few times */
+ for (i = 0; i < 10; i++) {
+ list_for_each_entry_safe(tsk, t, &list,
+ oom_reaper_list) {
+ if (!oom_reap_task(tsk))
+ continue;
+ list_del(&tsk->oom_reaper_list);
+ /* Drop a reference taken by wake_oom_reaper */
+ put_task_struct(tsk);
+ }
+ if (list_empty(&list))
+ break;
+ schedule_timeout_idle(HZ/10);
+ }
+ if (list_empty(&list))
+ continue;
+ list_for_each_entry(tsk, &list, oom_reaper_list) {
+ pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
+ task_pid_nr(tsk), tsk->comm);
+ put_task_struct(tsk);
+ }
+ debug_show_all_locks();
}
return 0;
----------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-02-04 10:49 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-03 13:13 [PATCH 0/5] oom reaper v5 Michal Hocko
2016-02-03 13:13 ` [PATCH 1/5] mm, oom: introduce oom reaper Michal Hocko
2016-02-03 23:48 ` David Rientjes
2016-02-04 6:41 ` Michal Hocko
2016-02-06 13:22 ` Tetsuo Handa
2016-02-15 20:50 ` Michal Hocko
2016-02-03 13:13 ` [PATCH 2/5] oom reaper: handle mlocked pages Michal Hocko
2016-02-03 23:57 ` David Rientjes
2016-02-23 1:36 ` David Rientjes
2016-02-23 13:21 ` Michal Hocko
2016-02-29 3:19 ` Hugh Dickins
2016-02-29 13:41 ` Michal Hocko
2016-03-08 13:40 ` Michal Hocko
2016-03-08 20:07 ` Hugh Dickins
2016-03-09 8:26 ` Michal Hocko
2016-02-03 13:13 ` [PATCH 3/5] oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space Michal Hocko
2016-02-04 14:22 ` Tetsuo Handa
2016-02-04 14:43 ` Michal Hocko
2016-02-04 15:08 ` Tetsuo Handa
2016-02-04 16:31 ` Michal Hocko
2016-02-05 11:14 ` Tetsuo Handa
2016-02-06 8:30 ` Michal Hocko
2016-02-06 11:23 ` Tetsuo Handa
2016-02-15 20:47 ` Michal Hocko
2016-02-06 6:45 ` Michal Hocko
2016-02-06 14:33 ` Tetsuo Handa
2016-02-15 20:40 ` [PATCH 3.1/5] oom: make oom_reaper freezable Michal Hocko
2016-02-25 11:28 ` [PATCH 3/5] oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space Tetsuo Handa
2016-02-25 11:31 ` Tetsuo Handa
2016-02-25 14:16 ` Michal Hocko
2016-02-03 13:13 ` [PATCH 4/5] mm, oom_reaper: report success/failure Michal Hocko
2016-02-03 23:10 ` David Rientjes
2016-02-04 6:46 ` Michal Hocko
2016-02-04 22:31 ` David Rientjes
2016-02-05 9:26 ` Michal Hocko
2016-02-06 6:34 ` Michal Hocko
2016-02-03 13:14 ` [PATCH 5/5] mm, oom_reaper: implement OOM victims queuing Michal Hocko
2016-02-04 10:49 ` Tetsuo Handa [this message]
2016-02-04 14:53 ` Michal Hocko
2016-02-06 5:54 ` Tetsuo Handa
2016-02-06 8:37 ` Michal Hocko
2016-02-06 15:33 ` Tetsuo Handa
2016-02-15 20:15 ` Michal Hocko
2016-02-16 11:11 ` Tetsuo Handa
2016-02-16 15:53 ` Michal Hocko
2016-02-17 9:48 ` [PATCH 6/5] oom, oom_reaper: disable oom_reaper for Michal Hocko
2016-02-17 10:41 ` Tetsuo Handa
2016-02-17 11:33 ` Michal Hocko
2016-02-19 18:34 ` Michal Hocko
2016-02-20 2:32 ` [PATCH 6/5] oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task Tetsuo Handa
2016-02-22 9:41 ` Michal Hocko
2016-02-29 1:26 ` Tetsuo Handa
2016-03-15 11:15 ` Tetsuo Handa
2016-03-15 11:43 ` Michal Hocko
2016-03-15 11:50 ` Michal Hocko
2016-03-16 11:16 ` Tetsuo Handa
2016-03-17 10:49 ` Tetsuo Handa
2016-03-17 12:17 ` Michal Hocko
2016-03-17 13:00 ` Tetsuo Handa
2016-03-17 13:23 ` Michal Hocko
2016-03-17 14:34 ` Tetsuo Handa
2016-03-17 14:54 ` Michal Hocko
2016-03-17 15:20 ` Tetsuo Handa
2016-03-17 12:14 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201602041949.BIG30715.QVFLFOOOHMtSFJ@I-love.SAKURA.ne.jp \
--to=penguin-kernel@i-love.sakura.ne.jp \
--cc=akpm@linux-foundation.org \
--cc=andrea@kernel.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=oleg@redhat.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox