From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Roman Gushchin <guro@fb.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover
Date: Sat, 15 Sep 2018 02:07:46 +0900 [thread overview]
Message-ID: <8a766c49-3832-4a34-cba7-c4c24fdca8e0@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20180914141457.GB6081@dhcp22.suse.cz>
On 2018/09/14 23:14, Michal Hocko wrote:
> On Fri 14-09-18 22:54:45, Tetsuo Handa wrote:
>> OK, next question.
>> Is it guaranteed that arch_exit_mmap(mm) is safe with the OOM reaper?
>
> I do not see any obvious problem and we used to allow to race unmaping
> in exit and oom_reaper paths before we had to handle mlocked vmas
> specially.
Although we used to allow arch_exit_mmap() to race, it might be nothing but
we hit mlock() problem first. I want "clearly no problem".
>> Well, anyway, diffstat of your proposal would be
>>
>> include/linux/oom.h | 2 --
>> mm/internal.h | 3 +++
>> mm/memory.c | 28 ++++++++++++--------
>> mm/mmap.c | 73 +++++++++++++++++++++++++++++++----------------------
>> mm/oom_kill.c | 46 ++++++++++++++++++++++++---------
>> 5 files changed, 98 insertions(+), 54 deletions(-)
>>
>> trying to hand over only __free_pgtables() part at the risk of
>> setting MMF_OOM_SKIP without reclaiming any memory due to dropping
>> __oom_reap_task_mm() and scattering down_write()/up_write() inside
>> exit_mmap(), while diffstat of my proposal (not tested yet) would be
>>
>> include/linux/mm_types.h | 2 +
>> include/linux/oom.h | 3 +-
>> include/linux/sched.h | 2 +-
>> kernel/fork.c | 11 +++
>> mm/mmap.c | 42 ++++-------
>> mm/oom_kill.c | 182 ++++++++++++++++++++++-------------------------
>> 6 files changed, 117 insertions(+), 125 deletions(-)
>>
>> trying to wait until __mmput() completes and also trying to handle
>> multiple OOM victims in parallel.
Bottom is the fix-up for my proposal. It seems to be working well enough.
include/linux/oom.h | 1 -
kernel/fork.c | 2 +-
mm/oom_kill.c | 30 ++++++++++++------------------
3 files changed, 13 insertions(+), 20 deletions(-)
>>
>> You are refusing timeout based approach but I don't think this is
>> something we have to be frayed around the edge about possibility of
>> overlooking races/bugs because you don't want to use timeout. And you
>> have never showed that timeout based approach cannot work well enough.
>
> I have tried to explain why I do not like the timeout based approach
> several times alreay and I am getting fed up repeating it over and over
> again. The main point though is that we know _what_ we are waiting for
> and _how_ we are synchronizing different parts rather than let's wait
> some time and hopefully something happens.
At the risk of overlooking bugs. Quite few persons are checking OOM lockup
possibility which is a dangerous thing for taking your aggressive approach.
>
> Moreover, we have a backoff mechanism. The new class of oom victims
> with a large amount of memory in page tables can fit into that
> model. The new model adds few more branches to the exit path so if this
> is acceptable for other mm developers then I think this is much more
> preferrable to add a diffrent retry mechanism.
>
These "few more branches" have to be "clearly no problem" rather than
"passed some stress tests". And so far no response from other mm developers.
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 8a987c6..9d30c15 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -104,7 +104,6 @@ extern unsigned long oom_badness(struct task_struct *p,
extern bool out_of_memory(struct oom_control *oc);
extern void exit_oom_victim(void);
-extern void exit_oom_mm(struct mm_struct *mm);
extern int register_oom_notifier(struct notifier_block *nb);
extern int unregister_oom_notifier(struct notifier_block *nb);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3e662bb..5c32791 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1018,7 +1018,7 @@ static inline void __mmput(struct mm_struct *mm)
if (mm->binfmt)
module_put(mm->binfmt->module);
if (oom)
- exit_oom_mm(mm);
+ set_bit(MMF_OOM_SKIP, &mm->flags);
mmdrop(mm);
}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 01fa0d7..cff41fa 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -561,6 +561,14 @@ static void oom_reap_task(struct task_struct *tsk)
struct mm_struct *mm = tsk->signal->oom_mm;
unsigned long pages;
+ if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ spin_lock(&oom_reaper_lock);
+ list_del(&tsk->oom_victim);
+ spin_unlock(&oom_reaper_lock);
+ /* Drop a reference taken by wake_oom_reaper(). */
+ put_task_struct(tsk);
+ return;
+ }
oom_reap_task_mm(tsk, mm);
pages = oom_badness_pages(mm);
/* Hide this mm from OOM killer if stalled for too long. */
@@ -581,6 +589,7 @@ static int oom_reaper(void *unused)
{
while (true) {
struct task_struct *tsk;
+ struct task_struct *tmp;
if (!list_empty(&oom_reaper_list))
schedule_timeout_uninterruptible(HZ / 10);
@@ -588,32 +597,17 @@ static int oom_reaper(void *unused)
wait_event_freezable(oom_reaper_wait,
!list_empty(&oom_reaper_list));
spin_lock(&oom_reaper_lock);
- list_for_each_entry(tsk, &oom_reaper_list, oom_victim) {
- get_task_struct(tsk);
+ list_for_each_entry_safe(tsk, tmp, &oom_reaper_list,
+ oom_victim) {
spin_unlock(&oom_reaper_lock);
oom_reap_task(tsk);
spin_lock(&oom_reaper_lock);
- put_task_struct(tsk);
}
spin_unlock(&oom_reaper_lock);
}
return 0;
}
-void exit_oom_mm(struct mm_struct *mm)
-{
- struct task_struct *tsk;
-
- spin_lock(&oom_reaper_lock);
- list_for_each_entry(tsk, &oom_reaper_list, oom_victim)
- if (tsk->signal->oom_mm == mm)
- break;
- list_del(&tsk->oom_victim);
- spin_unlock(&oom_reaper_lock);
- /* Drop a reference taken by wake_oom_reaper(). */
- put_task_struct(tsk);
-}
-
static void wake_oom_reaper(struct task_struct *tsk)
{
struct mm_struct *mm = tsk->signal->oom_mm;
@@ -632,7 +626,7 @@ static void wake_oom_reaper(struct task_struct *tsk)
get_task_struct(tsk);
spin_lock(&oom_reaper_lock);
- list_add_tail(&tsk->oom_victim, &oom_reaper_list);
+ list_add(&tsk->oom_victim, &oom_reaper_list);
spin_unlock(&oom_reaper_lock);
trace_wake_reaper(tsk->pid);
wake_up(&oom_reaper_wait);
--
1.8.3.1
prev parent reply other threads:[~2018-09-14 17:08 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-08 4:54 [PATCH v2] mm, oom: Fix unnecessary killing of additional processes Tetsuo Handa
2018-09-10 9:54 ` Michal Hocko
2018-09-10 11:27 ` Tetsuo Handa
2018-09-10 11:40 ` Michal Hocko
2018-09-10 12:52 ` Tetsuo Handa
2018-09-10 12:55 ` [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover Michal Hocko
2018-09-10 12:55 ` [RFC PATCH 1/3] mm, oom: rework mmap_exit vs. oom_reaper synchronization Michal Hocko
2018-09-10 12:55 ` [RFC PATCH 2/3] mm, oom: keep retrying the oom_reap operation as long as there is substantial memory left Michal Hocko
2018-09-10 12:55 ` [RFC PATCH 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guranteed to finish Michal Hocko
2018-09-10 14:59 ` [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover Tetsuo Handa
2018-09-10 15:11 ` Michal Hocko
2018-09-10 15:40 ` Tetsuo Handa
2018-09-10 16:44 ` Michal Hocko
2018-09-12 3:06 ` Tetsuo Handa
2018-09-12 7:18 ` Michal Hocko
2018-09-12 7:58 ` Tetsuo Handa
2018-09-12 8:17 ` Michal Hocko
2018-09-12 10:59 ` Tetsuo Handa
2018-09-12 11:22 ` Michal Hocko
2018-09-11 14:01 ` Tetsuo Handa
2018-09-12 7:50 ` Michal Hocko
2018-09-12 13:42 ` Michal Hocko
2018-09-13 2:44 ` Tetsuo Handa
2018-09-13 9:09 ` Michal Hocko
2018-09-13 11:20 ` Tetsuo Handa
2018-09-13 11:35 ` Michal Hocko
2018-09-13 11:53 ` Tetsuo Handa
2018-09-13 13:40 ` Michal Hocko
2018-09-14 13:54 ` Tetsuo Handa
2018-09-14 14:14 ` Michal Hocko
2018-09-14 17:07 ` Tetsuo Handa [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8a766c49-3832-4a34-cba7-c4c24fdca8e0@i-love.sakura.ne.jp \
--to=penguin-kernel@i-love.sakura.ne.jp \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox