From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
Date: Tue, 5 Dec 2017 18:58:08 -0800 (PST) [thread overview]
Message-ID: <alpine.DEB.2.10.1712051857450.98120@chino.kir.corp.google.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1712051824050.91099@chino.kir.corp.google.com>
On Tue, 5 Dec 2017, David Rientjes wrote:
> One way to solve the issue is to have two mm flags: one to indicate the mm
> is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem);
> up_write(&mm->mmap_sem), then unmap_vmas(). The oom reaper needs this
> flag clear, not MMF_OOM_SKIP, while holding down_read(&mm->mmap_sem) to be
> allowed to call unmap_page_range(). The oom killer will still defer
> selecting this victim for MMF_OOM_SKIP after unmap_vmas() returns.
>
> The result of that change would be that we do not oom reap from any mm
> entering unmap_vmas(): we let unmap_vmas() do the work itself and avoid
> racing with it.
>
I think we need something like the following?
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
+#define MMF_REAPING 25 /* mm is undergoing reaping */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3014,16 +3014,11 @@ void exit_mmap(struct mm_struct *mm)
lru_add_drain();
flush_cache_mm(mm);
- tlb_gather_mmu(&tlb, mm, 0, -1);
- /* update_hiwater_rss(mm) here? but nobody should be looking */
- /* Use -1 here to ensure all VMAs in the mm are unmapped */
- unmap_vmas(&tlb, vma, 0, -1);
-
- set_bit(MMF_OOM_SKIP, &mm->flags);
+ set_bit(MMF_REAPING, &mm->flags);
if (unlikely(tsk_is_oom_victim(current))) {
/*
* Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
+ * mm. Because MMF_REAPING is already set before
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
@@ -3036,6 +3031,11 @@ void exit_mmap(struct mm_struct *mm)
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
+ tlb_gather_mmu(&tlb, mm, 0, -1);
+ /* update_hiwater_rss(mm) here? but nobody should be looking */
+ /* Use -1 here to ensure all VMAs in the mm are unmapped */
+ unmap_vmas(&tlb, vma, 0, -1);
+ set_bit(MMF_OOM_SKIP, &mm->flags);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -529,12 +529,12 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
}
/*
- * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
- * work on the mm anymore. The check for MMF_OOM_SKIP must run
+ * MMF_REAPING is set by exit_mmap when the OOM reaper can't
+ * work on the mm anymore. The check for MMF_REAPING must run
* under mmap_sem for reading because it serializes against the
* down_write();up_write() cycle in exit_mmap().
*/
- if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (test_bit(MMF_REAPING, &mm->flags)) {
up_read(&mm->mmap_sem);
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-12-06 2:58 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-06 2:43 David Rientjes
2017-12-06 2:58 ` David Rientjes [this message]
2017-12-06 3:28 ` Tetsuo Handa
2017-12-06 7:48 ` David Rientjes
2017-12-06 9:00 ` Michal Hocko
2017-12-07 7:20 ` Tetsuo Handa
2017-12-07 8:28 ` Michal Hocko
2017-12-07 8:44 ` Michal Hocko
2017-12-07 11:00 ` Tetsuo Handa
2017-12-07 21:22 ` David Rientjes
2017-12-08 7:50 ` Michal Hocko
2017-12-06 11:37 ` Tetsuo Handa
2017-12-06 8:50 ` Michal Hocko
2017-12-06 8:31 ` Michal Hocko
2017-12-07 11:35 ` Michal Hocko
2017-12-07 15:44 ` Tetsuo Handa
2017-12-07 16:30 ` Michal Hocko
2017-12-07 21:55 ` David Rientjes
2017-12-08 9:26 ` David Rientjes
2017-12-08 11:27 ` Michal Hocko
2017-12-08 10:11 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.10.1712051857450.98120@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=penguin-kernel@i-love.sakura.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox