From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
Date: Wed, 6 Dec 2017 09:31:30 +0100 [thread overview]
Message-ID: <20171206083130.GC16386@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.10.1712051824050.91099@chino.kir.corp.google.com>
On Tue 05-12-17 18:43:48, David Rientjes wrote:
> Hi,
>
> I'd like to understand the synchronization between the oom_reaper's
> unmap_page_range() and exit_mmap(). The latter does not hold
> mm->mmap_sem: it's supposed to be the last thread operating on the mm
> before it is destroyed.
>
> If unmap_page_range() races with unmap_vmas(), we trivially call
> page_remove_rmap() twice on the same page:
Well, the oom reaper is basically MADV_DONTNEED and that allows
parallel tear down (it takes only mmap_sem for read). The exit path
doesn't take the mmap_sem during unmap_vmas but that shouldn't make any
difference because both path would take it for read anyway. The
essential synchronization between oom reaper and exit_mmap is
exit_mmap:
set_bit(MMF_OOM_SKIP, &mm->flags);
if (unlikely(tsk_is_oom_victim(current))) {
/*
* Wait for oom_reap_task() to stop working on this
* mm. Because MMF_OOM_SKIP is already set before
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
* tsk_is_oom_victim() cannot be set from under us
* either because current->mm is already set to NULL
* under task_lock before calling mmput and oom_mm is
* set not NULL by the OOM killer only if current->mm
* is found not NULL while holding the task_lock.
*/
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
oom_reaper
if (!down_read_trylock(&mm->mmap_sem)) {
/*
* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
* work on the mm anymore. The check for MMF_OOM_SKIP must run
* under mmap_sem for reading because it serializes against the
* down_write();up_write() cycle in exit_mmap().
*/
if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
which makes sure that the reaper doesn't race with free_pgtables.
> BUG: Bad page map in process oom_reaper pte:6353826300000000 pmd:00000000
Hmm, this is really strange. This is a pte without a pmd or is the
output just incomplete.
> addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping: (null) index:7f50cab1d
> file: (null) fault: (null) mmap: (null) readpage: (null)
> CPU: 2 PID: 1001 Comm: oom_reaper
> Call Trace:
> [<ffffffffa4bd967d>] dump_stack+0x4d/0x70
> [<ffffffffa4a03558>] unmap_page_range+0x1068/0x1130
could you use addr2line to get the exact spot where this triggered
please?
> [<ffffffffa4a2e07f>] __oom_reap_task_mm+0xd5/0x16b
> [<ffffffffa4a2e226>] oom_reaper+0xff/0x14c
> [<ffffffffa48d6ad1>] kthread+0xc1/0xe0
>
> And there are more examples of badness from an unmap_page_range() racing
> with unmap_vmas(). In this case, MMF_OOM_SKIP is doing two things: (1)
> avoiding additional oom kills until unmap_vmas() returns and (2) avoid the
> oom_reaper working on the mm after unmap_vmas(). In (2), there's nothing
> preventing the oom reaper from calling unmap_page_range() in parallel with
> the final thread doing unmap_vmas() -- we no longer do mmget() to prevent
> exit_mmap() from being called.
Yes and that is an intentional behavior. There shouldn't be any reason
to exclude the two because this should be equivalent to calling
MADV_DONTNEED in parallel.
I will get to the rest of your email later because the above is the
essential assumption 212925802454 ("mm: oom: let oom_reap_task and
exit_mmap run concurrently") builds on. If it is not correct then we
have a bigger problem.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-12-06 8:31 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-06 2:43 David Rientjes
2017-12-06 2:58 ` David Rientjes
2017-12-06 3:28 ` Tetsuo Handa
2017-12-06 7:48 ` David Rientjes
2017-12-06 9:00 ` Michal Hocko
2017-12-07 7:20 ` Tetsuo Handa
2017-12-07 8:28 ` Michal Hocko
2017-12-07 8:44 ` Michal Hocko
2017-12-07 11:00 ` Tetsuo Handa
2017-12-07 21:22 ` David Rientjes
2017-12-08 7:50 ` Michal Hocko
2017-12-06 11:37 ` Tetsuo Handa
2017-12-06 8:50 ` Michal Hocko
2017-12-06 8:31 ` Michal Hocko [this message]
2017-12-07 11:35 ` Michal Hocko
2017-12-07 15:44 ` Tetsuo Handa
2017-12-07 16:30 ` Michal Hocko
2017-12-07 21:55 ` David Rientjes
2017-12-08 9:26 ` David Rientjes
2017-12-08 11:27 ` Michal Hocko
2017-12-08 10:11 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171206083130.GC16386@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox