From: David Rientjes <rientjes@google.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: mhocko@kernel.org, Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
guro@fb.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2] mm, oom: fix concurrent munlock and oom reaperunmap
Date: Mon, 23 Apr 2018 22:35:06 -0700 (PDT)
Message-ID: <alpine.DEB.2.21.1804232231020.82340@chino.kir.corp.google.com>
In-Reply-To: <201804240511.w3O5BY4o090598@www262.sakura.ne.jp>
On Tue, 24 Apr 2018, Tetsuo Handa wrote:
> > > We can call __oom_reap_task_mm() from exit_mmap() (or __mmput()) before
> > > exit_mmap() takes mmap_sem for write. Then at least the memory that could
> > > have been reclaimed if exit_mmap() did not hold mmap_sem for write is
> > > guaranteed to be reclaimed before MMF_OOM_SKIP is set.
> > >
> >
> > I think that's an exceptionally good idea and will mitigate the concerns
> > of others.
> >
> > It can be done without holding mm->mmap_sem in exit_mmap() and uses the
> > same criteria that the oom reaper uses to set MMF_OOM_SKIP itself, so we
> > don't get dozens of unnecessary oom kills.
> >
> > What do you think about this? It passes preliminary testing on powerpc
> > and I've enqueued it for much more intensive testing. (I'm wishing there
> > was a better way to acknowledge your contribution to fixing this issue,
> > especially since you brought up the exact problem this is addressing in
> > previous emails.)
> >
>
> I don't think this patch is safe, because exit_mmap() calls
> mmu_notifier_invalidate_range_{start,end}(), which might block with
> oom_lock held while oom_reap_task_mm() is waiting for the oom_lock
> held by exit_mmap().
One of the reasons that I extracted __oom_reap_task_mm() out of the new
oom_reap_task_mm() is to avoid the checks that would be unnecessary when
called from exit_mmap(). In this case, we can ignore the
mm_has_blockable_invalidate_notifiers() check because exit_mmap() has
already done mmu_notifier_release(). So I don't think there's a concern
about __oom_reap_task_mm() blocking while holding oom_lock. Unless you
are referring to something else?
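
To make that concrete, the split looks roughly like this (a simplified
sketch with the MMF_OOM_SKIP test, tracing, and back-off details elided;
the patch has the exact code):

static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
{
    bool ret = true;

    mutex_lock(&oom_lock);

    if (!down_read_trylock(&mm->mmap_sem)) {
        ret = false;
        goto unlock_oom;
    }

    /*
     * Only the reaper path needs this check: exit_mmap() calls
     * __oom_reap_task_mm() only after mmu_notifier_release(), so no
     * blockable notifiers can still be registered by then.
     */
    if (mm_has_blockable_invalidate_notifiers(mm)) {
        up_read(&mm->mmap_sem);
        /* the real code backs off and retries here */
        goto unlock_oom;
    }

    /* The actual unmapping; no locking or notifier checks inside. */
    __oom_reap_task_mm(mm);

    up_read(&mm->mmap_sem);
unlock_oom:
    mutex_unlock(&oom_lock);
    return ret;
}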
> exit_mmap() must not block while holding oom_lock in order to guarantee that
> oom_reap_task_mm() can give up.
>
> Some suggestion on top of your patch:
>
> mm/mmap.c | 13 +++++--------
> mm/oom_kill.c | 51 ++++++++++++++++++++++++++-------------------------
> 2 files changed, 31 insertions(+), 33 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 981eed4..7b31357 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3019,21 +3019,18 @@ void exit_mmap(struct mm_struct *mm)
> /*
> * Manually reap the mm to free as much memory as possible.
> * Then, as the oom reaper, set MMF_OOM_SKIP to disregard this
> - * mm from further consideration. Taking mm->mmap_sem for write
> - * after setting MMF_OOM_SKIP will guarantee that the oom reaper
> - * will not run on this mm again after mmap_sem is dropped.
> +	 * mm from further consideration. Setting MMF_OOM_SKIP with
> +	 * oom_lock held guarantees that the OOM reaper will not
> +	 * run on this mm again.
> *
> * This needs to be done before calling munlock_vma_pages_all(),
> * which clears VM_LOCKED, otherwise the oom reaper cannot
> * reliably test it.
> */
> - mutex_lock(&oom_lock);
> __oom_reap_task_mm(mm);
> - mutex_unlock(&oom_lock);
> -
> + mutex_lock(&oom_lock);
> set_bit(MMF_OOM_SKIP, &mm->flags);
> - down_write(&mm->mmap_sem);
> - up_write(&mm->mmap_sem);
> + mutex_unlock(&oom_lock);
> }
>
> if (mm->locked_vm) {
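
(For reference, the oom-victim block at the top of exit_mmap() would then
look roughly like the following; this is only a sketch with the surrounding
context paraphrased, not the literal result of applying the hunk:)

if (unlikely(mm_is_oom_victim(mm))) {
    /*
     * Reap without mmap_sem; __oom_reap_task_mm() is safe to run
     * concurrently with the oom reaper on the same mm.
     */
    __oom_reap_task_mm(mm);

    /*
     * Publish MMF_OOM_SKIP under oom_lock so that anyone who sees the
     * flag while holding oom_lock has already had a chance to allocate
     * the memory freed above.
     */
    mutex_lock(&oom_lock);
    set_bit(MMF_OOM_SKIP, &mm->flags);
    mutex_unlock(&oom_lock);
}
/* ... followed by the existing munlock_vma_pages_all() loop, etc. */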
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb8..9a29df8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -523,21 +523,15 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> {
> bool ret = true;
>
> + mutex_lock(&oom_lock);
> +
> /*
> - * We have to make sure to not race with the victim exit path
> - * and cause premature new oom victim selection:
> - * oom_reap_task_mm exit_mm
> - * mmget_not_zero
> - * mmput
> - * atomic_dec_and_test
> - * exit_oom_victim
> - * [...]
> - * out_of_memory
> - * select_bad_process
> - * # no TIF_MEMDIE task selects new victim
> - * unmap_page_range # frees some memory
> + * MMF_OOM_SKIP is set by exit_mmap() when the OOM reaper can't
> + * work on the mm anymore. The check for MMF_OOM_SKIP must run
> +	 * with oom_lock held.
> */
> - mutex_lock(&oom_lock);
> + if (test_bit(MMF_OOM_SKIP, &mm->flags))
> + goto unlock_oom;
>
> if (!down_read_trylock(&mm->mmap_sem)) {
> ret = false;
> @@ -557,18 +551,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> goto unlock_oom;
> }
>
> - /*
> - * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
> - * work on the mm anymore. The check for MMF_OOM_SKIP must run
> - * under mmap_sem for reading because it serializes against the
> - * down_write();up_write() cycle in exit_mmap().
> - */
> - if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
> - up_read(&mm->mmap_sem);
> - trace_skip_task_reaping(tsk->pid);
> - goto unlock_oom;
> - }
> -
> trace_start_task_reaping(tsk->pid);
>
> __oom_reap_task_mm(mm);
> @@ -610,8 +592,27 @@ static void oom_reap_task(struct task_struct *tsk)
> /*
> * Hide this mm from OOM killer because it has been either reaped or
> * somebody can't call up_write(mmap_sem).
> + *
> + * We have to make sure to not cause premature new oom victim selection:
> + *
> + * __alloc_pages_may_oom() oom_reap_task_mm()/exit_mmap()
> + * mutex_trylock(&oom_lock)
> + * get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
> + * unmap_page_range() # frees some memory
> + * set_bit(MMF_OOM_SKIP)
> + * out_of_memory()
> + * select_bad_process()
> + * test_bit(MMF_OOM_SKIP) # selects new oom victim
> + * mutex_unlock(&oom_lock)
> + *
> +	 * Setting MMF_OOM_SKIP with oom_lock held guarantees that the
> +	 * last-second allocation attempt in __alloc_pages_may_oom() is
> +	 * done before out_of_memory() selects the next OOM victim by
> +	 * finding MMF_OOM_SKIP.
> */
> + mutex_lock(&oom_lock);
> set_bit(MMF_OOM_SKIP, &mm->flags);
> + mutex_unlock(&oom_lock);
>
> /* Drop a reference taken by wake_oom_reaper */
> put_task_struct(tsk);
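
To spell out why the oom_lock matters in that comment, the relevant shape of
__alloc_pages_may_oom() is roughly the following (heavily simplified sketch,
not the exact upstream code):

static struct page *__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
                                          const struct alloc_context *ac,
                                          unsigned long *did_some_progress)
{
    struct oom_control oc = {
        .gfp_mask = gfp_mask,
        .order = order,
        /* other fields elided */
    };
    struct page *page;

    *did_some_progress = 0;

    /* exit_mmap() / oom_reap_task_mm() may hold oom_lock while reaping. */
    if (!mutex_trylock(&oom_lock)) {
        *did_some_progress = 1;
        return NULL;
    }

    /* Last-second attempt: sees any memory freed by __oom_reap_task_mm(). */
    page = get_page_from_freelist(gfp_mask, order, ALLOC_WMARK_HIGH, ac);
    if (page)
        goto out;

    /* Only here may a new victim be picked by finding MMF_OOM_SKIP set. */
    if (out_of_memory(&oc))
        *did_some_progress = 1;
out:
    mutex_unlock(&oom_lock);
    return page;
}

Since both the last-second get_page_from_freelist() and out_of_memory() run
with oom_lock held, setting MMF_OOM_SKIP only while holding oom_lock means
that whoever observes the flag has already retried the allocation against
the memory freed by the reap.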