linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: David Rientjes <rientjes@google.com>
Cc: mhocko@kernel.org, Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	guro@fb.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap
Date: Tue, 24 Apr 2018 14:11:34 +0900	[thread overview]
Message-ID: <201804240511.w3O5BY4o090598@www262.sakura.ne.jp> (raw)
In-Reply-To: <alpine.DEB.2.21.1804231706340.18716@chino.kir.corp.google.com>

> On Sun, 22 Apr 2018, Tetsuo Handa wrote:
> 
> > > I'm wondering why you do not see oom killing of many processes if the 
> > > victim is a very large process that takes a long time to free memory in 
> > > exit_mmap() as I do because the oom reaper gives up trying to acquire 
> > > mm->mmap_sem and just sets MMF_OOM_SKIP itself.
> > > 
> > 
> > We can call __oom_reap_task_mm() from exit_mmap() (or __mmput()) before
> > exit_mmap() holds mmap_sem for write. Then, at least memory which could
> > have been reclaimed if exit_mmap() did not hold mmap_sem for write will
> > be guaranteed to be reclaimed before MMF_OOM_SKIP is set.
> > 
> 
> I think that's an exceptionally good idea and will mitigate the concerns 
> of others.
> 
> It can be done without holding mm->mmap_sem in exit_mmap() and uses the 
> same criteria that the oom reaper uses to set MMF_OOM_SKIP itself, so we 
> don't get dozens of unnecessary oom kills.
> 
> What do you think about this?  It passes preliminary testing on powerpc 
> and I've enqueued it for much more intensive testing.  (I'm wishing there 
> was a better way to acknowledge your contribution to fixing this issue, 
> especially since you brought up the exact problem this is addressing in 
> previous emails.)
> 

I don't think this patch is safe, because exit_mmap() calls
mmu_notifier_invalidate_range_{start,end}(), which might block while oom_lock
is held, at the same time that oom_reap_task_mm() is waiting for the oom_lock
held by exit_mmap(). exit_mmap() must not block while holding oom_lock, in
order to guarantee that oom_reap_task_mm() can give up.
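
To illustrate the concern, here is a minimal userspace model (a pthread
mutex and sleep() stand in for oom_lock and a blocking mmu notifier; the
names and numbers are made up for illustration, not kernel code). If the
exit path sleeps while holding the lock, the reaper side can neither make
progress nor reach its give-up path:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;

/* models exit_mmap() reaping with oom_lock held */
static void *exit_path(void *arg)
{
	pthread_mutex_lock(&oom_lock);
	sleep(10);	/* models mmu_notifier_invalidate_range_start() blocking */
	pthread_mutex_unlock(&oom_lock);
	return NULL;
}

/* models oom_reap_task_mm(): stuck waiting, so it cannot give up */
static void *reaper_path(void *arg)
{
	pthread_mutex_lock(&oom_lock);
	printf("reaper ran only after the exit path woke up\n");
	pthread_mutex_unlock(&oom_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, exit_path, NULL);
	sleep(1);	/* let exit_path() take oom_lock first */
	pthread_create(&b, NULL, reaper_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}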

Here is a suggestion on top of your patch:

 mm/mmap.c     | 13 +++++--------
 mm/oom_kill.c | 51 ++++++++++++++++++++++++++-------------------------
 2 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 981eed4..7b31357 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3019,21 +3019,18 @@ void exit_mmap(struct mm_struct *mm)
 		/*
 		 * Manually reap the mm to free as much memory as possible.
 		 * Then, as the oom reaper, set MMF_OOM_SKIP to disregard this
-		 * mm from further consideration.  Taking mm->mmap_sem for write
-		 * after setting MMF_OOM_SKIP will guarantee that the oom reaper
-		 * will not run on this mm again after mmap_sem is dropped.
+		 * mm from further consideration. Setting MMF_OOM_SKIP while
+		 * holding oom_lock guarantees that the OOM reaper will not
+		 * run on this mm again.
 		 *
 		 * This needs to be done before calling munlock_vma_pages_all(),
 		 * which clears VM_LOCKED, otherwise the oom reaper cannot
 		 * reliably test it.
 		 */
-		mutex_lock(&oom_lock);
 		__oom_reap_task_mm(mm);
-		mutex_unlock(&oom_lock);
-
+		mutex_lock(&oom_lock);
 		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
+		mutex_unlock(&oom_lock);
 	}
 
 	if (mm->locked_vm) {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb8..9a29df8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,21 +523,15 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
+	mutex_lock(&oom_lock);
+
 	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				  exit_oom_victim
-	 *				[...]
-	 *				out_of_memory
-	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
+	 * MMF_OOM_SKIP is set by exit_mmap() when the OOM reaper can't
+	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
+	 * with oom_lock held.
 	 */
-	mutex_lock(&oom_lock);
+	if (test_bit(MMF_OOM_SKIP, &mm->flags))
+		goto unlock_oom;
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		ret = false;
@@ -557,18 +551,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 		goto unlock_oom;
 	}
 
-	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
-	 * under mmap_sem for reading because it serializes against the
-	 * down_write();up_write() cycle in exit_mmap().
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
-		up_read(&mm->mmap_sem);
-		trace_skip_task_reaping(tsk->pid);
-		goto unlock_oom;
-	}
-
 	trace_start_task_reaping(tsk->pid);
 
 	__oom_reap_task_mm(mm);
@@ -610,8 +592,27 @@ static void oom_reap_task(struct task_struct *tsk)
 	/*
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
+	 *
+	 * We have to make sure to not cause premature new oom victim selection:
+	 *
+	 * __alloc_pages_may_oom()     oom_reap_task_mm()/exit_mmap()
+	 *   mutex_trylock(&oom_lock)
+	 *   get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
+	 *                               unmap_page_range() # frees some memory
+	 *                               set_bit(MMF_OOM_SKIP)
+	 *   out_of_memory()
+	 *     select_bad_process()
+	 *       test_bit(MMF_OOM_SKIP) # selects new oom victim
+	 *   mutex_unlock(&oom_lock)
+	 *
+	 * Setting MMF_OOM_SKIP while holding oom_lock guarantees that the
+	 * last-second allocation attempt is done by __alloc_pages_may_oom()
+	 * before out_of_memory() selects the next OOM victim by finding
+	 * MMF_OOM_SKIP.
 	 */
+	mutex_lock(&oom_lock);
 	set_bit(MMF_OOM_SKIP, &mm->flags);
+	mutex_unlock(&oom_lock);
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);
-- 
1.8.3.1
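
For reference, the ordering the suggested comments describe can be modeled
in userspace as below (again only a sketch with pthread stand-ins and
invented names, not kernel code): because MMF_OOM_SKIP is both set and
tested with oom_lock held, whoever sees the flag is guaranteed that the
reaping already finished, so a retry of the allocation can find the freed
memory before a new victim is selected.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static bool mmf_oom_skip;	/* models MMF_OOM_SKIP */
static long freed_kb;		/* models memory released by __oom_reap_task_mm() */

/* models exit_mmap(): reap outside the lock, publish the flag under it */
static void *exit_mmap_path(void *arg)
{
	freed_kb = 4096;	/* the reap completes before the flag can be seen */
	pthread_mutex_lock(&oom_lock);
	mmf_oom_skip = true;
	pthread_mutex_unlock(&oom_lock);
	return NULL;
}

/* models the oom path checking the flag with oom_lock held */
static void *oom_path(void *arg)
{
	pthread_mutex_lock(&oom_lock);
	if (mmf_oom_skip)
		/* seeing the flag implies freed_kb is already visible */
		printf("retry the allocation first: %ld KB were freed\n", freed_kb);
	else
		printf("victim still exiting, do not pick a new victim yet\n");
	pthread_mutex_unlock(&oom_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, exit_mmap_path, NULL);
	pthread_create(&b, NULL, oom_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}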

Thread overview: 46+ messages
2018-04-17 22:46 [patch] mm, oom: fix concurrent munlock and oom reaper unmap David Rientjes
2018-04-18  0:57 ` Tetsuo Handa
2018-04-18  2:39   ` David Rientjes
2018-04-18  2:52     ` [patch v2] " David Rientjes
2018-04-18  3:55       ` Tetsuo Handa
2018-04-18  4:11         ` David Rientjes
2018-04-18  4:47           ` Tetsuo Handa
2018-04-18  5:20             ` David Rientjes
2018-04-18  7:50       ` Michal Hocko
2018-04-18 11:49         ` Tetsuo Handa
2018-04-18 11:58           ` Michal Hocko
2018-04-18 13:25             ` Tetsuo Handa
2018-04-18 13:44               ` Michal Hocko
2018-04-18 14:28                 ` Tetsuo Handa
2018-04-18 19:14         ` David Rientjes
2018-04-19  6:35           ` Michal Hocko
2018-04-19 10:45             ` Tetsuo Handa
2018-04-19 11:04               ` Michal Hocko
2018-04-19 11:51                 ` Tetsuo Handa
2018-04-19 12:48                   ` Michal Hocko
2018-04-19 19:14               ` David Rientjes
2018-04-19 19:34             ` David Rientjes
2018-04-19 22:13               ` Tetsuo Handa
2018-04-20  8:23               ` Michal Hocko
2018-04-20 12:40                 ` Michal Hocko
2018-04-22  3:22                   ` David Rientjes
2018-04-22  3:48                     ` [patch v2] mm, oom: fix concurrent munlock and oom reaperunmap Tetsuo Handa
2018-04-22 13:08                       ` Michal Hocko
2018-04-24  2:31                       ` David Rientjes
2018-04-24  5:11                         ` Tetsuo Handa [this message]
2018-04-24  5:35                           ` David Rientjes
2018-04-24 21:57                             ` [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap Tetsuo Handa
2018-04-24 22:25                               ` David Rientjes
2018-04-24 22:34                                 ` [patch v3 for-4.17] " David Rientjes
2018-04-24 23:19                                   ` Michal Hocko
2018-04-24 13:04                         ` [patch v2] mm, oom: fix concurrent munlock and oom reaperunmap Michal Hocko
2018-04-24 20:01                           ` David Rientjes
2018-04-24 20:13                             ` Michal Hocko
2018-04-24 20:22                               ` David Rientjes
2018-04-24 20:31                                 ` Michal Hocko
2018-04-24 21:07                                   ` David Rientjes
2018-04-24 23:08                                     ` Michal Hocko
2018-04-24 23:14                                       ` Michal Hocko
2018-04-22  3:45                 ` [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap David Rientjes
2018-04-22 13:18                   ` Michal Hocko
2018-04-23 16:09                     ` Michal Hocko
