linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Re: [patch] mm, oom: prevent additional oom kills before memory is freed
Date: Fri, 16 Jun 2017 10:39:46 +0200	[thread overview]
Message-ID: <20170616083946.GC30580@dhcp22.suse.cz> (raw)
In-Reply-To: <201706160054.v5G0sY7c064781@www262.sakura.ne.jp>

On Fri 16-06-17 09:54:34, Tetsuo Handa wrote:
[...]
> And the patch you proposed is broken.

Thanks for your testing!
 
> ----------
> [  161.846202] Out of memory: Kill process 6331 (a.out) score 999 or sacrifice child
> [  161.850327] Killed process 6331 (a.out) total-vm:4172kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  161.858503] ------------[ cut here ]------------
> [  161.861512] kernel BUG at mm/memory.c:1381!

BUG_ON(addr >= end) suggests our vma has trimmed. I guess I see what is
going on here.
__oom_reap_task_mm				exit_mmap
						  free_pgtables
						  up_write(mm->mmap_sem)
  down_read_trylock(&mm->mmap_sem)
  						  remove_vma
    unmap_page_range

So we need to extend the mmap_sem coverage. See the updated diff (not
the full proper patch yet).

> Please carefully consider the reason why there is VM_BUG_ON() in __mmput(),
> and clarify in your patch that what are possible side effects of racing
> uprobe_clear_state()/exit_aio()/ksm_exit()/exit_mmap() etc. with
> __oom_reap_task_mm()

Yes that definitely needs to be checked. We basically rely on the racing
part of the __mmput to not modify the address space. oom_reaper doesn't
touch any vma state except it unmaps pages which can run in parallel.
exit_aio->kill_ioctx seemingly does vm_munmap but it a) uses the
mmap_sem for write and b) it doesn't actually unmap because exit_aio
does ctx->mmap_size = 0. {ksm,khugepaged}_exit just do some houskeeping
which is not modifying the address space. I hope I will find some more
time to work on this next week. Additional test would be highly
appreciated of course.

---
diff --git a/mm/mmap.c b/mm/mmap.c
index 3bd5ecd20d4d..ca58f8a2a217 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2962,6 +2962,11 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
+	/*
+	 * oom reaper might race with exit_mmap so make sure we won't free
+	 * page tables or unmap VMAs under its feet
+	 */
+	down_write(&mm->mmap_sem);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
@@ -2975,6 +2980,7 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
+	up_write(&mm->mmap_sem);
 }
 
 /* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e2c925e7826..3df464f0f48b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -494,16 +494,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	}
 
 	/*
-	 * increase mm_users only after we know we will reap something so
-	 * that the mmput_async is called only when we have reaped something
-	 * and delayed __mmput doesn't matter that much
-	 */
-	if (!mmget_not_zero(mm)) {
-		up_read(&mm->mmap_sem);
-		goto unlock_oom;
-	}
-
-	/*
 	 * Tell all users of get_user/copy_from_user etc... that the content
 	 * is no longer stable. No barriers really needed because unmapping
 	 * should imply barriers already and the reader would hit a page fault
@@ -537,13 +527,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 			K(get_mm_counter(mm, MM_FILEPAGES)),
 			K(get_mm_counter(mm, MM_SHMEMPAGES)));
 	up_read(&mm->mmap_sem);
-
-	/*
-	 * Drop our reference but make sure the mmput slow path is called from a
-	 * different context because we shouldn't risk we get stuck there and
-	 * put the oom_reaper out of the way.
-	 */
-	mmput_async(mm);
 unlock_oom:
 	mutex_unlock(&oom_lock);
 	return ret;
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-06-16  8:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-14 23:43 David Rientjes
2017-06-15 10:39 ` Michal Hocko
2017-06-15 10:53   ` Tetsuo Handa
2017-06-15 11:01     ` Michal Hocko
2017-06-15 11:32       ` Tetsuo Handa
2017-06-15 12:03         ` Michal Hocko
2017-06-15 12:13           ` Michal Hocko
2017-06-15 13:01             ` Tetsuo Handa
2017-06-15 13:22               ` Michal Hocko
2017-06-15 21:43                 ` Tetsuo Handa
2017-06-15 21:37               ` David Rientjes
2017-06-15 12:20       ` Michal Hocko
2017-06-15 21:26   ` David Rientjes
2017-06-15 21:41     ` Michal Hocko
2017-06-15 22:03       ` David Rientjes
2017-06-15 22:12         ` Michal Hocko
2017-06-15 22:42           ` David Rientjes
2017-06-16  8:06             ` Michal Hocko
2017-06-16  0:54           ` Tetsuo Handa
2017-06-16  4:00             ` Tetsuo Handa
2017-06-16  8:39             ` Michal Hocko [this message]
2017-06-16 10:27               ` Tetsuo Handa
2017-06-16 11:02                 ` Michal Hocko
2017-06-16 14:26                   ` Re: [patch] mm, oom: prevent additional oom kills before memoryis freed Tetsuo Handa
2017-06-16 14:42                     ` Michal Hocko
2017-06-17 13:30                       ` Re: [patch] mm, oom: prevent additional oom kills before memory is freed Tetsuo Handa
2017-06-23 12:38                         ` Michal Hocko
2017-06-16 12:22       ` Tetsuo Handa
2017-06-16 14:12         ` Michal Hocko
2017-06-17  5:17           ` [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims Tetsuo Handa
2017-06-20 22:12             ` David Rientjes
2017-06-21  2:17               ` Tetsuo Handa
2017-06-21 20:31                 ` David Rientjes
2017-06-22  0:53                   ` Tetsuo Handa
2017-06-23 12:45                     ` Michal Hocko
2017-06-21 13:18               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170616083946.GC30580@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox