Re: [RFC PATCH] mm, oom_reaper: do not attempt to reap a task more than twice

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, rientjes@google.com,
	akpm@linux-foundation.org, oleg@redhat.com,
	vdavydov@parallels.com
Subject: Re: [RFC PATCH] mm, oom_reaper: do not attempt to reap a task more than twice
Date: Sat, 28 May 2016 21:22:08 +0900
Message-ID: <201605282122.HAD09894.SFOFHtOVJLOQMF@I-love.SAKURA.ne.jp>
In-Reply-To: <201605280124.EJB71319.SHOtOVFFFQMOJL@I-love.SAKURA.ne.jp>

Tetsuo Handa wrote:
> Michal Hocko wrote:
> > We could very well do 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index bcb6d3b26c94..d9017b8c7300 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -813,6 +813,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  			 * memory might be still used.
> >  			 */
> >  			can_oom_reap = false;
> > +			set_bit(MMF_OOM_REAPED, mm->flags);
> >  			continue;
> >  		}
> >  		if (p->signal->oom_score_adj == OOM_ADJUST_MIN)
> > 
> > with the same result. If you _really_ think that this would make a
> > difference I could live with that. But I am highly skeptical this
> > matters all that much.

Both uses of set_bit(), above and below, are wrong. An mm that a kernel
thread is using via use_mm() becomes OOM reapable again after unuse_mm().
Thus, setting MMF_OOM_REAPED is a mistake, just as setting MMF_OOM_KILLED was
( http://lkml.kernel.org/r/201603152015.JAE86937.VFOLtQFOFJOSHM@I-love.SAKURA.ne.jp ).
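
To illustrate the point, here is a toy userspace model, not kernel code.
Every name in it (toy_mm, TOY_MMF_OOM_REAPED, toy_use_mm() and so on) is
made up for this sketch, and it assumes the flag is only ever set, never
cleared. It shows a sticky bit outliving the temporary use_mm() borrow
that caused it to be set:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sticky "do not reap this mm" bit. */
enum { TOY_MMF_OOM_REAPED = 1u << 0 };

/* Toy mm: only the fields this sketch needs, not the kernel's mm_struct. */
struct toy_mm {
	unsigned long flags;
	int kthread_users;	/* kernel threads currently borrowing the mm */
};

/* Model of use_mm(): a kernel thread starts borrowing the mm. */
static void toy_use_mm(struct toy_mm *mm)
{
	mm->kthread_users++;
}

/* Model of unuse_mm(): the kernel thread stops borrowing the mm. */
static void toy_unuse_mm(struct toy_mm *mm)
{
	assert(mm->kthread_users > 0);
	mm->kthread_users--;
}

/* Sticky approach: set the bit when a kernel thread user is seen. */
static void toy_mark_if_kthread_user(struct toy_mm *mm)
{
	if (mm->kthread_users > 0)
		mm->flags |= TOY_MMF_OOM_REAPED;
}

/* What a reaper that trusts the sticky bit would conclude. */
static bool toy_reapable_sticky(const struct toy_mm *mm)
{
	return !(mm->flags & TOY_MMF_OOM_REAPED);
}

/* The actual state: reapable whenever no kernel thread is borrowing. */
static bool toy_actually_reapable(const struct toy_mm *mm)
{
	return mm->kthread_users == 0;
}
```

In this model, after toy_use_mm(), toy_mark_if_kthread_user() and
toy_unuse_mm() are called in that order, toy_actually_reapable() is true
again while toy_reapable_sticky() stays false forever.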

> I think the lines needed for the guarantee are something like
> 
> 	rcu_read_lock();
> 	for_each_process(p) {
> 		if (!process_shares_mm(p, mm))
> 			continue;
> 		if (same_thread_group(p, victim))
> 			continue;
> 		/*
> 		 * It is not safe to reap memory used by global init or
> 		 * kernel threads.
> 		 */
> 		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p)) {
> 			set_bit(MMF_OOM_REAPED, mm->flags);
> 			continue;
> 		}
> 		/*
> 		 * Memory used by OOM_SCORE_ADJ_MIN is still OOM reapable
> 		 * if they are already killed or exiting. Just don't
> 		 * send SIGKILL.
> 		 */
> 		if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> 			continue;
> 
> 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> 	}
> 	rcu_read_unlock();
> 
> 	wake_oom_reaper(victim);
> 
> but doing set_bit(MMF_OOM_REAPED, mm->flags) here makes sense?



I also realized that my

	if (task_is_reapable(current))
		return true;

is wrong. task_is_reapable() requires that all threads using current->mm are
dying or exiting, but select_bad_process() (which must be reached in order to
call mark_oom_victim() from oom_kill_process(), after
oom_scan_process_thread() returns OOM_SCAN_OK and oom_badness() > 0)
requires that there is no TIF_MEMDIE thread.

If there is a TIF_MEMDIE thread, the current thread, which (as of Linux 4.6)
is able to get TIF_MEMDIE when the

  fatal_signal_pending(current) || ((current->flags & PF_EXITING) && !(current->signal->flags & SIGNAL_GROUP_COREDUMP))

condition holds, will fail to get TIF_MEMDIE once that shortcut no longer
applies, because oom_scan_process_thread() will return OOM_SCAN_ABORT.
The logic of setting TIF_MEMDIE on only one thread

	/*
	 * Kill all user processes sharing victim->mm in other thread groups, if
	 * any.  They don't get access to memory reserves, though, to avoid
	 * depletion of all memory.  This prevents mm->mmap_sem livelock when an
	 * oom killed thread cannot exit because it requires the semaphore and
	 * its contended by another thread trying to allocate memory itself.
	 * That thread will now get access to memory reserves since it has a
	 * pending fatal signal.
	 */

does not allow the shortcuts to require that current->mm is reapable.
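
The interaction described above can be sketched as a toy userspace model.
This is not kernel code: toy_task, toy_out_of_memory() and the
require_reapable_mm switch are hypothetical names for this sketch, and the
model assumes that "dying" means a pending fatal signal or exiting without
a coredump. With the switch off, the shortcut looks only at current, as in
Linux 4.6. With it on, the shortcut also requires every user of current's
mm to be dying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task: only the state this sketch needs. */
struct toy_task {
	int mm_id;		/* tasks with the same mm_id share an mm */
	bool fatal_signal;	/* models fatal_signal_pending() */
	bool exiting;		/* models PF_EXITING without a coredump */
	bool memdie;		/* models TIF_MEMDIE */
};

enum toy_outcome {
	TOY_GOT_MEMDIE,		/* current was marked as an OOM victim */
	TOY_ABORTED,		/* scan aborted: a TIF_MEMDIE thread exists */
	TOY_SELECT_VICTIM,	/* scan would go on to pick a new victim */
};

static bool toy_dying(const struct toy_task *t)
{
	return t->fatal_signal || t->exiting;
}

/* Stricter shortcut condition: every user of cur's mm must be dying. */
static bool toy_mm_reapable(const struct toy_task *tasks, size_t n,
			    const struct toy_task *cur)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].mm_id == cur->mm_id && !toy_dying(&tasks[i]))
			return false;
	return true;
}

/* Victim scan model: abort as soon as any TIF_MEMDIE thread is found. */
static bool toy_scan_aborts(const struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memdie)
			return true;
	return false;
}

static enum toy_outcome toy_out_of_memory(struct toy_task *tasks, size_t n,
					  struct toy_task *cur,
					  bool require_reapable_mm)
{
	bool shortcut;

	assert(cur != NULL);
	shortcut = toy_dying(cur);
	if (require_reapable_mm)
		shortcut = shortcut && toy_mm_reapable(tasks, n, cur);
	if (shortcut) {
		cur->memdie = true;	/* models mark_oom_victim(current) */
		return TOY_GOT_MEMDIE;
	}
	/* No shortcut: fall back to the scan, which may abort. */
	return toy_scan_aborts(tasks, n) ? TOY_ABORTED : TOY_SELECT_VICTIM;
}
```

Take three tasks sharing one mm: the victim (TIF_MEMDIE set), current
(SIGKILL pending, no TIF_MEMDIE), and a third task that was not killed.
In this model current gets TOY_GOT_MEMDIE with the switch off, but
TOY_ABORTED with it on, so it never gains access to memory reserves.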

It seems to me that your "[PATCH 6/6] mm, oom: fortify task_will_free_mem"
expects current->mm to be reapable, just as my patch does.
If so, [PATCH 6/6] will not work.

+static inline bool task_will_free_mem(struct task_struct *task)
+{
(...snipped...)
+		rcu_read_lock();
+		for_each_process(p) {
+			bool vfork;
+
+			/*
+			 * skip over vforked tasks because they are mostly
+			 * independent and will drop the mm soon
+			 */
+			task_lock(p);
+			vfork = p->vfork_done;
+			task_unlock(p);
+			if (vfork)
+				continue;
+
+			ret = __task_will_free_mem(p);
+			if (!ret)
+				break;
+		}
+		rcu_read_unlock();
(...snipped...)
+}

@@ -945,14 +894,10 @@ bool out_of_memory(struct oom_control *oc)
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it.  The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
-	 *
-	 * But don't select if current has already released its mm and cleared
-	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
 	 */
-	if (current->mm &&
-	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
+	if (task_will_free_mem(current)) {
 		mark_oom_victim(current);
-		try_oom_reaper(current);
+		wake_oom_reaper(current);
 		return true;
 	}
 


