linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: anfei <anfei.zhou@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	nishimura@mxp.nes.nec.co.jp,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Mel Gorman <mel@csn.ul.ie>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] oom killer: break from infinite loop
Date: Sun, 28 Mar 2010 14:21:01 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.1003281341590.30570@chino.kir.corp.google.com> (raw)
In-Reply-To: <20100328162821.GA16765@redhat.com>

On Sun, 28 Mar 2010, Oleg Nesterov wrote:

> I see. But still I can't understand. To me, the problem is not that
> B can't exit, the problem is that A doesn't know it should exit. All
> threads should exit and free ->mm. Even if B could exit, this is not
> enough. And, to some extent, it doesn't matter if it holds mmap_sem
> or not.
> 
> Don't get me wrong. Even if I don't understand oom_kill.c the patch
> looks obviously good to me, even from "common sense" pov. I am just
> curious.
> 
> So, my understanding is: we are going to kill the whole thread group
> but TIF_MEMDIE is per-thread. Mark the whole thread group as TIF_MEMDIE
> so that any thread can notice this flag and (say, __alloc_pages_slowpath)
> fail asap.
> 
> Is my understanding correct?
> 

[Adding Mel Gorman <mel@csn.ul.ie> to the cc]

The problem with this approach is that we could easily deplete all memory 
reserves if the oom killed task has an extremely large number of threads, 
there has always been only a single thread with TIF_MEMDIE set per cpuset 
or memcg; for systems that don't run with cpusets or memory controller,
this has been limited to one thread with TIF_MEMDIE for the entire system.

There's risk involved with suddenly allowing 1000 threads to have 
TIF_MEMDIE set and the chances of fully depleting all allowed zones is 
much higher if they allocate memory prior to exit, for example.

An alternative is to fail allocations if they are failable and the 
allocating task has a pending SIGKILL.  It's better to preempt the oom 
killer since current is going to be exiting anyway and this avoids a 
needless kill.

That's possible if it's guaranteed that __GFP_NOFAIL allocations with a 
pending SIGKILL are granted ALLOC_NO_WATERMARKS to prevent them from 
endlessly looping while making no progress.

Comments?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1610,13 +1610,21 @@ try_next_zone:
 }
 
 static inline int
-should_alloc_retry(gfp_t gfp_mask, unsigned int order,
+should_alloc_retry(struct task_struct *p, gfp_t gfp_mask, unsigned int order,
 				unsigned long pages_reclaimed)
 {
 	/* Do not loop if specifically requested */
 	if (gfp_mask & __GFP_NORETRY)
 		return 0;
 
+	/* Loop if specifically requested */
+	if (gfp_mask & __GFP_NOFAIL)
+		return 1;
+
+	/* Task is killed, fail the allocation if possible */
+	if (fatal_signal_pending(p))
+		return 0;
+
 	/*
 	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
 	 * means __GFP_NOFAIL, but that may not be true in other
@@ -1635,13 +1643,6 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_REPEAT && pages_reclaimed < (1 << order))
 		return 1;
 
-	/*
-	 * Don't let big-order allocations loop unless the caller
-	 * explicitly requests that.
-	 */
-	if (gfp_mask & __GFP_NOFAIL)
-		return 1;
-
 	return 0;
 }
 
@@ -1798,6 +1799,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
 		if (!in_interrupt() &&
 		    ((p->flags & PF_MEMALLOC) ||
+		     (fatal_signal_pending(p) && (gfp_mask & __GFP_NOFAIL)) ||
 		     unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
@@ -1812,6 +1814,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	int migratetype)
 {
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
+	const gfp_t nofail = gfp_mask & __GFP_NOFAIL;
 	struct page *page = NULL;
 	int alloc_flags;
 	unsigned long pages_reclaimed = 0;
@@ -1876,7 +1879,7 @@ rebalance:
 		goto nopage;
 
 	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+	if (test_thread_flag(TIF_MEMDIE) && !nofail)
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */
@@ -1888,6 +1891,10 @@ rebalance:
 	if (page)
 		goto got_pg;
 
+	/* Task is killed, fail the allocation if possible */
+	if (fatal_signal_pending(p) && !nofail)
+		goto nopage;
+
 	/*
 	 * If we failed to make any progress reclaiming, then we are
 	 * running out of options and have to consider going OOM
@@ -1909,8 +1916,7 @@ rebalance:
 			 * made, there are no other options and retrying is
 			 * unlikely to help.
 			 */
-			if (order > PAGE_ALLOC_COSTLY_ORDER &&
-						!(gfp_mask & __GFP_NOFAIL))
+			if (order > PAGE_ALLOC_COSTLY_ORDER && !nofail)
 				goto nopage;
 
 			goto restart;
@@ -1919,7 +1925,7 @@ rebalance:
 
 	/* Check if we should retry the allocation */
 	pages_reclaimed += did_some_progress;
-	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
+	if (should_alloc_retry(p, gfp_mask, order, pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
 		congestion_wait(BLK_RW_ASYNC, HZ/50);
 		goto rebalance;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-03-28 21:21 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-24 16:25 Anfei Zhou
2010-03-25  2:51 ` KOSAKI Motohiro
2010-03-26 22:08 ` Andrew Morton
2010-03-26 22:33   ` Oleg Nesterov
2010-03-28 14:55     ` anfei
2010-03-28 16:28       ` Oleg Nesterov
2010-03-28 21:21         ` David Rientjes [this message]
2010-03-29 11:21           ` Oleg Nesterov
2010-03-29 20:49             ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-03-30 15:46               ` Oleg Nesterov
2010-03-30 20:26                 ` David Rientjes
2010-03-31 17:58                   ` Oleg Nesterov
2010-03-31 20:47                     ` Oleg Nesterov
2010-04-01  8:35                       ` David Rientjes
2010-04-01  8:57                         ` [patch -mm] oom: hold tasklist_lock when dumping tasks David Rientjes
2010-04-01 14:27                           ` Oleg Nesterov
2010-04-01 19:16                             ` David Rientjes
2010-04-01 13:59                         ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-04-01 19:12                           ` David Rientjes
2010-04-02 11:14                             ` Oleg Nesterov
2010-04-02 18:30                               ` [PATCH -mm 0/4] oom: linux has threads Oleg Nesterov
2010-04-02 18:31                                 ` [PATCH -mm 1/4] oom: select_bad_process: check PF_KTHREAD instead of !mm to skip kthreads Oleg Nesterov
2010-04-02 19:05                                   ` David Rientjes
2010-04-02 18:32                                 ` [PATCH -mm 2/4] oom: select_bad_process: PF_EXITING check should take ->mm into account Oleg Nesterov
2010-04-06 11:42                                   ` anfei
2010-04-06 12:18                                     ` Oleg Nesterov
2010-04-06 13:05                                       ` anfei
2010-04-06 13:38                                         ` Oleg Nesterov
2010-04-02 18:32                                 ` [PATCH -mm 3/4] oom: introduce find_lock_task_mm() to fix !mm false positives Oleg Nesterov
2010-04-02 18:33                                 ` [PATCH -mm 4/4] oom: oom_forkbomb_penalty: move thread_group_cputime() out of task_lock() Oleg Nesterov
2010-04-02 19:04                                   ` David Rientjes
2010-04-05 14:23                                 ` [PATCH -mm] oom: select_bad_process: never choose tasks with badness == 0 Oleg Nesterov
2010-04-02 19:02                               ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-04-02 19:14                                 ` Oleg Nesterov
2010-04-02 19:46                                   ` David Rientjes
2010-04-02 19:54                                     ` [patch -mm] oom: exclude tasks with badness score of 0 from being selected David Rientjes
2010-04-02 21:04                                       ` Oleg Nesterov
2010-04-02 21:22                                         ` [patch -mm v2] " David Rientjes
2010-04-02 20:55                                     ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-03-31 21:07                     ` David Rientjes
2010-03-31 22:50                       ` Oleg Nesterov
2010-03-31 23:30                         ` Oleg Nesterov
2010-03-31 23:48                           ` David Rientjes
2010-04-01 14:39                             ` Oleg Nesterov
2010-04-01 18:58                               ` David Rientjes
2010-04-01  8:25                         ` David Rientjes
2010-04-01 15:26                           ` Oleg Nesterov
2010-04-08 21:08                             ` David Rientjes
2010-04-09 12:38                               ` Oleg Nesterov
2010-03-30 16:39               ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call Oleg Nesterov
2010-03-30 17:43                 ` [PATCH -mm] proc: don't take ->siglock for /proc/pid/oom_adj Oleg Nesterov
2010-03-30 20:30                   ` David Rientjes
2010-03-31  9:17                     ` Oleg Nesterov
2010-03-31 18:59                     ` Oleg Nesterov
2010-03-31 21:14                       ` David Rientjes
2010-03-31 23:00                         ` Oleg Nesterov
2010-04-01  8:32                           ` David Rientjes
2010-04-01 15:37                             ` Oleg Nesterov
2010-04-01 19:04                               ` David Rientjes
2010-03-30 20:32                 ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call David Rientjes
2010-03-31  9:16                   ` Oleg Nesterov
2010-03-31 20:17                     ` Oleg Nesterov
2010-04-01  7:41                       ` David Rientjes
2010-04-01 13:13                         ` [PATCH 0/1] oom: fix the unsafe usage of badness() in proc_oom_score() Oleg Nesterov
2010-04-01 13:13                           ` [PATCH 1/1] " Oleg Nesterov
2010-04-01 19:03                             ` David Rientjes
2010-03-29 14:06           ` [PATCH] oom killer: break from infinite loop anfei
2010-03-29 20:01             ` David Rientjes
2010-03-30 14:29               ` anfei
2010-03-30 20:29                 ` David Rientjes
2010-03-31  0:57                   ` KAMEZAWA Hiroyuki
2010-03-31  6:07                     ` David Rientjes
2010-03-31  6:13                       ` KAMEZAWA Hiroyuki
2010-03-31  6:30                         ` Balbir Singh
2010-03-31  6:31                           ` KAMEZAWA Hiroyuki
2010-03-31  7:04                             ` David Rientjes
2010-03-31  6:32                           ` David Rientjes
2010-03-31  7:08                             ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found David Rientjes
2010-03-31  7:08                               ` KAMEZAWA Hiroyuki
2010-03-31  8:04                               ` Balbir Singh
2010-03-31 10:38                                 ` David Rientjes
2010-04-04 23:28                               ` David Rientjes
2010-04-05 21:30                                 ` Andrew Morton
2010-04-05 22:40                                   ` David Rientjes
2010-04-05 22:49                                     ` Andrew Morton
2010-04-05 23:01                                       ` David Rientjes
2010-04-06 12:08                                         ` KOSAKI Motohiro
2010-04-06 21:47                                           ` David Rientjes
2010-04-07  0:20                                             ` KAMEZAWA Hiroyuki
2010-04-07 13:29                                               ` KOSAKI Motohiro
2010-04-08 18:05                                                 ` David Rientjes
2010-04-21 19:17                                                   ` Andrew Morton
2010-04-21 22:04                                                     ` David Rientjes
2010-04-22  0:23                                                       ` KAMEZAWA Hiroyuki
2010-04-22  8:34                                                         ` David Rientjes
2010-04-27 22:58                                                       ` [patch -mm] oom: reintroduce and deprecate oom_kill_allocating_task David Rientjes
2010-04-28  0:57                                                         ` KAMEZAWA Hiroyuki
2010-04-22  7:23                                                     ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found Nick Piggin
2010-04-22  7:25                                                       ` KAMEZAWA Hiroyuki
2010-04-22 10:09                                                         ` Nick Piggin
2010-04-22 10:27                                                           ` KAMEZAWA Hiroyuki
2010-04-22 21:11                                                             ` David Rientjes
2010-04-22 10:28                                                           ` David Rientjes
2010-04-22 15:39                                                             ` Nick Piggin
2010-04-22 21:09                                                               ` David Rientjes
2010-05-04 23:55                                                     ` David Rientjes
2010-04-08 17:36                                               ` David Rientjes
2010-04-02 10:17           ` [PATCH] oom killer: break from infinite loop Mel Gorman
2010-04-04 23:26             ` David Rientjes
2010-04-05 10:47               ` Mel Gorman
2010-04-06 22:40                 ` David Rientjes
2010-03-29 11:31         ` anfei
2010-03-29 11:46           ` Oleg Nesterov
2010-03-29 12:09             ` anfei
2010-03-28  2:46 ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.1003281341590.30570@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=anfei.zhou@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox