linux-mm.kvack.org archive mirror
From: Mel Gorman <mel@csn.ul.ie>
To: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>, anfei <anfei.zhou@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	nishimura@mxp.nes.nec.co.jp,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] oom killer: break from infinite loop
Date: Mon, 5 Apr 2010 11:47:52 +0100
Message-ID: <20100405104752.GB21207@csn.ul.ie>
In-Reply-To: <alpine.DEB.2.00.1004041616280.7198@chino.kir.corp.google.com>

On Sun, Apr 04, 2010 at 04:26:38PM -0700, David Rientjes wrote:
> On Fri, 2 Apr 2010, Mel Gorman wrote:
> 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1610,13 +1610,21 @@ try_next_zone:
> > >  }
> > >  
> > >  static inline int
> > > -should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> > > +should_alloc_retry(struct task_struct *p, gfp_t gfp_mask, unsigned int order,
> > >  				unsigned long pages_reclaimed)
> > >  {
> > >  	/* Do not loop if specifically requested */
> > >  	if (gfp_mask & __GFP_NORETRY)
> > >  		return 0;
> > >  
> > > +	/* Loop if specifically requested */
> > > +	if (gfp_mask & __GFP_NOFAIL)
> > > +		return 1;
> > > +
> > 
> > Meh, you could have preserved the comment but no biggie.
> > 
> 
> I'll remember to preserve it when it's proposed.
> 
> > > +	/* Task is killed, fail the allocation if possible */
> > > +	if (fatal_signal_pending(p))
> > > +		return 0;
> > > +
> > 
> > > Seems reasonable. This will be checked on every major loop in the
> > > allocator slow path.
> > 
> > >  	/*
> > >  	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> > >  	 * means __GFP_NOFAIL, but that may not be true in other
> > > @@ -1635,13 +1643,6 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> > >  	if (gfp_mask & __GFP_REPEAT && pages_reclaimed < (1 << order))
> > >  		return 1;
> > >  
> > > -	/*
> > > -	 * Don't let big-order allocations loop unless the caller
> > > -	 * explicitly requests that.
> > > -	 */
> > > -	if (gfp_mask & __GFP_NOFAIL)
> > > -		return 1;
> > > -
> > >  	return 0;
> > >  }
> > >  
> > > @@ -1798,6 +1799,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > >  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> > >  		if (!in_interrupt() &&
> > >  		    ((p->flags & PF_MEMALLOC) ||
> > > +		     (fatal_signal_pending(p) && (gfp_mask & __GFP_NOFAIL)) ||
> > 
> > This is a lot less clear. GFP_NOFAIL is rare so this is basically saying
> > that all threads with a fatal signal pending can ignore watermarks. This
> > is dangerous because if 1000 threads get killed, there is a possibility
> > of deadlocking the system.
> > 
> 
> I don't quite understand the comment, this is only for __GFP_NOFAIL 
> allocations, which you say are rare, so a large number of threads won't be 
> doing this simultaneously.
> 
> > Why not obey the watermarks and just not retry the loop later and fail
> > the allocation?
> > 
> 
> The above check for (fatal_signal_pending(p) && (gfp_mask & __GFP_NOFAIL)) 
> essentially oom kills p without invoking the oom killer before direct 
> reclaim is invoked.  We know it has a pending SIGKILL and wants to exit, 
> so we allow it to allocate beyond the min watermark to avoid costly 
> reclaim or needlessly killing another task.
> 

Sorry, I typo'd.

GFP_NOFAIL is rare, but this is basically saying that all threads with a
fatal signal pending and using NOFAIL can ignore watermarks.

I don't think any caller in an exit path will be using GFP_NOFAIL, as
its most common user is file-system related, but it still feels unnecessary
to check this case on every call to the slow path.
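
To make the condition under discussion concrete, here is a rough user-space
model of the quoted gfp_to_alloc_flags() hunk. It is a sketch only: the
SK_* flag values, struct sk_task and sk_no_watermarks() are hypothetical
names invented for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this sketch. */
#define SK_GFP_NOFAIL     0x1u
#define SK_GFP_NOMEMALLOC 0x2u

/* Hypothetical stand-in for the task state the hunk inspects. */
struct sk_task {
	bool pf_memalloc;  /* models p->flags & PF_MEMALLOC */
	bool fatal_signal; /* models fatal_signal_pending(p) */
	bool tif_memdie;   /* models test_thread_flag(TIF_MEMDIE) */
};

/*
 * Returns true when the allocation would be allowed to ignore
 * watermarks (ALLOC_NO_WATERMARKS) under the proposed hunk.
 */
static bool sk_no_watermarks(const struct sk_task *p, unsigned int gfp,
			     bool in_irq)
{
	if (gfp & SK_GFP_NOMEMALLOC)
		return false;
	if (in_irq)
		return false;
	return p->pf_memalloc ||
	       (p->fatal_signal && (gfp & SK_GFP_NOFAIL)) ||
	       p->tif_memdie;
}
```

In this model a killed task only bypasses watermarks when it also passes
the NOFAIL bit, but the extra test is still evaluated on every call.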

> > >  		     unlikely(test_thread_flag(TIF_MEMDIE))))
> > >  			alloc_flags |= ALLOC_NO_WATERMARKS;
> > >  	}
> > > @@ -1812,6 +1814,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	int migratetype)
> > >  {
> > >  	const gfp_t wait = gfp_mask & __GFP_WAIT;
> > > +	const gfp_t nofail = gfp_mask & __GFP_NOFAIL;
> > >  	struct page *page = NULL;
> > >  	int alloc_flags;
> > >  	unsigned long pages_reclaimed = 0;
> > > @@ -1876,7 +1879,7 @@ rebalance:
> > >  		goto nopage;
> > >  
> > >  	/* Avoid allocations with no watermarks from looping endlessly */
> > > -	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > > +	if (test_thread_flag(TIF_MEMDIE) && !nofail)
> > >  		goto nopage;
> > >  
> > >  	/* Try direct reclaim and then allocating */
> > > @@ -1888,6 +1891,10 @@ rebalance:
> > >  	if (page)
> > >  		goto got_pg;
> > >  
> > > +	/* Task is killed, fail the allocation if possible */
> > > +	if (fatal_signal_pending(p) && !nofail)
> > > +		goto nopage;
> > > +
> > 
> > Again, I would expect this to be caught by should_alloc_retry().
> > 
> 
> It is, but only after the oom killer is called.  We don't want to 
> needlessly kill another task here when p has already been killed but may 
> not be PF_EXITING yet.
> 

Fair point. How about just checking before __alloc_pages_may_oom() is
called then? The check will then be in a slower path.
I recognise this means that it is also only checked when direct reclaim
is failing, but there is at least one good reason for it.
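
The suggested placement could look roughly like the user-space model below.
This is a sketch under assumptions, not the actual slow path: the names are
hypothetical, and the made_progress branch assumes that
__alloc_pages_may_oom() is only reached when direct reclaim made no
progress; other conditions in the real code are omitted.

```c
#include <stdbool.h>

/* Possible next steps after direct reclaim in this model. */
enum sk_next {
	SK_GOT_PAGE,    /* reclaim produced a page */
	SK_NOPAGE,      /* fail the allocation without invoking the OOM killer */
	SK_CALL_OOM,    /* stands in for __alloc_pages_may_oom() */
	SK_RETRY_CHECK  /* fall through to should_alloc_retry() */
};

static enum sk_next sk_after_reclaim(bool got_page, bool made_progress,
				     bool fatal_signal, bool nofail)
{
	if (got_page)
		return SK_GOT_PAGE;

	if (!made_progress) {
		/* Task is already killed: do not kill another one for it. */
		if (fatal_signal && !nofail)
			return SK_NOPAGE;
		return SK_CALL_OOM;
	}

	return SK_RETRY_CHECK;
}
```

With this ordering the signal check costs nothing unless direct reclaim has
already failed to make progress.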

With this change, processes that have been sigkilled may now fail allocations
that they might not have failed before. It would be difficult to trigger,
but here is one possible problem with this change:

1. System was borderline with some thrashing
2. User starts program that gobbles up lots of memory on page faults,
   thrashing the system further and annoying the user
3. User sends SIGKILL
4. Process was faulting and the allocation returns NULL because a fatal
   signal was pending
5. Fault path returns VM_FAULT_OOM
6. Arch-specific path (on x86 anyway) calls out_of_memory again because
   VM_FAULT_OOM was returned.
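
Steps 4 to 6 can be modelled in user space as follows. This is an
illustrative sketch only: every sk_* name and the SK_VM_FAULT_OOM value are
hypothetical, and the counter merely stands in for a call to
out_of_memory().

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_VM_FAULT_OOM 0x1 /* hypothetical value for this sketch */

/* Counts how often the modelled arch path would call out_of_memory(). */
static int sk_oom_calls;

/* Step 4: the allocation fails when a fatal signal is pending. */
static void *sk_alloc_page(bool fatal_signal)
{
	static char page[1];

	return fatal_signal ? NULL : page;
}

/* Step 5: the fault path turns a NULL page into VM_FAULT_OOM. */
static int sk_handle_fault(bool fatal_signal)
{
	return sk_alloc_page(fatal_signal) ? 0 : SK_VM_FAULT_OOM;
}

/* Step 6: the arch-specific path reacts to VM_FAULT_OOM. */
static void sk_arch_fault(bool fatal_signal)
{
	if (sk_handle_fault(fatal_signal) & SK_VM_FAULT_OOM)
		sk_oom_calls++; /* stands in for out_of_memory() */
}
```

In the model, each fault taken by an already-killed task bumps the counter
once, even though the task is on its way out anyway.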

Ho hum, I haven't thought about this before, but it's also possible that
a process that is faulting and gets oom-killed will trigger a cascading
OOM kill. If the system was heavily thrashing, it might mean a large
number of processes get killed.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

