linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	vedran.furac@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	minchan.kim@gmail.com, Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: Memory overcommit
Date: Tue, 27 Oct 2009 14:04:29 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.0910271351140.9183@chino.kir.corp.google.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0910271843510.11372@sister.anvils>

On Tue, 27 Oct 2009, Hugh Dickins wrote:

> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it.  The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made.  Usually I ran it with
> mem=512M, with gnome and firefox active.  Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
> 

The heuristics that the oom killer use in selecting a task seem to get 
debated quite often.

What hasn't been mentioned is that total_vm does do a good job of 
identifying tasks that are using far more memory than expected.  That 
seems to be the initial target: killing a rogue task that is hogging much 
more memory than it should, probably because of a memory leak.

The latest approach seems to be focused more on killing the task that will 
free the most resident memory.  That certainly is understandable to avoid 
killing additional tasks later and avoiding subsequent page allocations in 
the short term, but doesn't help to kill the memory leaker.

There's advantages to either approach, but it depends on the contextual 
goal of the oom killer when it's called: kill a rogue task that is 
allocating more memory than expected, or kill a task that will free the 
most memory.

> 1.  select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes.  However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage).  So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
> 
> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 		    !p->exit_state)
> 			return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case).  It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting of memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
> 

I've proposed an oom killer timeout in the past which adds a jiffies count 
to struct task_struct and will defer killing other tasks until the 
predefined time limit (we use 10*HZ) has been exceeded.  The problem is 
that even if you kill another task, it is highly unlikely that the expired 
task will ever exit at that point and is still holding a substantial 
amount of memory since it also had access to memory reserves and has still 
failed to exit.

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.
> 

I think someone (Nick?) proposed a patch at one time that removed most of 
the heuristics from select_bad_process() other than total_vm of the task 
and its children, mems_allowed intersection, and oom_adj.

> 4.  In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
> 

oom_kill_process() may not kill the task selected by select_bad_process(), 
it will first attempt to kill one of these children with a different mm.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-10-27 21:04 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <hav57c$rso$1@ger.gmane.org>
     [not found] ` <20091013120840.a844052d.kamezawa.hiroyu@jp.fujitsu.com>
     [not found]   ` <hb2cfu$r08$2@ger.gmane.org>
     [not found]     ` <20091014135119.e1baa07f.kamezawa.hiroyu@jp.fujitsu.com>
2009-10-20 21:52       ` Vedran Furač
2009-10-26  1:55         ` KAMEZAWA Hiroyuki
2009-10-26 16:16           ` Vedran Furač
2009-10-27  3:22             ` KAMEZAWA Hiroyuki
2009-10-27  6:10               ` KOSAKI Motohiro
2009-10-27  6:34                 ` Minchan Kim
2009-10-27  6:36                   ` KAMEZAWA Hiroyuki
2009-10-27  6:55                     ` Minchan Kim
2009-10-27  7:45                       ` [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: " KAMEZAWA Hiroyuki
2009-10-27  7:56                         ` Minchan Kim
2009-10-27 12:38                           ` Andrea Arcangeli
2009-10-28  0:22                             ` KAMEZAWA Hiroyuki
2009-10-28  0:45                               ` Vedran Furač
2009-10-27  7:56                         ` KAMEZAWA Hiroyuki
2009-10-27  8:14                           ` Minchan Kim
2009-10-27  8:33                             ` KAMEZAWA Hiroyuki
2009-10-27  8:52                               ` Minchan Kim
2009-10-27  8:56                                 ` KAMEZAWA Hiroyuki
2009-10-27 17:41                         ` Vedran Furač
2009-10-28  0:13                           ` KAMEZAWA Hiroyuki
2009-10-27 18:39                         ` Hugh Dickins
2009-10-27 18:47                           ` Andrea Arcangeli
2009-10-28  0:32                             ` KAMEZAWA Hiroyuki
2009-11-05 19:02                             ` Pavel Machek
2009-10-28  0:28                           ` KAMEZAWA Hiroyuki
2009-10-27  6:46                   ` KOSAKI Motohiro
2009-10-27  6:56                     ` Minchan Kim
2009-10-27 17:12               ` Vedran Furač
2009-10-27 18:02                 ` KOSAKI Motohiro
2009-10-27 18:30                   ` Vedran Furač
2009-10-27 20:44               ` Hugh Dickins
2009-10-27 21:04                 ` David Rientjes [this message]
2009-10-28  0:08                   ` Vedran Furač
2009-10-28  0:25                     ` David Rientjes
2009-10-28  0:39                       ` Vedran Furač
2009-10-28  4:08                         ` David Rientjes
2009-10-28  4:55                           ` KAMEZAWA Hiroyuki
2009-10-28  5:13                             ` David Rientjes
2009-10-28  6:05                               ` KAMEZAWA Hiroyuki
2009-10-28  6:17                                 ` David Rientjes
2009-10-28  6:20                                   ` KAMEZAWA Hiroyuki
2009-10-29  8:38                                     ` David Rientjes
2009-10-29 11:11                                       ` Vedran Furač
2009-10-29 19:53                                         ` David Rientjes
2009-10-29 23:48                                           ` KAMEZAWA Hiroyuki
2009-10-30  9:10                                             ` David Rientjes
2009-10-30  9:36                                               ` KAMEZAWA Hiroyuki
2009-11-03 20:49                                                 ` David Rientjes
2009-11-04  0:50                                                   ` KAMEZAWA Hiroyuki
2009-11-04  1:58                                                     ` David Rientjes
2009-11-04  2:17                                                       ` KAMEZAWA Hiroyuki
2009-11-04  3:10                                                         ` David Rientjes
2009-11-04  3:19                                                           ` KAMEZAWA Hiroyuki
2009-10-30 13:59                                           ` Vedran Furač
2009-10-30 19:24                                             ` David Rientjes
2009-11-02 19:58                                               ` Vedran Furač
2009-10-28 13:28                           ` Vedran Furač
2009-10-28 20:10                             ` David Rientjes
2009-10-29  3:05                               ` Vedran Furač
2009-10-29  8:35                                 ` David Rientjes
2009-10-29 11:01                                   ` Vedran Furač
2009-10-29 19:42                                     ` David Rientjes
2009-10-30 13:53                                       ` Vedran Furač
2009-10-30 14:08                                         ` Thomas Fjellstrom
2009-10-30 15:13                                           ` Vedran Furač
2009-10-30 14:12                                         ` Andrea Arcangeli
2009-10-30 14:41                                           ` Vedran Furač
2009-10-30 15:15                                             ` Andrea Arcangeli
2009-10-30 16:24                                               ` Hugh Dickins
2009-11-02 19:56                                               ` Vedran Furač
2009-10-30 19:44                                         ` David Rientjes
2009-11-02 19:56                                           ` Vedran Furač
2009-10-28  0:43                 ` KAMEZAWA Hiroyuki
2009-10-28  2:47                 ` KOSAKI Motohiro
2009-10-28  3:17                   ` KAMEZAWA Hiroyuki
2009-10-28  4:12                   ` David Rientjes
2009-10-28  8:10                     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.0910271351140.9183@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=vedran.furac@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox