linux-mm.kvack.org archive mirror
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: vedran.furac@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, akpm@linux-foundation.org,
	rientjes@google.com, aarcange@redhat.com
Subject: Re: Memory overcommit
Date: Wed, 28 Oct 2009 09:43:43 +0900
Message-ID: <20091028094343.d59fec94.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <Pine.LNX.4.64.0910271843510.11372@sister.anvils>

On Tue, 27 Oct 2009 20:44:16 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Sigh, gnome-session has twice the value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder if this is because gnome-session has many children, but I need
> > to dig more. Does anyone have an idea?
> 
> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it.  The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made.  Usually I ran it with
> mem=512M, with gnome and firefox active.  Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
> 
> I built up a patchset of fixes, but once I came to split them up for
> submission, not one of them seemed entirely satisfactory; and Andrea's
> fix to the KSM/mlock deadlock forced me to abandon even the first of
> the patches (we've since then fixed the way munlocking behaves, so
> in theory could revisit that; but Andrea disliked what I was trying
> to do there in KSM for other reasons, so I've not touched it since).
> I had to get on with KSM, so I set it all aside: none of the issues
> was a recent regression.
> 
> I did briefly wonder about the reliance on total_vm which you're now
> looking into, but didn't touch that at all.  Let me describe those
> issues which I did try but fail to fix - I've no more time to deal
> with them now than then, but ought at least to mention them to you.
> 
Okay, thank you for the detailed information.


> 1.  select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes.  However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage).  So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
> 
Hmm.

> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 		    !p->exit_state)
> 			return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case).  It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting for memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
> 
OK, so a simple fix won't help here.
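
To illustrate the ordering problem in point 1, here is a minimal userspace
sketch in C. It is not kernel code and not Hugh's patch: struct task_model,
scan_original() and scan_reordered() are hypothetical names, and the struct
only models the three fields discussed above (p->mm, TIF_MEMDIE and
p->exit_state). The scoring is reduced to a precomputed points value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task_struct fields discussed in point 1. */
struct task_model {
    bool has_mm;          /* models p->mm != NULL */
    bool memdie;          /* models TIF_MEMDIE being set */
    int exit_state;       /* nonzero once the task has finished exiting */
    unsigned long points; /* precomputed badness score */
};

enum { SCAN_WAIT = -1, SCAN_NONE = -2 };

/* Original order: tasks without an mm are skipped before TIF_MEMDIE is seen. */
static int scan_original(const struct task_model *t, size_t n)
{
    int chosen = SCAN_NONE;
    unsigned long best = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!t[i].has_mm)
            continue;            /* an exiting victim is invisible here */
        if (t[i].memdie)
            return SCAN_WAIT;    /* a kill is already in progress */
        if (t[i].points > best) {
            best = t[i].points;
            chosen = (int)i;
        }
    }
    return chosen;
}

/* Reordered: check TIF_MEMDIE first, but ignore tasks that already exited. */
static int scan_reordered(const struct task_model *t, size_t n)
{
    int chosen = SCAN_NONE;
    unsigned long best = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].memdie && !t[i].exit_state)
            return SCAN_WAIT;    /* victim has not released its memory yet */
        if (!t[i].has_mm)
            continue;
        if (t[i].points > best) {
            best = t[i].points;
            chosen = (int)i;
        }
    }
    return chosen;
}
```

With a victim that has already dropped its mm but not yet exited, the
original order picks a second task to kill, while the reordered scan waits.
As noted above, the reordered scan has no timeout, so in this model it would
wait forever if the victim never makes progress.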

> 2.  I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first).  But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed.  Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
> 
I can't agree with that part of the heuristics, either.

> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway?  I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world.  Maybe this patch was okay.
> 
ok.
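
The arithmetic in point 2 can be shown with a small userspace sketch in C.
The names struct caps_model, adjust_separate() and adjust_merged() are
hypothetical; the booleans only model whether the task holds each capability,
and all other badness() adjustments are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three capabilities mentioned in point 2. */
struct caps_model {
    bool sys_admin;    /* CAP_SYS_ADMIN */
    bool sys_resource; /* CAP_SYS_RESOURCE */
    bool sys_rawio;    /* CAP_SYS_RAWIO */
};

/* Two independent tests: a task holding all three ends up at 1/16. */
static unsigned long adjust_separate(unsigned long points,
                                     const struct caps_model *c)
{
    assert(c != NULL);
    if (c->sys_admin || c->sys_resource)
        points /= 4;
    if (c->sys_rawio)
        points /= 4;
    return points;
}

/* One combined test, as described above: never more than quartering. */
static unsigned long adjust_merged(unsigned long points,
                                   const struct caps_model *c)
{
    assert(c != NULL);
    if (c->sys_admin || c->sys_resource || c->sys_rawio)
        points /= 4;
    return points;
}
```

For a root-like task with all three capabilities and 1600 points, the
separate tests leave 100 points and the merged test leaves 400; a task with
none of them keeps 1600 either way.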



> 3.  badness() has a comment above it which says:  
>  * 5) we try to kill the process the user expects us to kill, this
>  *    algorithm has been meticulously tuned to meet the principle
>  *    of least surprise ... (be careful when you change it)
> But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later
> refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815)
> adds plenty of surprise there, by trying to factor children into the
> calculation.  Intended to deal with forkbombs, but any reasonable
> process whose purpose is to fork children (e.g. gnome-session)
> becomes very vulnerable.  And whereas badness() itself goes on to
> refine the total_vm points by various adjustments peculiar to the
> process in question, those refinements have been ignored when
> adding the child's total_vm/2.  (Andrea does remark that he'd
> rather have rewritten badness() from scratch.)
> 
> I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF)
> part of the calculation up to select_bad_process(), making a
> solo_badness() function which makes all those adjustments to
> total_vm, then badness() itself a simple function adding half
> the children's solo_badness()es to the process' own solo_badness().
> But probably lots more needs doing - Andrea's rewrite?
> 
> 4.  In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
> 
> 
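
Points 3 and 4 can be sketched together in userspace C. This is only an
illustration under stated assumptions, not the patch described above:
struct mm_model, struct proc_model, badness_repeated() and badness_dedup()
are hypothetical names, and solo_badness() here returns the raw total_vm
with every per-process adjustment omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models: an address space and a process with its children. */
struct mm_model {
    unsigned long total_vm;
};

struct proc_model {
    const struct mm_model *mm;
    const struct proc_model *const *children;
    size_t nr_children;
};

/* Placeholder for the per-process score; real adjustments are omitted. */
static unsigned long solo_badness(const struct proc_model *p)
{
    assert(p != NULL);
    return p->mm ? p->mm->total_vm : 0;
}

/* Adds half of every child's score, even when children share one mm. */
static unsigned long badness_repeated(const struct proc_model *p)
{
    unsigned long points = solo_badness(p);

    for (size_t i = 0; i < p->nr_children; i++) {
        const struct proc_model *c = p->children[i];

        if (c->mm && c->mm != p->mm)
            points += solo_badness(c) / 2;
    }
    return points;
}

/* Same, but an inner loop searches back so each mm is counted only once. */
static unsigned long badness_dedup(const struct proc_model *p)
{
    unsigned long points = solo_badness(p);

    for (size_t i = 0; i < p->nr_children; i++) {
        const struct proc_model *c = p->children[i];
        int seen = (c->mm == NULL || c->mm == p->mm);

        for (size_t j = 0; j < i && !seen; j++)
            seen = (p->children[j]->mm == c->mm);
        if (!seen)
            points += solo_badness(c) / 2;
    }
    return points;
}
```

With a parent of 100 pages and three children sharing one 1000-page mm, the
repeated version gives 1600 points and the deduplicated version gives 600.
The inner loop is quadratic in the number of children, and it still does not
answer the open question in point 4 of what to do when tasks sharing an mm
deserve different adjustments.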
> I hope these notes help someone towards a better solution
> (and be prepared to discover more on the way).  I agree with
> Vedran that the present behaviour is pretty unimpressive, and
> I'm puzzled as to how people can have been tinkering with
> oom_kill.c down the years without seeing any of this.
> 

Sorry, I usually don't use X on servers, and almost all of my recent OOM
tests were done under memcg ;(
Thank you for your investigation. Maybe I'll need to do this in several steps.

Thanks,
-Kame


