From: David Rientjes <rientjes@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>,
	linux-mm@kvack.org, Rohit Seth <rohitseth@google.com>,
	Paul Menage <menage@google.com>
Subject: Re: [RFC][PATCH]Per-cgroup OOM handler
Date: Tue, 11 Nov 2008 00:14:54 -0800 (PST)
Message-ID: <alpine.DEB.2.00.0811102356350.27365@chino.kir.corp.google.com>
In-Reply-To: <20081111162812.492218fc.kamezawa.hiroyu@jp.fujitsu.com>

Sorry, there's been some confusion in this proposal.

On Tue, 11 Nov 2008, KAMEZAWA Hiroyuki wrote:

> >     Here is how we do the one-tick-wait in cgroup_should_oom() in oom_kill.c
> >	if (!ret) {
> >		/* If we're not going to OOM, we should sleep for a
> >		 * bit to give userspace a chance to respond before we
> >		 * go back and try to reclaim again */
> >		schedule_timeout_uninterruptible(1);
> >	}
> >    and it works well in-house so far, as I mentioned earlier. What's
> > important here is not "sleeping for one tick"; the idea is to
> > reschedule the ooming thread so the oom handler can take action (like
> > adding a memory node to the cpuset) and the subsequent allocation in
> > get_page_from_freelist() can use it.
> > 
> Can't we avoid this kind of magical one-tick wait ?
> 

cgroup_should_oom() determines whether the oom killer should be invoked or 
whether userspace should be given the opportunity to act first; it returns 
zero only when the kernel has deferred to userspace.

In these situations, the kernel will return to the page allocator to 
attempt the allocation again.  If current were not rescheduled like this 
(the schedule_timeout is simply more powerful than a cond_resched), there 
is a very high likelihood that this subsequent allocation attempt would 
fail just as it did before the oom killer was triggered, and then we'd 
enter reclaim unnecessarily when userspace could have reclaimed on its 
own, killed a task by overriding the kernel's heuristics, added a node to 
the cpuset, increased its memcg allocation, etc.

So this reschedule simply prevents needlessly entering reclaim, just as 
its comment indicates.
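
To make that concrete, here is roughly the shape of the call site I have 
in mind (only a sketch built around the snippet quoted above; 
cgroup_should_oom() is the function from the patch, but the surrounding 
allocator code and the labels are illustrative, not the actual change):

	/* __alloc_pages()-style oom path, illustrative only */
	if (!cgroup_should_oom(current)) {
		/*
		 * The kernel deferred to the userspace handler.  Yielding
		 * the cpu for a tick gives the handler a chance to act
		 * (add a node to the cpuset, raise the memcg limit, kill
		 * a task) before we retry the allocation.
		 */
		schedule_timeout_uninterruptible(1);
		goto restart;	/* back to get_page_from_freelist() */
	}
	out_of_memory(zonelist, gfp_mask, order);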

> > > (Before OOM, the system tends to wait in congestion_wait() or similar.)
> > 

Yes, we wait on block congestion as part of direct reclaim, but at this 
point we've yet to notify the userspace oom handler so that it may act to 
avoid invoking the oom killer.

> Hmm, from the discussion of the mem_notify handler in Feb/March of this year,
> an oom handler cannot work well if memory is near OOM, in general.
> Then, mlockall() was recommended for the handler.
> (and it must not do file access.)
> 

This would be a legitimate point if we were talking about a system-wide 
oom notifier, as /dev/mem_notify was, and were addressing unconstrained 
ooms.  This patch is specific to cgroups, and the only likely use cases 
are either cpusets or memcg.

> I wonder creating small cpuset (and isolated node) for oom-handler may be
> another help.
> 

This is obviously a pure userspace issue; any sane oom handler that is 
itself subjected to the same memory constraints would be written to avoid 
memory allocations when woken up.
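
For example, something along these lines (purely illustrative; the 
notification file below is a placeholder and not an interface defined by 
this patch) never allocates or faults once it's armed:

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[128];	/* fixed-size buffer, no malloc() at oom time */
		int fd = open("/dev/cgroup/jobs/oom_notify", O_RDONLY); /* placeholder path */

		if (fd < 0)
			return 1;
		mlockall(MCL_CURRENT | MCL_FUTURE);	/* never fault once armed */

		while (read(fd, buf, sizeof(buf)) > 0) {
			/* act without allocating: write a pre-formed string to
			 * cpuset.mems or memory.limit_in_bytes, or kill() a
			 * known pid */
		}
		return 0;
	}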

> > > I'm wondering about
> > >  - freeze-all-threads-in-group-at-oom
> > >  - freeing emergency memory to the page allocator which was pooled at
> > >    cgroup creation
> > > rather than the 1-tick wait.
> > >
> > > BTW, it seems this patch always allows task detach/attach. Is that safe
> > > (and sane)?
> > 
> >    Yes, we allow task detach/attach. So far we don't see any race condition
> > except the livelock I mentioned above. Any particular scenario you can
> > think of now? Thanks
> > 
> I don't find it ;)
> BTW, shouldn't we disable preempt (or irqs) before taking spinlocks?
> 

I don't know which spinlock you're specifically referring to here, but the 
oom killer (and thus the deferral to the userspace oom handler) is invoked 
in 
process context with irqs enabled.
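
If the lock in question is only ever taken from that process context, a 
plain spin_lock() is enough; the irq-disabling variants are only needed 
when the same lock can also be taken from interrupt context (lock name 
made up purely for illustration):

	spin_lock(&oom_handler_lock);	/* preemption disabled for the section */
	/* ... update per-cgroup handler state ... */
	spin_unlock(&oom_handler_lock);

	/* only if the lock were also taken from irq context: */
	spin_lock_irqsave(&oom_handler_lock, flags);
	/* ... */
	spin_unlock_irqrestore(&oom_handler_lock, flags);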

		David

