From: Michal Hocko <mhocko@suse.cz>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, rientjes@google.com, hannes@cmpxchg.org,
	tj@kernel.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC -v2] panic_on_oom_timeout
Date: Wed, 17 Jun 2015 17:41:59 +0200
Message-ID: <20150617154159.GJ25056@dhcp22.suse.cz>
In-Reply-To: <201506172259.EAI00575.OFQtVFFSHMOLJO@I-love.SAKURA.ne.jp>

On Wed 17-06-15 22:59:54, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > But you have a point that we could have
> > - constrained OOM which elevates oom_victims
> > - global OOM killer strikes but wouldn't start the timer
> > 
> > This is certainly possible and timer_pending(&panic_on_oom) replacing
> > oom_victims check should help here. I will think about this some more.
> 
> Yes, please.

Fixed in my local version. I will post the new version of the patch
after we settle on the approach.
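
To illustrate the race quoted above, here is a minimal user-space sketch.
It is not the patch itself: struct oom_model, constrained_oom(),
global_oom() and race_arms_timer() are hypothetical names made up for
this example, and the timer_armed flag only stands in for what
timer_pending(&panic_on_oom) would report. It assumes that the global
OOM path arms the panic timer only when it believes it is the first to
act.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state discussed in this thread. */
struct oom_model {
	int  oom_victims;  /* victims selected by any OOM context */
	bool timer_armed;  /* stands in for timer_pending(&panic_on_oom) */
};

/* Constrained (memcg/mempolicy) OOM: elevates oom_victims, never arms the timer. */
static void constrained_oom(struct oom_model *m)
{
	assert(m != NULL);
	m->oom_victims++;
}

/*
 * Global OOM. With gate_on_timer == false the decision to arm the timer
 * is based on oom_victims; with gate_on_timer == true it is based on
 * whether the timer is already armed.
 */
static void global_oom(struct oom_model *m, bool gate_on_timer)
{
	bool should_arm;

	assert(m != NULL);
	should_arm = gate_on_timer ? !m->timer_armed : (m->oom_victims == 0);
	if (should_arm)
		m->timer_armed = true;
	m->oom_victims++;
}

/* Replays the sequence: constrained OOM first, then global OOM. */
bool race_arms_timer(bool gate_on_timer)
{
	struct oom_model m = { 0, false };

	constrained_oom(&m);
	global_oom(&m, gate_on_timer);
	return m.timer_armed;
}
```

In this model race_arms_timer(false) leaves the timer unarmed because
the earlier constrained OOM already elevated oom_victims, while
race_arms_timer(true) arms it regardless of the victim count.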
 
> > The important thing is to decide what is the reasonable way forward. We
> > have two implementations of panic based timeout. So we should decide
> > - Should we add a panic timeout at all?
> > - Should the timeout be bound to panic_on_oom?
> > - Should we care about constrained OOM contexts?
> > - If yes should they use the same timeout?
> > - If no should each memcg be able to define its own timeout?
> > 
> Exactly.
> 
> > My thinking is that it should be bound to panic_on_oom=1 only, until
> > we hear from somebody actually asking for a constrained OOM timeout,
> > and even then we should not allow too large a configuration space
> > (e.g. no per-memcg timeout, no separate mempolicy vs. memcg timeouts).
> 
> My implementation comes from providing debugging hints when analyzing
> vmcore of a stalled system. I'm posting logs of stalls after global OOM
> killer struck because it is easy to reproduce. But the problem I have
> is when a system stalls before the OOM killer strikes (I have seen many
> such cases on customers' enterprise servers), because we have no hints
> for guessing whether the memory allocator is the cause or not. Thus, my
> version tried to
> emit warning messages using sysctl_memalloc_task_warn_secs .

I can understand your frustration but I believe that debuggability is
a separate topic and we should start by defining a reasonable _policy_
so that an administrator has a way to handle potential OOM stalls
reasonably and with well-defined semantics.

> The ability to take care of constrained OOM contexts is a side effect of
> using a per-"struct task_struct" variable. Even if we come to a conclusion that
> we should not add a timeout for panic, I hope that a timeout for warning
> about memory allocation stalls is added.
> 
> > Let's start simple and make things more complicated later!
> 
> I think we disagree about what the timeout counters are for.

-- 
Michal Hocko
SUSE Labs


