linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, rientjes@google.com, akpm@linux-foundation.org
Subject: Re: [PATCH v2] mm,oom: Re-enable OOM killer using timeout.
Date: Tue, 26 Apr 2016 23:00:15 +0900
Message-ID: <201604262300.IFD43745.FMOLFJFQOVStHO@I-love.SAKURA.ne.jp>
In-Reply-To: <20160425114733.GF23933@dhcp22.suse.cz>

Michal Hocko wrote:
> Hmm, I guess we have already discussed that in the past but I might
> misremember. The above relies on oom killer to be triggered after the
> previous victim was selected. There is no guarantee this will happen.

Why is there no guarantee that this will happen?

This OOM livelock is caused by waiting for TIF_MEMDIE threads unconditionally
and forever. If oom_unkillable_task() is never called, that is not the OOM
killer's problem.
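The difference between waiting unconditionally and waiting with a timeout can be
sketched as a small userspace model in C. This is not kernel code and not the
patch itself: struct model_task, memdie_since, the MODEL_* values and both scan
functions are hypothetical stand-ins, and the timeout value is left to the
caller. In this model the timeout is only evaluated when victim selection
actually runs; if nothing ever calls it, the timeout cannot help.

```c
/*
 * Simplified userspace model, NOT kernel code. All names are
 * hypothetical; "memdie" stands in for the TIF_MEMDIE flag.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	bool memdie;                /* task is an OOM victim (TIF_MEMDIE) */
	unsigned long memdie_since; /* time (ms) when the flag was set */
};

enum model_scan {
	MODEL_SELECT, /* not a victim: may be chosen as a new victim */
	MODEL_WAIT,   /* existing victim: wait for it to exit */
	MODEL_SKIP,   /* stale victim: ignore it and pick another */
};

/* Unconditional waiting: any TIF_MEMDIE task blocks further OOM kills. */
enum model_scan scan_unconditional(const struct model_task *t)
{
	assert(t != NULL);
	return t->memdie ? MODEL_WAIT : MODEL_SELECT;
}

/* Timed waiting: wait only until this victim's own deadline expires. */
enum model_scan scan_with_timeout(const struct model_task *t,
				  unsigned long now, unsigned long timeout_ms)
{
	assert(t != NULL);
	if (!t->memdie)
		return MODEL_SELECT;
	if (now - t->memdie_since < timeout_ms)
		return MODEL_WAIT;
	return MODEL_SKIP;
}
```

For example, a victim flagged at t=1000 ms with a 5000 ms timeout is waited for
at t=1500 ms but skipped at t=7000 ms, whereas scan_unconditional() keeps
returning MODEL_WAIT for as long as the victim stays stuck.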

Are you talking about setting did_some_progress = 1 for !__GFP_FS && !__GFP_NOFAIL
allocations without calling oom_unkillable_task()? If so, I insist that this is
the page allocator's problem. GFP_NOIO and GFP_NOFS allocations wake up kswapd,
and kswapd does __GFP_FS reclaim. If kswapd is unable to make forward progress,
we are already OOM and need to hear the administrator's decision (i.e. either
fail !__GFP_FS && !__GFP_NOFAIL allocations or select an OOM victim). An example
of kswapd failing to make progress is
http://I-love.SAKURA.ne.jp/tmp/serial-20160314-too-many-isolated2.txt.xz

----------
[  485.216878] Out of memory: Kill process 1356 (a.out) score 999 or sacrifice child
[  485.219170] Killed process 1356 (a.out) total-vm:4176kB, anon-rss:80kB, file-rss:0kB, shmem-rss:0kB
[  514.255929] MemAlloc-Info: stalling=146 dying=0 exiting=0 victim=0 oom_count=1/226
(...snipped...)
[  540.998623] MemAlloc-Info: stalling=146 dying=0 exiting=0 victim=0 oom_count=1/226
[  571.003817] MemAlloc-Info: stalling=152 dying=0 exiting=0 victim=0 oom_count=1/226
(...snipped...)
[  585.888300] MemAlloc-Info: stalling=152 dying=0 exiting=0 victim=0 oom_count=1/226
----------

This OOM livelock is caused by waiting for somebody else unconditionally and
forever.
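The allocator-side livelock can be modelled the same way. The sketch below is a
hypothetical userspace loop, not the real page allocator: MODEL_GFP_FS,
MODEL_GFP_NOFAIL, enum stall_policy, stall_limit and model_alloc() are
illustrative names, and the administrator's knob they represent does not exist
in this form. Reclaim is assumed never to free anything (kswapd is stuck), so a
!__GFP_FS allocation only leaves the loop if some policy is applied.

```c
/*
 * Simplified userspace model, NOT the real page allocator. All names
 * are hypothetical. Reclaim is assumed to free nothing, as when kswapd
 * cannot make forward progress.
 */
#include <assert.h>

#define MODEL_GFP_FS     0x1u
#define MODEL_GFP_NOFAIL 0x2u

enum stall_policy {
	POLICY_RETRY_FOREVER, /* traditional: keep looping */
	POLICY_FAIL,          /* administrator: fail the allocation */
	POLICY_OOM_KILL,      /* administrator: select an OOM victim */
};

enum alloc_outcome {
	ALLOC_STILL_LOOPING,     /* simulation stopped after max_iters */
	ALLOC_FAILED,
	ALLOC_CALLED_OOM_KILLER,
};

enum alloc_outcome model_alloc(unsigned int gfp, enum stall_policy policy,
			       unsigned int stall_limit, unsigned int max_iters)
{
	unsigned int stalls = 0;

	assert(stall_limit > 0);
	for (unsigned int i = 0; i < max_iters; i++) {
		/* In this model, __GFP_FS allocations reach the OOM killer. */
		if (gfp & MODEL_GFP_FS)
			return ALLOC_CALLED_OOM_KILLER;

		/* !__GFP_FS: pretend progress was made and retry. */
		stalls++;
		if (policy == POLICY_RETRY_FOREVER || stalls < stall_limit)
			continue;

		/* __GFP_NOFAIL must not fail, so fall back to an OOM kill. */
		if (policy == POLICY_FAIL && !(gfp & MODEL_GFP_NOFAIL))
			return ALLOC_FAILED;
		return ALLOC_CALLED_OOM_KILLER;
	}
	return ALLOC_STILL_LOOPING;
}
```

With POLICY_RETRY_FOREVER a !__GFP_FS request never leaves the loop, which is
the situation the MemAlloc-Info lines above show: many stalling allocations and
no OOM victim.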

These OOM livelocks are caused by the lack of a mechanism for learning the
administrator's policy. We are missing the rescue mechanisms needed to recover
from situations your model did not anticipate.

I'm talking about corner cases where your deterministic approach fails. What we
need is to "stop waiting for something forever unconditionally" and to "hear
what the administrator wants to do". Once you have developed a perfect model
and mechanism, you can deprecate and then remove the sysctl knobs for hearing
what the administrator wants to do.

> Why cannot we get back to the timer based solution at least for the
> panic timeout?

Using a global timer can cause false-positive panic() calls. The timeout should
be calculated on a per task_struct or per signal_struct basis.

Also, although it is a different problem, a global timer based solution does not
work for an OOM livelock in which there is no TIF_MEMDIE thread at all (such as
the example shown above).
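Why a single global timer can misfire while per-victim deadlines do not can be
shown with another hypothetical userspace model. It assumes a global timer that
is armed at the first OOM kill and not re-armed for later victims; struct
model_victim, panic_global() and panic_per_victim() are illustrative names only,
and the numbers in the example are made up.

```c
/*
 * Simplified userspace model, NOT kernel code. Compares a single global
 * OOM timer with per-victim deadlines when deciding whether to panic.
 * All names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_victim {
	unsigned long selected_at; /* time (ms) this victim was chosen */
	bool exited;               /* victim has released its memory */
};

/* Global timer: armed at the first OOM kill and never re-armed. */
bool panic_global(unsigned long first_oom_at, unsigned long now,
		  unsigned long timeout_ms)
{
	return now - first_oom_at >= timeout_ms;
}

/* Per-victim deadlines: panic only if a still-living victim is overdue. */
bool panic_per_victim(const struct model_victim *v, size_t n,
		      unsigned long now, unsigned long timeout_ms)
{
	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!v[i].exited && now - v[i].selected_at >= timeout_ms)
			return true;
	return false;
}
```

For instance, if the first victim (chosen at t=0) exited promptly and a second
victim was chosen at t=9000 ms, a 10000 ms global timer fires at t=10000 ms even
though the second victim has had only 1000 ms; the per-victim check waits until
t=19000 ms. With no victims at all (n == 0, as in the victim=0 log above),
panic_per_victim() never fires, so that case is not covered by either timer.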
