linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
Date: Wed, 30 May 2018 14:06:51 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1805301357100.150424@chino.kir.corp.google.com> (raw)
In-Reply-To: <20180528081345.GD1517@dhcp22.suse.cz>

On Mon, 28 May 2018, Michal Hocko wrote:

> > That's not sufficient since the oom reaper is also not able to oom reap if 
> > the mm has blockable mmu notifiers or all memory is shared filebacked 
> > memory, so it immediately sets MMF_OOM_SKIP and additional processes are 
> > oom killed.
> 
> Could you be more specific with a real world example where that is the
> case? I mean the full address space of non-reclaimable file backed
> memory where waiting some more would help? Blockable mmu notifiers are
> a PITA for sure. I wish we could have a better way to deal with them.
> Maybe we can tell them we are in the non-blockable context and have them
> release as much as possible. Still something that a random timeout
> wouldn't help I am afraid.
> 

It's not a random timeout, it's sufficiently long such that we don't oom 
kill several processes needlessly in the very rare case where oom livelock 
would actually prevent the original victim from exiting.  The oom reaper 
processing an mm, finding everything to be mlocked, and immediately 
MMF_OOM_SKIP is inappropriate.  This is rather trivial to reproduce for a 
large memory hogging process that mlocks all of its memory; we 
consistently see spurious and unnecessary oom kills simply because the oom 
reaper has set MMF_OOM_SKIP very early.

This patch introduces a "give up" period such that the oom reaper is still 
allowed to do its good work but only gives up in the hope the victim can 
make forward progress at some substantial period of time in the future.  I 
would understand the objection if oom livelock where the victim cannot 
make forward progress were commonplace, but in the interest of not killing 
several processes needlessly every time a large mlocked process is 
targeted, I think it compels a waiting period.

> Trying to reap a different oom victim when the current one is not making
> progress during the lock contention is certainly something that make
> sense. It has been proposed in the past and we just gave it up because
> it was more complex. Do you have any specific example when this would
> help to justify the additional complexity?
> 

I'm not sure how you're defining complexity, the patch adds ~30 lines of 
code and prevents processes from needlessly being oom killed when oom 
reaping is largely unsuccessful and before the victim finishes 
free_pgtables() and then also allows the oom reaper to operate on multiple 
mm's instead of processing one at a time.  Obviously if there is a delay 
before MMF_OOM_SKIP is set it requires that the oom reaper be able to 
process other mm's, otherwise we stall needlessly for 10s.  Operating on 
multiple mm's in a linked list while waiting for victims to exit during a 
timeout period is thus very much needed, it wouldn't make sense without 
it.

> > But also note that even if oom reaping is possible, in the presence of an 
> > antagonist that continues to allocate memory, that it is possible to oom 
> > kill additional victims unnecessarily if we aren't able to complete 
> > free_pgtables() in exit_mmap() of the original victim.
> 
> If there is unbound source of allocations then we are screwed no matter
> what. We just hope that the allocator will get noticed by the oom killer
> and it will be stopped.
> 

It's not unbounded, it's just an allocator that acts as an antagonist.  At 
the risk of being overly verbose, for system or memcg oom conditions: a 
large mlocked process is oom killed, other processes continue to 
allocate/charge, the oom reaper almost immediately grants MMF_OOM_SKIP 
without being able to free any memory, and the other important processes 
are needlessly oom killed before the original victim can reach 
exit_mmap().  This happens a _lot_.

I'm open to hearing any other suggestions that you have other than waiting 
some time period before MMF_OOM_SKIP gets set to solve this problem.

  reply	other threads:[~2018-05-30 21:06 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-24 21:22 David Rientjes
2018-05-25  0:19 ` Tetsuo Handa
2018-05-25 19:44   ` David Rientjes
2018-05-25  7:26 ` Michal Hocko
2018-05-25 19:36   ` David Rientjes
2018-05-28  8:13     ` Michal Hocko
2018-05-30 21:06       ` David Rientjes [this message]
2018-05-31  6:32         ` Michal Hocko
2018-05-31 21:16           ` David Rientjes
2018-06-01  7:46             ` Michal Hocko
2018-06-05  4:25               ` David Rientjes
2018-06-05  8:57                 ` Michal Hocko
2018-06-13 13:20                   ` Tetsuo Handa
2018-06-13 13:29                     ` Michal Hocko
2018-06-04  5:48 ` [lkp-robot] [mm, oom] 2d251ff6e6: BUG:unable_to_handle_kernel kernel test robot
2018-06-14 20:42 ` [patch] mm, oom: fix unnecessary killing of additional processes David Rientjes
2018-06-15  6:55   ` Michal Hocko
2018-06-15 23:15     ` David Rientjes
2018-06-19  8:33       ` Michal Hocko
2018-06-20 13:03         ` Michal Hocko
2018-06-20 20:34           ` David Rientjes
2018-06-21  7:45             ` Michal Hocko
2018-06-21  7:54               ` Michal Hocko
2018-06-21 20:50               ` David Rientjes
2018-06-22  7:42                 ` Michal Hocko
2018-06-22 14:29                   ` Michal Hocko
2018-06-22 18:49                     ` David Rientjes
2018-06-25  9:04                       ` Michal Hocko
2018-06-19  0:27   ` Andrew Morton
2018-06-19  8:47     ` Michal Hocko
2018-06-19 20:34     ` David Rientjes
2018-06-20 21:59       ` [patch v2] " David Rientjes
2018-06-21 10:58         ` kbuild test robot
2018-06-21 10:58         ` [RFC PATCH] mm, oom: oom_free_timeout_ms can be static kbuild test robot
2018-06-24  2:36   ` [patch] mm, oom: fix unnecessary killing of additional processes Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1805301357100.150424@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox