linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
Date: Thu, 31 May 2018 14:16:34 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1805311400260.74563@chino.kir.corp.google.com> (raw)
In-Reply-To: <20180531063212.GF15278@dhcp22.suse.cz>

On Thu, 31 May 2018, Michal Hocko wrote:

> > It's not a random timeout, it's sufficiently long such that we don't oom 
> > kill several processes needlessly in the very rare case where oom livelock 
> > would actually prevent the original victim from exiting.  Having the oom 
> > reaper process an mm, find everything mlocked, and immediately set 
> > MMF_OOM_SKIP is inappropriate.  This is rather trivial to reproduce for a 
> > large memory hogging process that mlocks all of its memory; we 
> > consistently see spurious and unnecessary oom kills simply because the oom 
> > reaper has set MMF_OOM_SKIP very early.
> 
> It takes quite some additional steps for admin to allow a large amount
> of mlocked memory and such an application should be really careful to
> not consume too much memory. So how come you are seeing this so
> consistently? Is this some sort of bug or an unfortunate workload side
> effect? I am asking this because I really want to see how relevant this
> really is.
> 

The bug is that the oom reaper sets MMF_OOM_SKIP almost immediately after 
the victim has been chosen for oom kill, and we then get follow-up oom 
kills; it is not that the process is able to mlock a large amount of 
memory.  Mlock is only being discussed as a single example.  Tetsuo has 
brought up the example of all shared file-backed memory, and we have 
discussed the mm having a single blockable mmu notifier.  Regardless of 
how we arrive at the point where the oom reaper can't free memory, which 
could be any of those three cases, the problem occurs whenever (1) the 
original victim is sufficiently large that follow-up oom kills would 
otherwise be unnecessary and (2) other threads allocate/charge before the 
oom victim reaches exit_mmap().

We have examples of cases where oom reaping was reported as successful, 
but the rss numbers in the kernel log are very similar to those at oom 
kill time and the process is known not to mlock.  The reason is that the 
oom reaper could free very little memory due to blockable mmu notifiers.

> But the waiting periods just turn out to be a really poor design. There
> will be no good timeout to fit for everybody. We can do better and as
> long as this is the case the timeout based solution should be really
> rejected. It is a shortcut that doesn't really solve the underlying
> problem.
> 

The current implementation is already a timeout-based solution for 
mmap_sem: the oom reaper spins trying to grab the sem and eventually 
gives up.  This patch allows it to concurrently work on other mm's and 
detects the timeout in a different way, with jiffies instead of an 
iterator.

I'd love a solution without a timeout where we can reliably detect an oom 
livelock and oom kill additional processes, but only after the original 
victim has had a chance to do exit_mmap(); I don't see one being offered, 
though.  That Tetsuo has seen issues with this in the past and suggested 
a similar proposal means we are not the only ones feeling pain from this.

> > I'm open to hearing any other suggestions that you have other than waiting 
> > some time period before MMF_OOM_SKIP gets set to solve this problem.
> 
> I've already offered one. Make mlocked pages reapable.

Making mlocked pages reapable would only solve the most trivial reproducer 
of this.  Unless the oom reaper can guarantee that it will never block and 
can free all memory that exit_mmap() can free, we need to ensure that a 
victim has a chance to reach the exit path on its own before killing every 
other process on the system.

I'll fix the issue I identified with doing list_add_tail() rather than 
list_add(), fix up the commit message per Tetsuo to identify the ways 
this can occur other than mlock, remove the rfc tag, and repost.


Thread overview: 35+ messages
2018-05-24 21:22 David Rientjes
2018-05-25  0:19 ` Tetsuo Handa
2018-05-25 19:44   ` David Rientjes
2018-05-25  7:26 ` Michal Hocko
2018-05-25 19:36   ` David Rientjes
2018-05-28  8:13     ` Michal Hocko
2018-05-30 21:06       ` David Rientjes
2018-05-31  6:32         ` Michal Hocko
2018-05-31 21:16           ` David Rientjes [this message]
2018-06-01  7:46             ` Michal Hocko
2018-06-05  4:25               ` David Rientjes
2018-06-05  8:57                 ` Michal Hocko
2018-06-13 13:20                   ` Tetsuo Handa
2018-06-13 13:29                     ` Michal Hocko
2018-06-04  5:48 ` [lkp-robot] [mm, oom] 2d251ff6e6: BUG:unable_to_handle_kernel kernel test robot
2018-06-14 20:42 ` [patch] mm, oom: fix unnecessary killing of additional processes David Rientjes
2018-06-15  6:55   ` Michal Hocko
2018-06-15 23:15     ` David Rientjes
2018-06-19  8:33       ` Michal Hocko
2018-06-20 13:03         ` Michal Hocko
2018-06-20 20:34           ` David Rientjes
2018-06-21  7:45             ` Michal Hocko
2018-06-21  7:54               ` Michal Hocko
2018-06-21 20:50               ` David Rientjes
2018-06-22  7:42                 ` Michal Hocko
2018-06-22 14:29                   ` Michal Hocko
2018-06-22 18:49                     ` David Rientjes
2018-06-25  9:04                       ` Michal Hocko
2018-06-19  0:27   ` Andrew Morton
2018-06-19  8:47     ` Michal Hocko
2018-06-19 20:34     ` David Rientjes
2018-06-20 21:59       ` [patch v2] " David Rientjes
2018-06-21 10:58         ` [RFC PATCH] mm, oom: oom_free_timeout_ms can be static kbuild test robot
2018-06-21 10:58         ` [patch v2] mm, oom: fix unnecessary killing of additional processes kbuild test robot
2018-06-24  2:36   ` [patch] " Tetsuo Handa
