linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: fix unnecessary killing of additional processes
Date: Fri, 15 Jun 2018 16:15:39 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1806151559360.49038@chino.kir.corp.google.com> (raw)
In-Reply-To: <20180615065541.GA24039@dhcp22.suse.cz>

On Fri, 15 Jun 2018, Michal Hocko wrote:

> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Nacked-by: Michal Hocko <mhocko@suse.com>
> as already explained elsewhere in this email thread.
> 

I don't find this to be surprising, but I'm not sure that it actually 
matters if you won't fix a regression that you introduced.  Tetsuo 
initially found this issue and presented a similar solution, so I think 
his feedback on this is more important since it would fix a problem for 
him as well.

> > ---
> >  Note: I understand there is an objection based on timeout based delays.
> >  This is currently the only possible way to avoid oom killing important
> >  processes completely unnecessarily.  If the oom reaper can someday free
> >  all memory, including mlocked memory and those mm's with blockable mmu
> >  notifiers, and is guaranteed to always be able to grab mm->mmap_sem,
> >  this can be removed.  I do not believe any such guarantee is possible
> >  and consider the massive killing of additional processes unnecessarily
> >  to be a regression introduced by the oom reaper and its very quick
> >  setting of MMF_OOM_SKIP to allow additional processes to be oom killed.
> 
> If you find oom reaper more harmful than useful I would be willing to
> ack a command line option to disable it. Especially when you keep
> claiming that the lockups are not really happening in your environment.
> 

There's no need to disable it, we simply need to ensure that it doesn't 
set MMF_OOM_SKIP too early, which my patch does.  We also need to avoid 
setting MMF_OOM_SKIP in exit_mmap() until after all memory has been freed, 
i.e. after free_pgtables().
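
To illustrate the ordering being argued for, here is a rough sketch of the 
tail of exit_mmap() (kernel-internal pseudocode, not the actual 
implementation; the call signatures are abbreviated, and the placement of 
the set_bit() is the point, not the surrounding detail):

```c
/*
 * Illustrative sketch only -- not the real exit_mmap().  The point is
 * the ordering: MMF_OOM_SKIP should not be set until after the page
 * tables are freed, because until then there is still memory that the
 * exiting task itself is about to release.
 */
void exit_mmap(struct mm_struct *mm)
{
	/* ... unmap all vmas, freeing the bulk of anonymous memory ... */
	unmap_vmas(&tlb, vma, 0, -1);

	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);

	/*
	 * Everything that can be freed has been freed; only now is it
	 * reasonable to let the oom killer look for additional victims
	 * if this exit did not resolve the oom condition.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);

	/* ... tear down the vma list ... */
}
```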

I'd be happy to make this timeout configurable, however, and default 
it to perhaps one second as the blockable mmu notifier timeout in your own 
code does.  I find it somewhat sad that we'd need a sysctl for this, but 
if that will appease you and it will help to move this into -mm then we 
can do that.
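
For reference, such a knob would be small.  A hypothetical sketch of the 
sysctl wiring follows (the name oom_free_timeout_ms matches the v2 patch 
that appears later in this thread, but the table below is illustrative 
kernel pseudocode, not the actual patch):

```c
/* Hypothetical sysctl wiring for a configurable oom-free timeout. */
static unsigned long oom_free_timeout_ms = 1000;	/* default: 1s */

static struct ctl_table oom_free_table[] = {
	{
		.procname	= "oom_free_timeout_ms",
		.data		= &oom_free_timeout_ms,
		.maxlen		= sizeof(oom_free_timeout_ms),
		.mode		= 0644,
		.proc_handler	= proc_doulongvec_minmax,
	},
	{ }
};
```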

> Other than that I've already pointed to a more robust solution. If you
> are reluctant to try it out I will do, but introducing a timeout is just
> papering over the real problem. Maybe we will not reach the state that
> _all_ the memory is reapable but we definitely should try to make as
> much as possible to be reapable and I do not see any fundamental
> problems in that direction.

You introduced the timeout already; I'm sure you realized yourself that 
the oom reaper sets MMF_OOM_SKIP much too early.  Trying to grab 
mm->mmap_sem 10 times in a row with HZ/10 sleeps in between is a timeout.  
If there are blockable mmu notifiers, your code puts the oom reaper to 
sleep for HZ before setting MMF_OOM_SKIP, which is a timeout.  This patch 
moves the timeout to reaching exit_mmap() where we actually free all 
memory possible and still allow for additional oom killing if there is a 
very rare oom livelock.

You haven't provided any data that suggests oom livelocking isn't a very 
rare event and that we need to respond immediately by randomly killing 
more and more processes rather than wait a bounded period of time to allow 
for forward progress to be made.  I have consistently provided data showing 
oom livelock in our fleet is extremely rare, less than 0.04% of the time.  
Yet your solution is to kill many processes so this 0.04% is fast.

The reproducer on powerpc is very simple.  Do an mmap() and mlock() the 
full length.  In a 128MB memcg, fork one 120MB process and two 60MB 
processes that each do that.

[  402.064375] Killed process 17024 (a.out) total-vm:134080kB, anon-rss:122032kB, file-rss:1600kB
[  402.107521] Killed process 17026 (a.out) total-vm:64448kB, anon-rss:44736kB, file-rss:1600kB

Completely reproducible and completely unnecessary: two processes killed 
pointlessly when the first oom kill would have been successful.

Killing processes is important; optimizing for the 0.04% of cases of true 
oom livelock by insisting everybody tolerate excessive oom killing is not.  
If you have data to suggest the 0.04% is higher, please present it.  I'd be 
interested in any data that suggests it's higher, even at 1/1,000,000th the 
oom occurrence rate that I have shown.

It's inappropriate to merge code that oom kills many processes 
unnecessarily when one happens to be mlocked or have blockable mmu 
notifiers or when mm->mmap_sem can't be grabbed fast enough but forward 
progress is actually being made.  It's a regression, and it impacts real 
users.  Insisting that we fix the problem you introduced by making all mmu 
notifiers unblockable, guaranteeing that mlocked memory can always be 
reaped, and ensuring that mm->mmap_sem can always be grabbed within a 
second is irresponsible.


Thread overview: 35+ messages
2018-05-24 21:22 [rfc patch] " David Rientjes
2018-05-25  0:19 ` Tetsuo Handa
2018-05-25 19:44   ` David Rientjes
2018-05-25  7:26 ` Michal Hocko
2018-05-25 19:36   ` David Rientjes
2018-05-28  8:13     ` Michal Hocko
2018-05-30 21:06       ` David Rientjes
2018-05-31  6:32         ` Michal Hocko
2018-05-31 21:16           ` David Rientjes
2018-06-01  7:46             ` Michal Hocko
2018-06-05  4:25               ` David Rientjes
2018-06-05  8:57                 ` Michal Hocko
2018-06-13 13:20                   ` Tetsuo Handa
2018-06-13 13:29                     ` Michal Hocko
2018-06-04  5:48 ` [lkp-robot] [mm, oom] 2d251ff6e6: BUG:unable_to_handle_kernel kernel test robot
2018-06-14 20:42 ` [patch] mm, oom: fix unnecessary killing of additional processes David Rientjes
2018-06-15  6:55   ` Michal Hocko
2018-06-15 23:15     ` David Rientjes [this message]
2018-06-19  8:33       ` Michal Hocko
2018-06-20 13:03         ` Michal Hocko
2018-06-20 20:34           ` David Rientjes
2018-06-21  7:45             ` Michal Hocko
2018-06-21  7:54               ` Michal Hocko
2018-06-21 20:50               ` David Rientjes
2018-06-22  7:42                 ` Michal Hocko
2018-06-22 14:29                   ` Michal Hocko
2018-06-22 18:49                     ` David Rientjes
2018-06-25  9:04                       ` Michal Hocko
2018-06-19  0:27   ` Andrew Morton
2018-06-19  8:47     ` Michal Hocko
2018-06-19 20:34     ` David Rientjes
2018-06-20 21:59       ` [patch v2] " David Rientjes
2018-06-21 10:58         ` kbuild test robot
2018-06-21 10:58         ` [RFC PATCH] mm, oom: oom_free_timeout_ms can be static kbuild test robot
2018-06-24  2:36   ` [patch] mm, oom: fix unnecessary killing of additional processes Tetsuo Handa
