From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: kbuild test robot <fengguang.wu@intel.com>,
Michal Hocko <mhocko@suse.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v3] mm, oom: fix unnecessary killing of additional processes
Date: Fri, 6 Jul 2018 17:05:39 -0700 (PDT) [thread overview]
Message-ID: <alpine.DEB.2.21.1807061652430.71359@chino.kir.corp.google.com> (raw)
In-Reply-To: <20180705164621.0a4fe6ab3af27a1d387eecc9@linux-foundation.org>
On Thu, 5 Jul 2018, Andrew Morton wrote:
> > +#ifdef CONFIG_DEBUG_FS
> > +static int oom_free_timeout_ms_read(void *data, u64 *val)
> > +{
> > + *val = oom_free_timeout_ms;
> > + return 0;
> > +}
> > +
> > +static int oom_free_timeout_ms_write(void *data, u64 val)
> > +{
> > + if (val > 60 * 1000)
> > + return -EINVAL;
> > +
> > + oom_free_timeout_ms = val;
> > + return 0;
> > +}
> > +DEFINE_SIMPLE_ATTRIBUTE(oom_free_timeout_ms_fops, oom_free_timeout_ms_read,
> > + oom_free_timeout_ms_write, "%llu\n");
> > +#endif /* CONFIG_DEBUG_FS */
>
> One of the several things I dislike about debugfs is that nobody
> bothers documenting it anywhere. But this should really be documented.
> I'm not sure where, but the documentation will find itself alongside a
> bunch of procfs things which prompts the question "why it *this* one in
> debugfs"?
>
The only reason I have placed it in debugfs, or making it tunable at all,
is to appease others. I know the non-default value we need to use to stop
millions of processes being oom killed unnecessarily. Michal suggested a
tunable to disable the oom reaper entirely, which is not what we want, so
I found this to be the best alternative.
I'd like to say that it is purposefully undocumented since it's not a
sysctl and nobody can suggest that it is becoming a permanent API that we
must maintain for backwards compatibility. Having it be configurable is
kind of ridiculous, but such is the nature of trying to get patches merged
these days to prevent millions of processes being oom killed
unnecessarily.
Blockable mmu notifiers and mlocked memory is not the extent of the
problem, if a process has a lot of virtual memory we must wait until
free_pgtables() completes in exit_mmap() to prevent unnecessary oom
killing. For implementations such as tcmalloc, which does not release
virtual memory, this is important because, well, it releases this only at
exit_mmap(). Of course we cannot do that with only the protection of
mm->mmap_sem for read.
This is a patch that we'll always need if we continue with the current
implementation of the oom reaper. I wouldn't suggest it as a configurable
value, but, owell.
I'll document the tunable and purposefully repeat myself that this is
addresses millions of processes being oom killed unnecessarily so the
rather important motivation of the change is clear to anyone who reads
this thread now or in the future. Nobody can guess an appropriate value
until they have been hit by the issue themselves and need to deal with the
loss of work from important processes being oom killed when some best
effort logging cron job uses too much memory. Or, of course, pissed off
users who have their jobs killed off and you find yourself in the rather
unfortunate situation of explaining why the Linux kernel in 2018 needs to
immediately SIGKILL processes because of an arbitrary nack related to a
timestamp.
Thanks.
next prev parent reply other threads:[~2018-07-07 0:05 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-21 21:35 David Rientjes
2018-07-04 1:43 ` David Rientjes
2018-07-04 2:26 ` penguin-kernel
2018-07-05 23:46 ` Andrew Morton
2018-07-06 5:39 ` Michal Hocko
2018-07-07 0:05 ` David Rientjes [this message]
2018-07-09 12:35 ` Michal Hocko
2018-07-09 20:30 ` David Rientjes
2018-07-10 11:01 ` Michal Hocko
2018-07-17 21:09 ` Tetsuo Handa
2018-07-18 20:22 ` David Rientjes
2018-07-18 21:21 ` Tetsuo Handa
2018-07-19 14:23 ` Tetsuo Handa
2018-07-20 8:41 ` David Rientjes
2018-07-20 9:57 ` Tetsuo Handa
2018-07-20 20:19 ` David Rientjes
2018-07-20 20:47 ` Tetsuo Handa
2018-07-20 22:19 ` David Rientjes
2018-07-20 20:14 ` [patch v4] " David Rientjes
2018-07-20 20:43 ` Tetsuo Handa
2018-07-20 22:13 ` David Rientjes
2018-07-21 2:47 ` Tetsuo Handa
2018-07-24 21:45 ` David Rientjes
2018-07-24 22:31 ` Tetsuo Handa
2018-07-24 22:51 ` David Rientjes
2018-07-24 22:55 ` Tetsuo Handa
2018-07-25 0:24 ` David Rientjes
2018-07-24 21:44 ` [patch v5] " David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.21.1807061652430.71359@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=penguin-kernel@i-love.sakura.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox