From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
linux-mm@kvack.org, Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.
Date: Mon, 20 Aug 2018 08:07:46 +0200
Message-ID: <20180820060746.GB29735@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.21.1808191632230.193150@chino.kir.corp.google.com>
On Sun 19-08-18 16:45:36, David Rientjes wrote:
>
> > > > > At the risk of continually repeating the same statement, the oom reaper
> > > > > cannot provide the direct feedback for all possible memory freeing.
> > > > > Waking up periodically and finding mm->mmap_sem contended is one problem,
> > > > > but the other problem that I've already shown is the unnecessary oom
> > > > > killing of additional processes while a thread has already reached
> > > > > exit_mmap(). The oom reaper cannot free page tables which is problematic
> > > > > for malloc implementations such as tcmalloc that do not release virtual
> > > > > memory.
> > > >
> > > > But once we know that the exit path is past the point of blocking we can
> > > > have MMF_OOM_SKIP handover from the oom_reaper to the exit path. So the
> > > > oom_reaper doesn't hide the current victim too early and we can safely
> > > > wait for the exit path to reclaim the rest. So there is a feedback
> > > > channel. I would not even mind polling for that state a few times -
> > > > similar to polling for the mmap_sem. But it would still be some feedback
> > > > rather than just "a certain amount of time has passed since the last check".
> > > >
> > >
> > > Yes, of course, it would be easy to rely on exit_mmap() to set
> > > MMF_OOM_SKIP itself and have the oom reaper drop the task from its list
> > > when we are assured of forward progress. What polling are you proposing
> > > other than a timeout based mechanism to do this?
> >
> > I was thinking about doing something like the following
> > - oom_reaper checks the amount of victim's memory after it is done with
> > reaping (e.g. by calling oom_badness before and after). If it wasn't able to
> > reclaim much then return false and keep retrying with the existing
> > mechanism
>
> I'm not sure how you define the threshold to consider what is substantial
> memory freeing.

If a rule of thumb (a few MB freed, or oom_badness reduced by X%) turns
out not to work well, then we can try to be more clever, e.g. detect that
there are too many ptes left to free and wait for those.

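To make the rule of thumb concrete, here is a rough userspace sketch of such a check. It is not kernel code: the function name, both thresholds, and the idea of treating the two oom_badness() samples as page counts are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Hypothetical thresholds; nothing in this thread settles on actual values. */
#define REAP_MIN_FREED_PAGES   512ULL  /* 2 MiB with 4 KiB pages, i.e. "a few megs" */
#define REAP_MIN_FREED_PERCENT 10ULL   /* the "X%" above, picked arbitrarily */

/*
 * before/after stand in for oom_badness() samples taken around one reap
 * attempt. Treating them as page counts is an assumption of this model.
 */
static bool reap_made_progress(unsigned long long before,
                               unsigned long long after)
{
	unsigned long long freed;

	if (after >= before)
		return false;	/* nothing freed, or the victim even grew */

	freed = before - after;
	if (freed >= REAP_MIN_FREED_PAGES)
		return true;	/* absolute rule: enough pages came back */

	/* relative rule: freed / before >= X / 100, written without division */
	return freed * 100 >= before * REAP_MIN_FREED_PERCENT;
}
```

A reaper built this way would treat a false return as "keep retrying with the existing mechanism" rather than as a reason to hide the victim.
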
> > - once a flag (e.g. MMF_OOM_MMAP) is set it bails out and won't set the
> > MMF_OOM_SKIP flag.
> >
> > > We could set a MMF_EXIT_MMAP in exit_mmap() to specify that it will
> > > complete free_pgtables() for that mm. The problem is the same: when does
> > > the oom reaper decide to set MMF_OOM_SKIP because MMF_EXIT_MMAP has not
> > > been set in a timely manner?
> >
> > reuse the current retry policy which is the number of attempts rather
> > than any timeout.
> >
> > > If this is an argument that the oom reaper should loop checking for
> > > MMF_EXIT_MMAP and doing schedule_timeout(1) a set number of times rather
> > > than just setting the jiffies in the mm itself, that's just implementing
> > > the same thing and doing so in a way where the oom reaper stalls operating
> > > on a single mm rather than round-robin iterating over mm's in my patch.
> >
> > I've said earlier that I do not mind doing round robin in the oom reaper
> > but this is certainly more complex than what we do now and I haven't
> > seen any actual example where it would matter. OOM reaper is a safety
> > measure. Nothing should fall apart if it is slow. The primary work
> > should be happening from the exit path anyway.
>
> The oom reaper will always be unable to free some memory, such as page
> tables. If it can't grab mm->mmap_sem in a reasonable amount of time, it
> also can give up early. The munlock() case is another example. We
> experience unnecessary oom killing during free_pgtables() where the
> single-threaded exit_mmap() is freeing an enormous amount of page tables
> (usually a malloc implementation such as tcmalloc that does not free
> virtual memory) and other processes are faulting faster than we can free.
> It's a combination of a multiprocessor system and a lot of virtual memory
> from the original victim. This is the same case as being unable to
> munlock quickly enough in exit_mmap() to free the memory.
>
> We must wait until free_pgtables() completes in exit_mmap() before killing
> additional processes in the large majority (99.96% of cases from my data)
> of instances where oom livelock does not occur. In the remainder of
> situations, livelock has been prevented by what the oom reaper has been
> able to free. We can, of course, not do free_pgtables() from the oom
> reaper. So my approach was to allow for a reasonable amount of time for
> the victim to free a lot of memory before declaring that additional
> processes must be oom killed. It would be functionally similar to having
> the oom reaper retry many, many more times than 10 and having a linked
> list of mm_structs to reap. I don't care one way or another if it's a
> timeout based solution or many, many retries that have schedule_timeout()
> that yields the same time period in the end.

I would really keep the current retry logic, extended so that the reaper
can either keep retrying or hand over to exit_mmap() once we know the
exit path is past its last point of blocking.

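As a rough illustration of that control flow, the following is a userspace model, not a patch. The struct, its fields, and the function name are hypothetical; the two integer knobs only simulate when the exit path would raise its flag and when a reap attempt would free enough memory. The retry count of 10 is the figure mentioned in the quoted text.

```c
#include <stdbool.h>

#define MAX_REAP_RETRIES 10	/* number of attempts, not a timeout */

enum reap_outcome {
	REAP_SKIP_SET,		/* reaper set the MMF_OOM_SKIP equivalent */
	REAP_HANDED_OVER,	/* exit path owns setting it from now on */
};

/* Hypothetical model of a victim mm; the *_at fields are simulation knobs. */
struct victim_model {
	int exit_flag_at;	/* attempt at which exit_mmap() is past blocking, -1 = never */
	int reap_ok_at;		/* attempt at which reaping frees enough, -1 = never */
	bool oom_skip;		/* models MMF_OOM_SKIP */
};

static enum reap_outcome reap_victim(struct victim_model *v)
{
	for (int attempt = 0; attempt < MAX_REAP_RETRIES; attempt++) {
		/* Exit path can no longer block: bail out, leave oom_skip alone. */
		if (v->exit_flag_at >= 0 && attempt >= v->exit_flag_at)
			return REAP_HANDED_OVER;

		/* Reaping freed a substantial amount: done with this victim. */
		if (v->reap_ok_at >= 0 && attempt >= v->reap_ok_at)
			break;

		/* The real reaper would sleep briefly here before retrying. */
	}

	/* Either reaped enough or ran out of attempts: stop hiding behind it. */
	v->oom_skip = true;
	return REAP_SKIP_SET;
}
```

The point of the model is only the ordering: the handover check comes first on every attempt, so the reaper never sets the skip flag for a victim whose exit path is already guaranteed to make progress.
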
--
Michal Hocko
SUSE Labs