From: Michal Hocko <mhocko@suse.cz>
To: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH RFC] memcg: close the race window between OOM detection and killing
Date: Fri, 5 Jun 2015 16:35:34 +0200	[thread overview]
Message-ID: <20150605143534.GD26113@dhcp22.suse.cz> (raw)
In-Reply-To: <20150604192936.GR20091@mtj.duckdns.org>

On Fri 05-06-15 04:29:36, Tejun Heo wrote:
> Hello, Michal.
> 
> On Thu, Jun 04, 2015 at 11:30:31AM +0200, Michal Hocko wrote:
> > > Hmmm?  In -mm, if __alloc_pages_may_oom() fails the trylock, it never calls
> > > out_of_memory().
> > 
> > Sure but the oom_lock might be free already. out_of_memory doesn't wait
> > for the victim to finish. It just does schedule_timeout_killable.
> 
> That doesn't matter because the detection and TIF_MEMDIE assertion are
> atomic w.r.t. oom_lock and TIF_MEMDIE essentially extends the locking
> by preventing further OOM kills.  Am I missing something?

This is true, but TIF_MEMDIE release is not atomic w.r.t. the allocation
path. So the OOM victim could have released its memory and dropped
TIF_MEMDIE while the allocation path hasn't noticed, because it has
already passed
        /*
         * Go through the zonelist yet one more time, keep very high watermark
         * here, this is only to catch a parallel oom killing, we must fail if
         * we're still under heavy pressure.
         */
        page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
                                        ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);

and goes on to kill another task because no task carries TIF_MEMDIE
anymore.
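
To make the window concrete, here is a condensed sketch of the
interleaving (illustrative C only, not the literal kernel code; the
out_of_memory() arguments are elided):

        /* allocating task, in __alloc_pages_may_oom(), simplified */
        page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
                                        ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
        if (page)
                goto out;       /* a parallel OOM kill freed memory */

        /*
         * <-- the window: the previous victim can run its exit path
         *     right here, release its memory and clear TIF_MEMDIE,
         *     after the failed retry above but before the call below.
         */
        out_of_memory(...);     /* sees no TIF_MEMDIE task left and
                                   selects a fresh victim */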
 
> > > The main difference here is that the alloc path does the whole thing
> > > synchronously and thus the OOM detection and killing can be put in the
> > > same critical section which isn't the case for the memcg OOM handling.
> > 
> > This is true but there is still a time window between the last
> > allocation attempt and out_of_memory when the OOM victim might have
> > exited and another task would be selected.
> 
> Please see above.
> 
> > > > This is not the only reason. In-kernel memcg OOM handling needs it
> > > > as well. See 3812c8c8f395 ("mm: memcg: do not trap chargers with
> > > > full callstack on OOM"). In fact it was the in-kernel case which
> > > > triggered this change. We simply cannot wait for the OOM while
> > > > carrying the stack and all the state the charge is called from.
> > > 
> > > Why should this be any different from OOM handling in the page
> > > allocator tho?
> > 
> > Yes, the global OOM is prone to deadlocks. This has been discussed a lot
> > and we still do not have a good answer for it. The primary problem
> > is that small allocations do not fail and retry indefinitely, so an OOM
> > victim might be blocked on a lock held by the allocating task.
> > This is less likely and harder to trigger with standard loads than in
> > a memcg environment though.
> 
> Deadlocks from infallible allocations getting interlocked are
> different.  The OOM killer can't really get around those by itself,
> but that's not what I'm talking about, and they're a lot less likely
> anyway.  It's about an OOM victim trapped in a deadlock failing to
> release memory because someone else is waiting for that memory to be
> released while blocking the victim.

I thought those would be in the allocator context - which was the
example I provided. What kind of context do you have in mind?
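
As a concrete illustration of the allocator-context case, a minimal
sketch (shared_lock and both task bodies are hypothetical, made up for
the example):

        static DEFINE_MUTEX(shared_lock);       /* hypothetical lock */

        static void task_a(void)                /* any allocating task */
        {
                void *p;

                mutex_lock(&shared_lock);
                /*
                 * A small GFP_KERNEL allocation does not fail: under
                 * OOM it loops in the allocator, waiting for the OOM
                 * victim (task B below) to release memory.
                 */
                p = kmalloc(128, GFP_KERNEL);
                kfree(p);
                mutex_unlock(&shared_lock);
        }

        static void task_b(void)                /* the OOM victim */
        {
                /*
                 * On its way out it must take the lock to tear down
                 * shared state before its memory can be released, but
                 * task A holds the lock while waiting for that very
                 * memory: neither task can make progress.
                 */
                mutex_lock(&shared_lock);
                mutex_unlock(&shared_lock);
        }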

> Sure, the two issues are related
> but once you solve things getting blocked on single OOM victim, it
> becomes a lot less of an issue.
> 
> > There have been suggestions to add an OOM timeout: ignore the
> > previous OOM victim once the timeout expires and select a new
> > one. This sounds attractive but the approach has its own problems
> > (http://marc.info/?l=linux-mm&m=141686814824684&w=2).
> 
> Here are the issues the message lists

Let's focus on discussing those points in the reply to Johannes' email.
AFAIU your notes are very much in line with his.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
