linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
	Andrea Argangeli <andrea@kernel.org>,
	Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm, oom: introduce oom reaper
Date: Tue, 2 Feb 2016 14:51:41 -0800 (PST)	[thread overview]
Message-ID: <alpine.DEB.2.10.1602021437140.9118@chino.kir.corp.google.com> (raw)
In-Reply-To: <20160202085758.GE19910@dhcp22.suse.cz>

On Tue, 2 Feb 2016, Michal Hocko wrote:

> > Not exclude them, but I would have expected untrack_pfn().
> 
> My understanding is that vm_normal_page will do the right thing for
> those mappings - especially for CoW VM_PFNMAP which are normal pages
> AFAIU. Wrt. to untrack_pfn I was relying that the victim will eventually
> enter exit_mmap and do the remaining house keepining. Maybe I am missing
> something but untrack_pfn shouldn't lead to releasing a considerable
> amount of memory. So is this really necessary or we can wait for
> exit_mmap?
> 

I think if you move the code to mm/memory.c that you may find a greater 
opportunity to share code with the implementations there and this will 
take care of itself :)  I'm concerned about this also from a 
maintainability standpoint where a future patch might modify one 
implementation while forgetting about the other.  I think there's a great 
opportunity here for a really clean and shiny interfance that doesn't 
introduce any more complexity.

> > The problem is that this is racy and quite easy to trigger: imagine if 
> > __oom_reap_vmas() finds mm->mm_users == 0, because the memory of the 
> > victim has been freed, and then another system-wide oom condition occurs 
> > before the oom reaper's mm_to_reap has been set to NULL.
> 
> Yes I realize this is potentially racy. I just didn't consider the race
> important enough to justify task queuing in the first submission. Tetsuo
> was pushing for this already and I tried to push back for simplicity in
> the first submission. But ohh well... I will queue up a patch to do this
> on top. I plan to repost the full patchset shortly.
> 

Ok, thanks!  It should probably be dropped from -mm in the interim until 
it has some acked-by's, but I think those will come pretty quickly once 
it's refreshed if all of this is handled.

> > In this case, the oom reaper has ignored the next victim and doesn't do 
> > anything; the simple race has prevented it from zapping memory and does 
> > not reduce the livelock probability.
> > 
> > This can be solved either by queueing mm's to reap or involving the oom 
> > reaper into the oom killer synchronization itself.
> 
> as we have already discussed previously oom reaper is really tricky to
> be called from the direct OOM context. I will go with queuing. 
>  

Hmm, I wasn't referring to oom context: it would be possible without 
queueing with an mm_to_reap_lock (or cmpxchg) in the oom reaper and when 
the final mmput() is done.  Set it when the mm is ready for reaping, clear 
it when the mm is being destroyed, and test it before calling the oom 
killer.  I think we'd want to defer the oom killer until potential reaping 
could be done anyway and I don't anticipate an issue where oom_reaper 
fails to schedule.

> > I'm baffled by any reference to "memcg oom heavy loads", I don't 
> > understand this paragraph, sorry.  If a memcg is oom, we shouldn't be
> > disrupting the global runqueue by running oom_reaper at a high priority.  
> > The disruption itself is not only in first wakeup but also in how long the 
> > reaper can run and when it is rescheduled: for a lot of memory this is 
> > potentially long.  The reaper is best-effort, as the changelog indicates, 
> > and we shouldn't have a reliance on this high priority: oom kill exiting 
> > can't possibly be expected to be immediate.  This high priority should be 
> > removed so memcg oom conditions are isolated and don't affect other loads.
> 
> If this is a concern then I would be tempted to simply disable oom
> reaper for memcg oom altogether. For me it is much more important that
> the reaper, even though a best effort, is guaranteed to schedule if
> something goes terribly wrong on the machine.
> 

I don't believe the higher priority guarantees it is able to schedule any 
more than it was guaranteed to schedule before.  It will run, but it won't 
preempt other innocent processes in disjoint memcgs or cpusets.  It's not 
only a memcg issue, but it also impacts disjoint cpuset mems and mempolicy 
nodemasks.  I think it would be disappointing to leave those out.  I think 
the higher priority should simply be removed in terms of fairness.

Other than these issues, I don't see any reason why a refreshed series 
wouldn't be immediately acked.  Thanks very much for continuing to work on 
this!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-02-02 22:51 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-06 15:42 [PATCH 0/2 -mm] oom reaper v4 Michal Hocko
2016-01-06 15:42 ` [PATCH 1/2] mm, oom: introduce oom reaper Michal Hocko
2016-01-07 11:23   ` Tetsuo Handa
2016-01-07 12:30     ` Michal Hocko
2016-01-11 22:54   ` Andrew Morton
2016-01-12  8:16     ` Michal Hocko
2016-01-28  1:28   ` David Rientjes
2016-01-28 21:42     ` Michal Hocko
2016-02-02  3:02       ` David Rientjes
2016-02-02  8:57         ` Michal Hocko
2016-02-02 11:48           ` Tetsuo Handa
2016-02-02 22:55             ` David Rientjes
2016-02-02 22:51           ` David Rientjes [this message]
2016-02-03 10:31             ` Tetsuo Handa
2016-01-06 15:42 ` [PATCH 2/2] oom reaper: handle anonymous mlocked pages Michal Hocko
2016-01-07  8:14   ` Michal Hocko
2016-01-11 12:42 ` [PATCH 3/2] oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space Michal Hocko
2016-01-11 16:52   ` Johannes Weiner
2016-01-11 17:46     ` Michal Hocko
2016-02-15 10:58     ` Tetsuo Handa
2016-01-18  4:35   ` Tetsuo Handa
2016-01-18 10:22     ` Tetsuo Handa
2016-01-26 16:38     ` Michal Hocko
2016-01-28 11:24       ` Tetsuo Handa
2016-01-28 21:51         ` Michal Hocko
2016-01-28 22:26           ` Tetsuo Handa
2016-01-28 22:36             ` Michal Hocko
2016-01-28 22:33   ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2015-12-15 18:36 [PATCH 1/2] mm, oom: introduce oom reaper Michal Hocko
2015-12-17  0:50 ` Andrew Morton
2015-12-17 13:02   ` Michal Hocko
2015-12-17 19:55     ` Linus Torvalds
2015-12-17 20:00       ` Andrew Morton
2015-12-18 11:54         ` Michal Hocko
2015-12-18 21:14           ` Andrew Morton
2015-12-21  8:38             ` Michal Hocko
2015-12-17 21:13     ` Andrew Morton
2015-12-18 12:11       ` Michal Hocko
2015-12-18 12:10     ` Tetsuo Handa
2015-12-20  7:14       ` Tetsuo Handa
2015-12-18  0:15 ` Andrew Morton
2015-12-18 11:48   ` Michal Hocko
2015-12-21 20:38 ` Paul Gortmaker
2016-01-06  9:10   ` Michal Hocko
2016-01-06 14:26     ` Paul Gortmaker
2016-01-06 15:00       ` Michal Hocko
2015-12-23 23:00 ` Ross Zwisler
2015-12-24  9:47   ` Michal Hocko
2015-12-24 11:06     ` Tetsuo Handa
2015-12-24 20:39       ` Ross Zwisler
2015-12-25 11:41       ` Michal Hocko
2015-12-24 20:44     ` Ross Zwisler
2015-12-25 11:35       ` Michal Hocko
2015-12-25 11:44         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.10.1602021437140.9118@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@kernel.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox