From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@suse.cz
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, oleg@redhat.com,
rientjes@google.com, vdavydov@parallels.com, mst@redhat.com
Subject: Re: [PATCH v3 0/8] Change OOM killer to use list of mm_struct.
Date: Sat, 23 Jul 2016 11:59:25 +0900
Message-ID: <201607231159.IFD26547.HVMOQtSJFOFFOL@I-love.SAKURA.ne.jp>
In-Reply-To: <20160722120519.GJ794@dhcp22.suse.cz>
Michal Hocko wrote:
> > > Now what about future plans? I would like to get rid of TIF_MEMDIE
> > > altogether and give access to memory reserves to oom victim when they
> > > allocate the memory. Something like:
> >
> > Before doing so, can we handle a silent hang up caused by lowmem livelock
> > at http://lkml.kernel.org/r/20160211225929.GU14668@dastard ? It is a nearly
> > 7 years old bug (since commit 35cd78156c499ef8 "vmscan: throttle direct
> > reclaim when too many pages are isolated already") which got no progress
> > so far.
>
> I do not see any dependency/relation on/to the OOM work. I am not even
> sure why you are bringing that up here.

This is an ABBA deadlock bug that disables the OOM killer: kswapd waits for
GFP_NOIO allocations while GFP_NOIO allocations wait for kswapd. A flag like
GFP_TRANSIENT, suggested at
http://lkml.kernel.org/r/878twt5i1j.fsf@notabene.neil.brown.name , which
prevents the allocating task from being throttled, is needed if we do not
want to rely on a timeout to escape from the too_many_isolated() loop in
shrink_inactive_list().

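
To illustrate the throttling decision discussed above, here is a minimal
userspace sketch, not kernel code. The struct, the function name and the
MODEL_GFP_TRANSIENT bit are hypothetical; GFP_TRANSIENT is only a proposed
flag, and its semantics here (skip throttling entirely) are an assumption
based on the description in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model only; not a real gfp_t value. */
#define MODEL_GFP_TRANSIENT (1u << 0)

/* Simplified view of the state the throttle check looks at. */
struct model_reclaim_state {
	unsigned long nr_isolated;  /* pages currently isolated from the LRU */
	unsigned long nr_inactive;  /* pages remaining on the inactive LRU */
	bool is_kswapd;             /* caller is the kswapd thread */
};

/*
 * Returns true if the caller would keep waiting in the throttle loop.
 * kswapd is never throttled; a caller carrying the (assumed) transient
 * flag is not throttled either, so it cannot take part in the wait cycle.
 */
static bool model_should_throttle(const struct model_reclaim_state *st,
				  unsigned int gfp_flags)
{
	assert(st != NULL);

	if (st->is_kswapd)
		return false;
	if (gfp_flags & MODEL_GFP_TRANSIENT)
		return false;
	return st->nr_isolated > st->nr_inactive;
}
```

Without such a flag, a direct reclaimer stays in the loop for as long as
nr_isolated exceeds nr_inactive, which is the half of the cycle that a
timeout would otherwise have to break.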
> [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 788e4f22e0bb..34446f49c2e1 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3358,7 +3358,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > > alloc_flags |= ALLOC_NO_WATERMARKS;
> > > else if (!in_interrupt() &&
> > > ((current->flags & PF_MEMALLOC) ||
> > > - unlikely(test_thread_flag(TIF_MEMDIE))))
> > > + tsk_is_oom_victim(current))
> > > alloc_flags |= ALLOC_NO_WATERMARKS;
> > > }
> > > #ifdef CONFIG_CMA
> > >
> > > where tsk_is_oom_victim wouldn't require the given task to go via
> > > out_of_memory. This would solve some of the problems we have right now
> > > when a thread doesn't get access to memory reserves because it never
> > > reaches out_of_memory (e.g. recently mentioned mempool_alloc doing
> > > __GFP_NORETRY). It would also make the code easier to follow. If we want
> > > to implement that we need an easy to implement tsk_is_oom_victim
> > > obviously. With the signal_struct::oom_mm this is really trivial thing.
> > > I am not sure we can do that with the mm list though because we are
> > > losing the task->mm at a certain point in time.
> >
> > bool tsk_is_oom_victim(void)
> > {
> > return current->mm && test_bit(MMF_OOM_KILLED, &current->mm->flags) &&
> > (fatal_signal_pending(current) || (current->flags & PF_EXITING));
> > }
>
> which doesn't work as soon as exit_mm clears the mm which is exactly
> the concern I have raised above.

Are you planning to change the scope in which OOM victims can access memory
reserves?

(1) If you plan to allow OOM victims to access memory reserves until
    TASK_DEAD, tsk_is_oom_victim() will be as trivial as

	bool tsk_is_oom_victim(struct task_struct *task)
	{
		return task->signal->oom_mm;
	}

    because you won't prevent OOM victims from accessing memory reserves at
    e.g. exit_task_work() from do_exit(). In that case, I will suggest

	bool tsk_is_oom_victim(struct task_struct *task)
	{
		return (fatal_signal_pending(task) || (task->flags & PF_EXITING));
	}

    like "[PATCH 2/3] mm,page_alloc: favor exiting tasks over normal tasks."
    does.

(2) If you plan to allow OOM victims to access memory reserves only until
    just before calling mmput() from exit_mm() from do_exit(),
    tsk_is_oom_victim() will be

	bool tsk_is_oom_victim(struct task_struct *task)
	{
		return task->signal->oom_mm && task->mm;
	}

    because you don't allow OOM victims to access memory reserves at
    __mmput() from mmput() from exit_mm() from do_exit(). In that case, I think

	bool tsk_is_oom_victim(void)
	{
		return current->mm && test_bit(MMF_OOM_KILLED, &current->mm->flags) &&
			(fatal_signal_pending(current) || (current->flags & PF_EXITING));
	}

    should work. But since you think it does not work, you are not planning to
    allow OOM victims to access memory reserves only until just before calling
    mmput() from exit_mm() from do_exit(), are you?

(3) If you are not planning to change the scope in which OOM victims can
    access memory reserves (i.e. neither (1) nor (2) above), how can we
    control it without using per task_struct flags like TIF_MEMDIE?
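
To make the difference between the candidate predicates concrete, here is a
small userspace model, not kernel code. The model_* structs and the three
function names are hypothetical stand-ins; only the fields named in this
thread (oom_mm, mm, MMF_OOM_KILLED, PF_EXITING, fatal_signal_pending()) are
mirrored, each as a plain member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; these are not the kernel structures. */
struct model_mm {
	bool oom_killed;            /* models the MMF_OOM_KILLED bit */
};

struct model_signal {
	struct model_mm *oom_mm;    /* models signal_struct::oom_mm */
};

struct model_task {
	struct model_signal *signal;
	struct model_mm *mm;        /* exit_mm() sets this to NULL */
	bool fatal_signal;          /* models fatal_signal_pending() */
	bool exiting;               /* models PF_EXITING */
};

/* Scope (1): stays true for the rest of the task's life. */
static bool victim_until_dead(const struct model_task *t)
{
	assert(t != NULL && t->signal != NULL);
	return t->signal->oom_mm != NULL;
}

/* Scope (2), oom_mm form: turns false once exit_mm() drops t->mm. */
static bool victim_until_exit_mm(const struct model_task *t)
{
	assert(t != NULL && t->signal != NULL);
	return t->signal->oom_mm != NULL && t->mm != NULL;
}

/* Scope (2), mm-flag form: needs no per-signal field at all. */
static bool victim_by_mm_flag(const struct model_task *t)
{
	assert(t != NULL);
	return t->mm != NULL && t->mm->oom_killed &&
		(t->fatal_signal || t->exiting);
}
```

In this model, the two scope (2) forms agree at every stage (before the kill,
after the kill, after exit_mm()), while the scope (1) form alone remains true
after t->mm has been cleared.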
>
> >
> > > The only way I can see
> > > this would fly would be preserving TIF_MEMDIE and setting it for all
> > > threads but I am not sure this is very much better and puts the mm list
> > > approach to a worse position from my POV.
> > >
> >
> > But do we still need ALLOC_NO_WATERMARKS for OOM victims?
>
> Yes as a safety net for cases when the oom_reaper cannot reclaim enough
> to get us out of OOM. Maybe one day we can make the oom_reaper
> completely bullet proof and granting access to memory reserves would be
> pointless. One reason I want to get rid of TIF_MEMDIE is that all we would
> need to do at that time would be a single-line change dropping
> tsk_is_oom_victim from gfp_to_alloc_flags.

I didn't mean to forbid access to memory reserves completely. My question is
whether we need to allow access to all of the memory reserves (via
ALLOC_NO_WATERMARKS) rather than only a portion of them (via ALLOC_HARDER, as
[PATCH 2/3] does). I think we can treat "threads killed by the OOM killer",
"threads killed by SIGKILL" and "threads exiting normally via exit()" equally
by allowing them access to a portion of the memory reserves.
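
As a rough illustration of "all of the reserves" versus "a portion of the
reserves", here is a simplified userspace model of a watermark check, not the
kernel's implementation. The struct, the MODEL_* bits and the function name
are hypothetical, and the one-quarter reduction for the ALLOC_HARDER case is
an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_ALLOC_HARDER        (1u << 0)
#define MODEL_ALLOC_NO_WATERMARKS (1u << 1)

struct model_zone {
	unsigned long free_pages;   /* pages currently free in the zone */
	unsigned long min_pages;    /* min watermark guarding the reserves */
};

/* Returns true if an allocation with these flags may take a page. */
static bool model_watermark_ok(const struct model_zone *z,
			       unsigned int alloc_flags)
{
	unsigned long min;

	assert(z != NULL);
	min = z->min_pages;

	/* Full access: the watermark is ignored; any free page will do. */
	if (alloc_flags & MODEL_ALLOC_NO_WATERMARKS)
		return z->free_pages > 0;

	/* Partial access: lower the watermark by an assumed quarter. */
	if (alloc_flags & MODEL_ALLOC_HARDER)
		min -= min / 4;

	return z->free_pages > min;
}
```

With min_pages = 100, a caller with the partial-access flag succeeds at 80
free pages but fails at 10, whereas a caller with full access still succeeds
at 10. The question above is whether exiting and killed threads need the
latter or can make do with the former.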