linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: Silent hang up caused by pages being not scanned?
@ 2015-10-14  8:03 Hillf Danton
  0 siblings, 0 replies; 18+ messages in thread
From: Hillf Danton @ 2015-10-14  8:03 UTC (permalink / raw)
  To: 'Tetsuo Handa'
  Cc: Linus Torvalds, Michal Hocko, David Rientjes, Johannes Weiner,
	linux-kernel, linux-mm

> > 
> > In particular, I think that you'll find that you will have to change
> > the heuristics in __alloc_pages_slowpath() where we currently do
> > 
> >         if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) || ..
> > 
> > when the "did_some_progress" logic changes that radically.
> > 
> 
> Yes. But we can't simply do
> 
>	if (order <= PAGE_ALLOC_COSTLY_ORDER || ..
> 
> because we won't be able to call out_of_memory(), can we?
>
Can you please try a simplified retry logic?

thanks
Hillf
--- a/mm/page_alloc.c	Wed Oct 14 14:45:28 2015
+++ b/mm/page_alloc.c	Wed Oct 14 15:43:31 2015
@@ -3154,8 +3154,7 @@ retry:
 
 	/* Keep reclaiming pages as long as there is reasonable progress */
 	pages_reclaimed += did_some_progress;
-	if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) ||
-	    ((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) {
+	if (did_some_progress) {
 		/* Wait for some write requests to complete then retry */
 		wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
 		goto retry;
--


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: can't oom-kill zap the victim's memory?
@ 2015-09-28 16:18 Tetsuo Handa
  2015-10-02 12:36 ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2015-09-28 16:18 UTC (permalink / raw)
  To: mhocko, rientjes
  Cc: oleg, torvalds, kwalker, cl, akpm, hannes, vdavydov, linux-mm,
	linux-kernel, skozina

Michal Hocko wrote:
> The point I've tried to made is that oom unmapper running in a detached
> context (e.g. kernel thread) vs. directly in the oom context doesn't
> make any difference wrt. lock because the holders of the lock would loop
> inside the allocator anyway because we do not fail small allocations.

We tried to allow small allocations to fail. It resulted in unstable system
with obscure bugs.

We tried to allow small !__GFP_FS allocations to fail. It failed to fail by
effectively __GFP_NOFAIL allocations.

We are now trying to allow zapping OOM victim's mm. Michal is already
skeptical about this approach due to lock dependency.

We already spent 9 months on this OOM livelock. No silver bullet yet.
Proposed approaches are too drastic to backport for existing users.
I think we are out of bullet.

Until we complete adding/testing __GFP_NORETRY (or __GFP_KILLABLE) to most
of callsites, timeout based workaround will be the only bullet we can use.

Michal's panic_on_oom_timeout and David's "global access to memory reserves"
will be acceptable for some users if these approaches are used as opt-in.
Likewise, my memdie_task_skip_secs / memdie_task_panic_secs will be
acceptable for those who want to retry a bit more rather than panic on
accidental livelock if this approach is used as opt-in.

Tetsuo Handa wrote:
> Excuse me, but thinking about CLONE_VM without CLONE_THREAD case...
> Isn't there possibility of hitting livelocks at
> 
>         /*
>          * If current has a pending SIGKILL or is exiting, then automatically
>          * select it.  The goal is to allow it to allocate so that it may
>          * quickly exit and free its memory.
>          *
>          * But don't select if current has already released its mm and cleared
>          * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
>          */
>         if (current->mm &&
>             (fatal_signal_pending(current) || task_will_free_mem(current))) {
>                 mark_oom_victim(current);
>                 return true;
>         }
> 
> if current thread receives SIGKILL just before reaching here, for we don't
> send SIGKILL to all threads sharing the mm?

Seems that CLONE_VM without CLONE_THREAD is irrelevant here.
We have sequences like

  Do a GFP_KENREL allocation.
  Hold a lock.
  Do a GFP_NOFS allocation.
  Release a lock.

where an example is seen in VFS operations which receive pathname from
user space using getname() and then call VFS functions and filesystem
code takes locks which can contend with other threads.

------------------------------------------------------------
diff --git a/fs/namei.c b/fs/namei.c
index d68c21f..d51c333 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4005,6 +4005,8 @@ int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
        if (error)
                return error;

+       if (fatal_signal_pending(current))
+               printk(KERN_INFO "Calling symlink with SIGKILL pending\n");
        error = dir->i_op->symlink(dir, dentry, oldname);
        if (!error)
                fsnotify_create(dir, dentry);
@@ -4021,6 +4023,10 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
        struct path path;
        unsigned int lookup_flags = 0;

+       if (!strcmp(current->comm, "a.out")) {
+               printk(KERN_INFO "Sending SIGKILL to current thread\n");
+               do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
+       }
        from = getname(oldname);
        if (IS_ERR(from))
                return PTR_ERR(from);
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 996481e..2b6faa5 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -240,6 +240,8 @@ xfs_symlink(
        if (error)
                goto out_trans_cancel;

+       if (fatal_signal_pending(current))
+               printk(KERN_INFO "Calling xfs_ilock() with SIGKILL pending\n");
        xfs_ilock(dp, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL |
                      XFS_IOLOCK_PARENT | XFS_ILOCK_PARENT);
        unlock_dp_on_error = true;
------------------------------------------------------------

[  119.534976] Sending SIGKILL to current thread
[  119.535898] Calling symlink with SIGKILL pending
[  119.536870] Calling xfs_ilock() with SIGKILL pending

Any program can potentially hit this silent livelock. We can't predict
what locks the OOM victim threads will depend on after TIF_MEMDIE was
set by the OOM killer. Therefore, I think that TIF_MEMDIE disables the
OOM killer indefinitely is one of possible causes regarding silent
hangup troubles.

Michal Hocko wrote:
> I really hate to do "easy" things now just to feel better about
> particular case which will kick us back little bit later. And from my
> own experience I can tell you that a more non-deterministic OOM behavior
> is thing people complain about.

I believe that not waiting for TIF_MEMDIE thread indefinitely is the first
choice we can propose people to try. From my own experience I can tell you
that some customers are really sensitive about bugs which halt their systems
(e.g. https://access.redhat.com/solutions/68466 ).
Opt-in version of TIF_MEMDIE timeout should be acceptable for people
who prefer avoiding silent hangup over non-deterministic OOM behavior if
they were explained about the truth of current memory allocator's behavior.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-10-19 12:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-14  8:03 Silent hang up caused by pages being not scanned? Hillf Danton
  -- strict thread matches above, loose matches on Subject: below --
2015-09-28 16:18 can't oom-kill zap the victim's memory? Tetsuo Handa
2015-10-02 12:36 ` Michal Hocko
2015-10-03  6:02   ` Can't we use timeout based OOM warning/killing? Tetsuo Handa
2015-10-06 14:51     ` Tetsuo Handa
2015-10-12  6:43       ` Tetsuo Handa
2015-10-12 15:25         ` Silent hang up caused by pages being not scanned? Tetsuo Handa
2015-10-12 21:23           ` Linus Torvalds
2015-10-13 12:21             ` Tetsuo Handa
2015-10-13 16:37               ` Linus Torvalds
2015-10-14 12:21                 ` Tetsuo Handa
2015-10-15 13:14                 ` Michal Hocko
2015-10-16 15:57                   ` Michal Hocko
2015-10-16 18:34                     ` Linus Torvalds
2015-10-16 18:49                       ` Tetsuo Handa
2015-10-19 12:57                         ` Michal Hocko
2015-10-19 12:53                       ` Michal Hocko
2015-10-13 13:32           ` Michal Hocko
2015-10-13 16:19             ` Tetsuo Handa
2015-10-14 13:22               ` Michal Hocko
2015-10-14 14:38                 ` Tetsuo Handa
2015-10-14 14:59                   ` Michal Hocko
2015-10-14 15:06                     ` Tetsuo Handa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox