linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Lubos Lunak <l.lunak@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch -mm 4/9 v2] oom: remove compulsory panic_on_oom mode
Date: Wed, 17 Feb 2010 14:04:40 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1002171345330.6217@chino.kir.corp.google.com>
In-Reply-To: <20100217095221.GQ5723@laptop>

On Wed, 17 Feb 2010, Nick Piggin wrote:

> > > quick glance around core codes...
> > >  - HUGEPAGE at el. should return some VM_FAULT_NO_RESOUECE rather than VM_FAULT_OOM.
> > 
> > We can detect this with is_vm_hugetlb_page() if we pass the vma into 
> > pagefault_out_of_memory() without adding another VM_FAULT flag.
> 
> The real question is, what to do when returning to userspace. I don't
> think there's a lot of options. SIGBUS is traditionally used for "no
> resource".
> 

For is_vm_hugetlb_page() in the pagefault oom handler, I think it should 
default to killing current as we did previously until that's worked out 
(and as some architectures like ia64 and powerpc still do).  In fact, 
pagefault ooms should probably always default to killing current if it's 
killable.
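
As a rough userspace sketch of that policy (not kernel code): the model_ 
and pf_oom_ names and the flag value below are made up for illustration, 
standing in for the vma that would be passed to 
pagefault_out_of_memory().  Killing current for a hugetlb fault even 
when it is unkillable is an assumption of the sketch that mirrors the 
old behavior; the thread has not settled that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for VM_HUGETLB; not the kernel's value. */
#define MODEL_VM_HUGETLB 0x1UL

/* Hypothetical, trimmed-down stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

enum pf_oom_action {
	PF_OOM_KILL_CURRENT,	/* kill the faulting task */
	PF_OOM_SELECT_VICTIM,	/* fall back to picking another task */
};

/* Models is_vm_hugetlb_page(); a NULL vma is treated as "not hugetlb". */
static bool model_is_hugetlb(const struct model_vma *vma)
{
	return vma && (vma->vm_flags & MODEL_VM_HUGETLB);
}

/*
 * Decide what a pagefault oom should do.  The hugetlb case is detected
 * from the vma, so no extra VM_FAULT flag is needed.
 */
static enum pf_oom_action pf_oom_decide(const struct model_vma *vma,
					bool current_killable)
{
	/* Assumption: hugetlb faults keep the old "kill current" behavior. */
	if (model_is_hugetlb(vma))
		return PF_OOM_KILL_CURRENT;

	/* Otherwise kill current only if it is killable. */
	return current_killable ? PF_OOM_KILL_CURRENT : PF_OOM_SELECT_VICTIM;
}
```

With this model, a killable task always ends up as the victim of its own 
pagefault oom, and only an unkillable task on a non-hugetlb vma hands the 
decision to the regular victim selection.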

> > The filemap, shmem, and block_prepare_write() cases will call the oom 
> > killer but, depending on the gfp mask, they will retry their allocations 
> > after the oom killer is called so we should never return VM_FAULT_OOM 
> > because they return -ENOMEM.  They fail from either small objsize slab 
> > allocations or with orders less than PAGE_ALLOC_COSTLY_ORDER which by 
> > default continues to retry even if direct reclaim fails.  If we're 
> > returning with VM_FAULT_OOM from these handlers, it should only be because 
> > of GFP_NOFS | __GFP_NORETRY or current has been oom killed and still can't 
> > find memory (so we don't care if the oom killer is called again since it 
> > won't kill anything else).
> 
> Yep. And yes you are right that we prefer to do the oom killing at the
> allocation point where we know all the context, however the fact is that
> VM_FAULT_OOM is an allowed part of the fault API so we have to handle it
> somehow.
> 
> It can theoretically be called for valid reasons say if a driver or
> arch page table has a high order allocation, or if the page allocator
> implementation were to be changed.
> 
> We can't rightly just kill the task at this point, even if it has
> invoked the oom killer, because it could have been marked as unkillable.
> 

That's easy to test in the oom handler.  We can default to killing current 
but then kill another task if current is unkillable:

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -696,15 +696,23 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 }
 
 /*
- * The pagefault handler calls here because it is out of memory, so kill a
- * memory-hogging task.  If a populated zone has ZONE_OOM_LOCKED set, a parallel
- * oom killing is already in progress so do nothing.  If a task is found with
- * TIF_MEMDIE set, it has been killed so do nothing and allow it to exit.
+ * The pagefault handler calls here because it is out of memory, so kill current
+ * by default.  If it's unkillable, then fall back to killing a memory-hogging
+ * task.  If a populated zone has ZONE_OOM_LOCKED set, a parallel oom killing is
+ * already in progress so do nothing.  If a task is found with TIF_MEMDIE set,
+ * it has been killed so do nothing and allow it to exit.
  */
 void pagefault_out_of_memory(void)
 {
+	unsigned long totalpages;
+	int err;
+
 	if (!try_set_system_oom())
 		return;
-	out_of_memory(NULL, 0, 0, NULL);
+	constrained_alloc(NULL, 0, NULL, &totalpages);
+	err = oom_kill_process(current, 0, 0, 0, totalpages, NULL,
+				"Out of memory (pagefault)");
+	if (err)
+		out_of_memory(NULL, 0, 0, NULL);
 	clear_system_oom();
 }

We'll need to convert the architectures that still only issue a SIGKILL to 
current to use pagefault_out_of_memory() before OOM_DISABLE is fully 
respected across the kernel, though.
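
For anyone who wants to poke at the control flow outside the kernel, 
here is a minimal userspace model of the patched handler.  Everything 
prefixed with model_ is hypothetical: the bool lock stands in for 
try_set_system_oom()/clear_system_oom(), oom_disabled stands in for 
OOM_DISABLE, and the explicit fallback argument stands in for whatever 
task out_of_memory() would select.  The nonzero-on-failure return value 
is an assumption that mirrors the `if (err)` test in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not struct task_struct. */
struct model_task {
	const char *comm;
	bool oom_disabled;	/* models a task marked OOM_DISABLE */
	bool killed;
};

/* Models the system-wide oom lock. */
static bool model_oom_locked;

static bool model_try_set_system_oom(void)
{
	if (model_oom_locked)
		return false;
	model_oom_locked = true;
	return true;
}

static void model_clear_system_oom(void)
{
	model_oom_locked = false;
}

/* Returns 0 if the task was killed, nonzero if it is unkillable. */
static int model_oom_kill_process(struct model_task *t)
{
	if (t->oom_disabled)
		return 1;
	t->killed = true;
	return 0;
}

/*
 * Kill current by default; if that fails, fall back to another victim.
 * Returns the task that was killed, or NULL if nothing was killed
 * (including the case where a parallel oom kill holds the lock).
 */
static struct model_task *model_pagefault_oom(struct model_task *current_task,
					      struct model_task *fallback)
{
	struct model_task *victim = NULL;

	if (!model_try_set_system_oom())
		return NULL;
	if (model_oom_kill_process(current_task) == 0)
		victim = current_task;
	else if (fallback && model_oom_kill_process(fallback) == 0)
		victim = fallback;
	model_clear_system_oom();
	return victim;
}
```

Calling model_pagefault_oom() with an oom_disabled current task and a 
killable fallback leaves current untouched and kills the fallback, which 
is the behavior the patch is after.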

