* [PATCH] mm, oom: Disable preemption during OOM-kill operation.
@ 2015-09-19 7:05 Tetsuo Handa
2015-09-22 16:55 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2015-09-19 7:05 UTC (permalink / raw)
To: mhocko; +Cc: rientjes, hannes, linux-mm
Well, this seems to be a problem which prevents me from testing various
patches that try to address the OOM livelock problem.
---------- rcu-stall.c start ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
static int dummy(void *fd)
{
	char c;

	/* Wait until the first child thread is killed by the OOM killer. */
	read(* (int *) fd, &c, 1);
	/* Try to consume as much CPU time as possible via preemption. */
	while (1);
	return 0;
}

int main(int argc, char *argv[])
{
	cpu_set_t cpu = { { 1 } };
	static int pipe_fd[2] = { EOF, EOF };
	char *buf = NULL;
	unsigned long size = 0;
	unsigned int i;
	const int fd = open("/dev/zero", O_RDONLY);

	pipe(pipe_fd);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);

		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sched_setaffinity(0, sizeof(cpu), &cpu);
	/*
	 * Create many child threads which will disturb operations with
	 * oom_lock and RCU held.
	 */
	for (i = 0; i < 1000; i++) {
		clone(dummy, malloc(1024) + 1024, CLONE_SIGHAND | CLONE_VM,
		      &pipe_fd[0]);
		if (!i)
			close(pipe_fd[1]);
	}
	read(fd, buf, size); /* Will cause OOM due to overcommit */
	return * (char *) NULL; /* Kill all threads. */
}
---------- rcu-stall.c end ----------
---------- console log start ----------
[ 53.020558] rcu-stall invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[ 53.022200] rcu-stall cpuset=/ mems_allowed=0
[ 53.023172] CPU: 0 PID: 3780 Comm: rcu-stall Not tainted 4.3.0-rc1-next-20150918 #125
(...snipped...)
[ 53.119884] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
(...snipped...)
[ 53.198811] [ 3780] 1000 3780 541715 392717 777 6 0 0 rcu-stall
(...snipped...)
[ 55.087789] [ 4780] 1000 4780 541715 392717 777 6 0 0 rcu-stall
[ 55.089637] Out of memory: Kill process 3780 (rcu-stall) score 879 or sacrifice child
[ 55.091437] Killed process 3780 (rcu-stall) total-vm:2166860kB, anon-rss:1570864kB, file-rss:4kB
[ 55.093269] Kill process 3781 (rcu-stall) sharing same memory
[ 55.094553] Kill process 3782 (rcu-stall) sharing same memory
[ 65.541045] Kill process 3783 (rcu-stall) sharing same memory
[ 65.542382] Kill process 3784 (rcu-stall) sharing same memory
[ 65.543689] Kill process 3785 (rcu-stall) sharing same memory
[ 69.519022] Kill process 3786 (rcu-stall) sharing same memory
[ 69.520425] Kill process 3787 (rcu-stall) sharing same memory
[ 69.521893] Kill process 3788 (rcu-stall) sharing same memory
[ 73.735956] Kill process 3789 (rcu-stall) sharing same memory
[ 73.737336] Kill process 3790 (rcu-stall) sharing same memory
[ 73.738672] Kill process 3791 (rcu-stall) sharing same memory
[ 77.781839] Kill process 3792 (rcu-stall) sharing same memory
[ 77.783183] Kill process 3793 (rcu-stall) sharing same memory
[ 77.784506] Kill process 3794 (rcu-stall) sharing same memory
[ 81.725121] Kill process 3795 (rcu-stall) sharing same memory
[ 81.726454] Kill process 3796 (rcu-stall) sharing same memory
[ 81.727665] Kill process 3797 (rcu-stall) sharing same memory
(...snipped...)
[ 113.019058] Kill process 3821 (rcu-stall) sharing same memory
[ 115.094645] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 115.095971] Tasks blocked on level-0 rcu_node (CPUs 0-7): P3780
[ 115.097405] (detected by 0, t=60002 jiffies, g=3458, c=3457, q=0)
[ 115.098780] rcu-stall R running task 0 3780 3757 0x00100082
(...snipped...)
[ 1194.420740] Kill process 4647 (rcu-stall) sharing same memory
[ 1194.421992] Kill process 4648 (rcu-stall) sharing same memory
[ 1194.423196] Kill process 4649 (rcu-stall) sharing same memory
[ 1195.124700] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1195.125970] Tasks blocked on level-0 rcu_node (CPUs 0-7): P3780
[ 1195.127286] (detected by 0, t=1140032 jiffies, g=3458, c=3457, q=0)
[ 1195.128663] rcu-stall R running task 0 3780 3757 0x00100082
(...snipped...)
[ 1366.561198] Kill process 4780 (rcu-stall) sharing same memory
---------- console log end ----------
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20150919.txt.xz .
Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-4.3-rc1 .
After applying this patch, I can no longer reproduce this problem.
Please check whether I disabled preemption appropriately.
----------------------------------------
From 9e832b0b9123c38e5f34240d43e41bdefed66a4a Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 19 Sep 2015 16:02:26 +0900
Subject: [PATCH] mm, oom: Disable preemption during OOM-kill operation.
Under CONFIG_PREEMPT=y kernels, I can observe that a local unprivileged
user can make out_of_memory() stall for longer than 20 minutes due to
preemption, by invoking the OOM killer with 1000 processes.
Operations with oom_lock held should complete as soon as possible,
because the system remains under OOM condition for as long as they are
delayed.
Since the global OOM path uses no operations which might sleep, this
patch disables preemption across the whole span from check_panic_on_oom()
to oom_kill_process(). The memcg OOM path does use operations which might
sleep, so this patch disables preemption around each section separately.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/memcontrol.c | 7 +++++++
mm/oom_kill.c | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5d9a6e8..7ee629e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1326,13 +1326,16 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
goto unlock;
}
+ preempt_disable();
check_panic_on_oom(&oc, CONSTRAINT_MEMCG, memcg);
+ preempt_enable();
totalpages = mem_cgroup_get_limit(memcg) ? : 1;
for_each_mem_cgroup_tree(iter, memcg) {
struct css_task_iter it;
struct task_struct *task;
css_task_iter_start(&iter->css, &it);
+ preempt_disable();
while ((task = css_task_iter_next(&it))) {
switch (oom_scan_process_thread(&oc, task, totalpages)) {
case OOM_SCAN_SELECT:
@@ -1349,6 +1352,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
mem_cgroup_iter_break(memcg, iter);
if (chosen)
put_task_struct(chosen);
+ preempt_enable();
goto unlock;
case OOM_SCAN_OK:
break;
@@ -1367,13 +1371,16 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
chosen_points = points;
get_task_struct(chosen);
}
+ preempt_enable();
css_task_iter_end(&it);
}
if (chosen) {
points = chosen_points * 1000 / totalpages;
+ preempt_disable();
oom_kill_process(&oc, chosen, points, totalpages, memcg,
"Memory cgroup out of memory");
+ preempt_enable();
}
unlock:
mutex_unlock(&oom_lock);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ecc0bc..9e2ca62 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -668,6 +668,8 @@ bool out_of_memory(struct oom_control *oc)
return true;
}
+ /* Disable preemption in order to send SIGKILL as soon as possible. */
+ preempt_disable();
/*
* Check if there were limitations on the allocation (only relevant for
* NUMA) that may require different handling.
@@ -683,6 +685,7 @@ bool out_of_memory(struct oom_control *oc)
get_task_struct(current);
oom_kill_process(oc, current, 0, totalpages, NULL,
"Out of memory (oom_kill_allocating_task)");
+ preempt_enable();
return true;
}
@@ -695,12 +698,15 @@ bool out_of_memory(struct oom_control *oc)
if (p && p != (void *)-1UL) {
oom_kill_process(oc, p, points, totalpages, NULL,
"Out of memory");
+ preempt_enable();
/*
* Give the killed process a good chance to exit before trying
* to allocate memory again.
*/
schedule_timeout_killable(1);
+ return true;
}
+ preempt_enable();
return true;
}
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: [PATCH] mm, oom: Disable preemption during OOM-kill operation.
2015-09-19 7:05 [PATCH] mm, oom: Disable preemption during OOM-kill operation Tetsuo Handa
@ 2015-09-22 16:55 ` Michal Hocko
2015-09-23 14:26 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2015-09-22 16:55 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rientjes, hannes, linux-mm
On Sat 19-09-15 16:05:12, Tetsuo Handa wrote:
> Well, this seems to be a problem which prevents me from testing various
> patches that try to address the OOM livelock problem.
>
> ---------- rcu-stall.c start ----------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
>
> static int dummy(void *fd)
> {
> char c;
> /* Wait until the first child thread is killed by the OOM killer. */
> read(* (int *) fd, &c, 1);
> /* Try to consume as much CPU time as possible via preemption. */
> while (1);
You would kill the system by this alone. Having 1000 busy loops just
prevents your machine from doing anything useful and you are basically
DoS-ed. I am not sure sprinkling preempt_{enable,disable} all around the
oom path makes much difference. If anything, a high-priority kernel
thread sounds like a better approach.
[...]
> for (i = 0; i < 1000; i++) {
> clone(dummy, malloc(1024) + 1024, CLONE_SIGHAND | CLONE_VM,
> &pipe_fd[0]);
> if (!i)
> close(pipe_fd[1]);
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm, oom: Disable preemption during OOM-kill operation.
2015-09-22 16:55 ` Michal Hocko
@ 2015-09-23 14:26 ` Tetsuo Handa
2015-09-23 20:23 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2015-09-23 14:26 UTC (permalink / raw)
To: mhocko; +Cc: rientjes, hannes, linux-mm
Michal Hocko wrote:
> On Sat 19-09-15 16:05:12, Tetsuo Handa wrote:
> > Well, this seems to be a problem which prevents me from testing various
> > patches that try to address the OOM livelock problem.
> >
> > ---------- rcu-stall.c start ----------
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <sched.h>
> >
> > static int dummy(void *fd)
> > {
> > char c;
> > /* Wait until the first child thread is killed by the OOM killer. */
> > read(* (int *) fd, &c, 1);
> > /* Try to consume as much CPU time as possible via preemption. */
> > while (1);
>
> You would kill the system by this alone. Having 1000 busy loops just
> prevents your machine from doing anything useful and you are basically
> DoS-ed. I am not sure sprinkling preempt_{enable,disable} all around the
> oom path makes much difference. If anything, a high-priority kernel
> thread sounds like a better approach.
Of course, this is not the reproducer I am using when I am bothered by
this problem. I used 1000 threads in rcu-stall merely as an extreme
example; I am usually bothered by this problem when there are only a few
runnable tasks.
Without this patch, on a preemptive kernel the OOM-kill operation
triggered by rcu-stall took 20 minutes. With this patch applied, or on a
kernel that is not preemptive in the first place, the same operation took
only 3 seconds.
The delay of the OOM-kill operation on preemptive kernels varies with the
number of runnable tasks (on the CPU which is executing the oom path) and
their priority.
Sprinkling preempt_{enable,disable} all around the oom path can
temporarily slow down threads with higher priority, but it guarantees
that the oom path is not delayed indefinitely. Imagine a scenario where a
task with idle priority enters the oom path and tasks with normal or
realtime priority keep preempting it. How long will we hold oom_lock and
keep the system under OOM?
So, I think it makes sense to disable preemption during the OOM-kill
operation.
By the way, I am not familiar with cgroups. If the task which called the
oom path is allowed to use only one percent of a single CPU, is the delay
multiplied by 100 (e.g. 1 second -> 100 seconds)?
>
> [...]
>
> > for (i = 0; i < 1000; i++) {
> > clone(dummy, malloc(1024) + 1024, CLONE_SIGHAND | CLONE_VM,
> > &pipe_fd[0]);
> > if (!i)
> > close(pipe_fd[1]);
> --
> Michal Hocko
> SUSE Labs
>
* Re: [PATCH] mm, oom: Disable preemption during OOM-kill operation.
2015-09-23 14:26 ` Tetsuo Handa
@ 2015-09-23 20:23 ` Michal Hocko
2015-09-27 5:51 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2015-09-23 20:23 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rientjes, hannes, linux-mm
On Wed 23-09-15 23:26:35, Tetsuo Handa wrote:
[...]
> Sprinkling preempt_{enable,disable} all around the oom path can temporarily
> slow down threads with higher priority. But doing so can guarantee that
> the oom path is not delayed indefinitely. Imagine a scenario where a task
> with idle priority called the oom path and other tasks with normal or
> realtime priority preempt. How long will we hold oom_lock and keep the
> system under oom?
What I have tried to say is that the OOM killer context might get a
priority boost to make sure it makes sufficient progress. This would be
a much more systematic approach IMO than sprinkling
preempt_{enable,disable} all over the place.
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm, oom: Disable preemption during OOM-kill operation.
2015-09-23 20:23 ` Michal Hocko
@ 2015-09-27 5:51 ` Tetsuo Handa
0 siblings, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2015-09-27 5:51 UTC (permalink / raw)
To: mhocko; +Cc: rientjes, hannes, linux-mm, oleg
(Added Oleg, as he might want to combine the memory unmapper kernel
thread and the OOM killer thread shown in this post.)
Michal Hocko wrote:
> On Wed 23-09-15 23:26:35, Tetsuo Handa wrote:
> [...]
> > Sprinkling preempt_{enable,disable} all around the oom path can temporarily
> > slow down threads with higher priority. But doing so can guarantee that
> > the oom path is not delayed indefinitely. Imagine a scenario where a task
> > with idle priority called the oom path and other tasks with normal or
> > realtime priority preempt. How long will we hold oom_lock and keep the
> > system under oom?
>
> What I have tried to say is that the OOM killer context might get a
> priority boost to make sure it makes sufficient progress. This would be
> a much more systematic approach IMO than sprinkling
> preempt_{enable,disable} all over the place.
Unlike boosting the priority of fatal_signal_pending() OOM victim
threads, a boost here would need to be undone after returning from
out_of_memory(). Moreover, the priority of the current thread calling
out_of_memory() can be manipulated by other threads. To avoid losing a
newly set priority when the old priority is restored after
out_of_memory() returns, a dedicated kernel thread would be needed; I
think we would use a kernel thread named "OOM killer".
So, did you mean something like below?
------------------------------------------------------------
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 03e6257..29d6190a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -31,6 +31,13 @@ struct oom_control {
* for display purposes.
*/
const int order;
+
+ /* Used for communicating with the OOM-killer kernel thread */
+ struct list_head list;
+ struct task_struct *task;
+ unsigned long totalpages;
+ int cpu;
+ bool done;
};
/*
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 03b612b..3b8edd0 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -35,6 +35,8 @@
#include <linux/freezer.h>
#include <linux/ftrace.h>
#include <linux/ratelimit.h>
+#include <linux/kthread.h>
+#include <linux/utsname.h>
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -386,14 +388,23 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
static void dump_header(struct oom_control *oc, struct task_struct *p,
struct mem_cgroup *memcg)
{
- task_lock(current);
+ struct task_struct *task = oc->task;
+ task_lock(task);
pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
"oom_score_adj=%hd\n",
- current->comm, oc->gfp_mask, oc->order,
- current->signal->oom_score_adj);
- cpuset_print_task_mems_allowed(current);
- task_unlock(current);
- dump_stack();
+ task->comm, oc->gfp_mask, oc->order,
+ task->signal->oom_score_adj);
+ cpuset_print_task_mems_allowed(task);
+ task_unlock(task);
+ /* dump_lock logic is missing here. */
+ printk(KERN_DEFAULT "CPU: %d PID: %d Comm: %.20s %s %s %.*s\n",
+ oc->cpu, task->pid, task->comm,
+ print_tainted(), init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version);
+ /* "Hardware name: " line is missing here. */
+ print_worker_info(KERN_DEFAULT, task);
+ show_stack(task, NULL);
if (memcg)
mem_cgroup_print_oom_info(memcg, p);
else
@@ -408,7 +419,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p,
static atomic_t oom_victims = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);
-bool oom_killer_disabled __read_mostly;
+bool oom_killer_disabled __read_mostly = true;
/**
* mark_oom_victim - mark the given task as OOM victim
@@ -647,6 +658,68 @@ int unregister_oom_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(unregister_oom_notifier);
+static DECLARE_WAIT_QUEUE_HEAD(oom_request_wait);
+static DECLARE_WAIT_QUEUE_HEAD(oom_response_wait);
+static LIST_HEAD(oom_request_list);
+static DEFINE_SPINLOCK(oom_request_list_lock);
+
+static int oom_killer(void *unused)
+{
+ struct task_struct *p;
+ unsigned int uninitialized_var(points);
+ struct oom_control *oc;
+
+ /* Boost priority in order to send SIGKILL as soon as possible. */
+ set_user_nice(current, MIN_NICE);
+
+ start:
+ wait_event(oom_request_wait, !list_empty(&oom_request_list));
+ oc = NULL;
+ spin_lock(&oom_request_list_lock);
+ if (!list_empty(&oom_request_list))
+ oc = list_first_entry(&oom_request_list, struct oom_control, list);
+ spin_unlock(&oom_request_list_lock);
+ if (!oc)
+ goto start;
+ p = oc->task;
+
+ /* Disable preemption in order to send SIGKILL as soon as possible. */
+ preempt_disable();
+
+ if (sysctl_oom_kill_allocating_task && p->mm &&
+ !oom_unkillable_task(p, NULL, oc->nodemask) &&
+ p->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
+ get_task_struct(p);
+ oom_kill_process(oc, p, 0, oc->totalpages, NULL,
+ "Out of memory (oom_kill_allocating_task)");
+ goto end;
+ }
+
+ p = select_bad_process(oc, &points, oc->totalpages);
+ /* Found nothing?!?! Either we hang forever, or we panic. */
+ if (!p && !is_sysrq_oom(oc)) {
+ dump_header(oc, NULL, NULL);
+ panic("Out of memory and no killable processes...\n");
+ }
+ if (p && p != (void *)-1UL)
+ oom_kill_process(oc, p, points, oc->totalpages, NULL,
+ "Out of memory");
+ end:
+ preempt_enable();
+ oc->done = true;
+ wake_up_all(&oom_response_wait);
+ goto start;
+}
+
+static int __init run_oom_killer(void)
+{
+ struct task_struct *task = kthread_run(oom_killer, NULL, "OOM-killer");
+ BUG_ON(IS_ERR(task));
+ oom_killer_disabled = false;
+ return 0;
+}
+postcore_initcall(run_oom_killer);
+
/**
* out_of_memory - kill the "best" process when we run out of memory
* @oc: pointer to struct oom_control
@@ -658,10 +731,8 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
*/
bool out_of_memory(struct oom_control *oc)
{
- struct task_struct *p;
unsigned long totalpages;
unsigned long freed = 0;
- unsigned int uninitialized_var(points);
enum oom_constraint constraint = CONSTRAINT_NONE;
if (oom_killer_disabled)
@@ -672,6 +743,7 @@ bool out_of_memory(struct oom_control *oc)
/* Got some memory back in the last second. */
return true;
+ oc->task = current;
/*
* If current has a pending SIGKILL or is exiting, then automatically
* select it. The goal is to allow it to allocate so that it may
@@ -695,30 +767,23 @@ bool out_of_memory(struct oom_control *oc)
oc->nodemask = NULL;
check_panic_on_oom(oc, constraint, NULL);
- if (sysctl_oom_kill_allocating_task && current->mm &&
- !oom_unkillable_task(current, NULL, oc->nodemask) &&
- current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
- get_task_struct(current);
- oom_kill_process(oc, current, 0, totalpages, NULL,
- "Out of memory (oom_kill_allocating_task)");
- return true;
- }
-
- p = select_bad_process(oc, &points, totalpages);
- /* Found nothing?!?! Either we hang forever, or we panic. */
- if (!p && !is_sysrq_oom(oc)) {
- dump_header(oc, NULL, NULL);
- panic("Out of memory and no killable processes...\n");
- }
- if (p && p != (void *)-1UL) {
- oom_kill_process(oc, p, points, totalpages, NULL,
- "Out of memory");
- /*
- * Give the killed process a good chance to exit before trying
- * to allocate memory again.
- */
- schedule_timeout_killable(1);
- }
+ /* OK. Let's wait for OOM killer. */
+ oc->cpu = raw_smp_processor_id();
+ oc->totalpages = totalpages;
+ oc->done = false;
+ spin_lock(&oom_request_list_lock);
+ list_add(&oc->list, &oom_request_list);
+ spin_unlock(&oom_request_list_lock);
+ wake_up(&oom_request_wait);
+ wait_event(oom_response_wait, oc->done);
+ spin_lock(&oom_request_list_lock);
+ list_del(&oc->list);
+ spin_unlock(&oom_request_list_lock);
+ /*
+ * Give the killed process a good chance to exit before trying
+ * to allocate memory again.
+ */
+ schedule_timeout_killable(1);
return true;
}
------------------------------------------------------------
By the way, I think that we might want to omit the dump_header() call
if the OOM victim's mm was already reported by a previous OOM event,
because the output of show_mem() and dump_tasks() in dump_header() is
noisy. All OOM events between uptime 110 and 303 of
http://I-love.SAKURA.ne.jp/tmp/serial-20150927.txt.xz chose the same mm.
Even though the first OOM event completed within a few seconds thanks to
disabled preemption, subsequent OOM events which sequentially chose OOM
victims without TIF_MEMDIE still consumed many seconds after all.