* strange oom behaviour on 3.10
@ 2013-10-09 15:54 William Dauchy
2013-10-10 0:24 ` Johannes Weiner
0 siblings, 1 reply; 4+ messages in thread
From: William Dauchy @ 2013-10-09 15:54 UTC (permalink / raw)
To: Johannes Weiner, cgroups; +Cc: linux-mm
Hi,
I have run into a strange issue with cgroups on v3.10.x.
The OOM killer is triggered for a cgroup which has reached its memory limit.
I'm getting several messages like:
Task in /lxc/VM_A killed as a result of limit of /lxc/VM_A
memory: usage 262144kB, limit 262144kB, failcnt 44742
which is quite normal.
The last one is:
Task in / killed as a result of limit of /lxc/VM_A
memory: usage 128420kB, limit 262144kB, failcnt 44749
Why do I get an OOM kill in this case, since the memory usage is below the limit?
Why is it choosing a task in / instead of in /lxc/VM_A?
Details of the last trace are:
CPU: 28 PID: 22783 Comm: mysqld Not tainted 3.10 #1
Hardware name: Dell Inc. PowerEdge C8220/0TDN55, BIOS 1.1.17 01/09/2013
ffffffff815160a7 0000000000000000 ffffffff815136fc 0000000000000000
0000000100000010 0000000000000000 ffff88207fffbd80 0000000100000000
0000000000000000 0000000000000001 ffffffff810b7718 0000000000000001
Call Trace:
[<ffffffff815160a7>] ? dump_stack+0xd/0x17
[<ffffffff815136fc>] ? dump_header+0x78/0x21a
[<ffffffff810b7718>] ? find_lock_task_mm+0x28/0x80
[<ffffffff81103c8b>] ? mem_cgroup_same_or_subtree+0x2b/0x50
[<ffffffff810b7bd0>] ? oom_kill_process+0x270/0x400
[<ffffffff8104a6ec>] ? has_ns_capability_noaudit+0x4c/0x70
[<ffffffff81104f91>] ? __mem_cgroup_try_charge+0x9e1/0xa10
[<ffffffff810f00df>] ? alloc_pages_vma+0xaf/0x1d0
[<ffffffff8110560b>] ? mem_cgroup_charge_common+0x4b/0xa0
[<ffffffff810d7cd4>] ? handle_pte_fault+0x6f4/0x990
[<ffffffff810d92c5>] ? handle_mm_fault+0x355/0x710
[<ffffffff8151212a>] ? mm_fault_error+0xd4/0x1e8
[<ffffffff81028b0e>] ? __do_page_fault+0x17e/0x570
[<ffffffff811f7acb>] ? blk_finish_plug+0xb/0x40
[<ffffffff810d3b7e>] ? SyS_madvise+0x2ae/0x860
[<ffffffff8110b308>] ? SyS_faccessat+0x208/0x230
[<ffffffff8151abe8>] ? page_fault+0x38/0x40
Task in / killed as a result of limit of /lxc/VM_A
memory: usage 128420kB, limit 262144kB, failcnt 44749
memory+swap: usage 128420kB, limit 524288kB, failcnt 0
kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
Memory cgroup stats for /lxc/VM_A: cache:65588KB rss:66752KB
rss_huge:12288KB mapped_file:256KB swap:0KB inactive_anon:4372KB
active_anon:127900KB inactive_file:8KB active_file:0KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 1418] 0 1418 4441 427 14 0 0 start
[ 1622] 5101 1622 65868 10170 62 0 -1000 mysqld
[ 2221] 5000 2221 89139 1857 121 0 0 php5-fpm
[ 2235] 5001 2235 24212 951 52 0 0 apache2
[32334] 0 32334 1023 80 8 0 0 sleep
[32337] 5001 32337 193388 2897 124 0 0 apache2
[14138] 5000 14138 93086 6582 129 0 0 php5-fpm
[22853] 5000 22853 89887 2773 124 0 0 php5-fpm
Memory cgroup out of memory: Kill process 1458 (php5-fpm) score 705 or
sacrifice child
I don't even get the usual last line "Killed process [...]".
After that I get the details of all the stalled tasks before the machine
freezes completely.
INFO: rcu_preempt detected stalls on CPUs/tasks: { 12} (detected by 1,
t=15015 jiffies, g=10207183, c=10207182, q=412)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 21642 Comm: php5-fpm Not tainted 3.10 #1
Hardware name: Dell Inc. PowerEdge C8220/0TDN55, BIOS 1.1.17 01/09/2013
task: ffff880f18128fe0 ti: ffff880f18129470 task.ti: ffff880f18129470
RIP: 0010:[<ffffffff8122786a>] [<ffffffff8122786a>]
__write_lock_failed+0x1a/0x40
RSP: 0018:ffff880ff258be98 EFLAGS: 00000087
RAX: ffff880f18129470 RBX: ffff880f18129580 RCX: ffff880ff258bee8
RDX: 0000000000000058 RSI: 0000000000000001 RDI: ffffffff81a04040
RBP: ffff881023116900 R08: 0000000000000037 R09: 0000000000000000
R10: 000000000000001c R11: 0000000000000000 R12: ffff881023116970
R13: ffff880f18128fe0 R14: 0000000000000000 R15: ffff880f18128fe0
FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000383d57dd300 CR3: 0000000001526000 CR4: 00000000000607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff8151a547 ffffffff81042a9f ffff881990bd4e40 ffff880ff258bee8
ffff882029086520 000000018110e2db 0000000000000002 ffff880f1584a810
000003d805aeddb8 8000000000000000 ffff880ff258bee8 ffff880ff258bee8
Call Trace:
[<ffffffff8151a547>] ? _raw_write_lock_irq+0x27/0x30
[<ffffffff81042a9f>] ? do_exit+0x30f/0xab0
[<ffffffff810432b8>] ? do_group_exit+0x38/0xa0
[<ffffffff81043332>] ? SyS_exit_group+0x12/0x20
[<ffffffff8151b3be>] ? system_call_fastpath+0x18/0x1d
Code: 48 0f ba 2c 24 3f c3 90 90 90 90 90 90 90 90 90 90 f0 81 07 00
00 10 00 71 09 f0 81 2f 00 00 10 00 cd 04 f3 90 81 3f 00 00 10 00 <75>
f6 f0 81 2f 00 00 10 00 71 09 f0 81 07 00 00 10 00 cd 04 75
My 3.10.x build includes these additional patches:
609838c mm: invoke oom-killer from remaining unconverted page fault handlers
94bce45 arch: mm: remove obsolete init OOM protection
8713410 arch: mm: do not invoke OOM killer on kernel fault OOM
759496b arch: mm: pass userspace fault flag to generic fault handler
3a13c4d x86: finish user fault error path with fatal signal
519e524 mm: memcg: enable memcg OOM killer only for user faults
fb2a6fc mm: memcg: rework and document OOM waiting and wakeup
3812c8c mm: memcg: do not trap chargers with full callstack on OOM
658b72c memcg: check for proper lock held in mem_cgroup_update_page_stat
and also the latest patches from Johannes Weiner:
mm: memcg: handle non-error OOM situations more gracefully
fs: buffer: move allocation failure loop into the allocator
Any hint? Am I missing something?
Best regards,
--
William
* Re: strange oom behaviour on 3.10
2013-10-09 15:54 strange oom behaviour on 3.10 William Dauchy
@ 2013-10-10 0:24 ` Johannes Weiner
2013-10-10 12:30 ` William Dauchy
2013-10-10 20:47 ` William Dauchy
0 siblings, 2 replies; 4+ messages in thread
From: Johannes Weiner @ 2013-10-10 0:24 UTC (permalink / raw)
To: William Dauchy; +Cc: cgroups, linux-mm
Hi William,
On Wed, Oct 09, 2013 at 05:54:20PM +0200, William Dauchy wrote:
> Hi,
>
> I have run into a strange issue with cgroups on v3.10.x.
> The OOM killer is triggered for a cgroup which has reached its memory limit.
> I'm getting several messages like:
>
> Task in /lxc/VM_A killed as a result of limit of /lxc/VM_A
> memory: usage 262144kB, limit 262144kB, failcnt 44742
>
> which is quite normal.
> The last one is:
> Task in / killed as a result of limit of /lxc/VM_A
> memory: usage 128420kB, limit 262144kB, failcnt 44749
>
> Why do I get an OOM kill in this case, since the memory usage is below the limit?
I suspect a task's OOM context is set up but not handled, so later on
when another task triggers an OOM the OOM killer is invoked on
whatever memcg that OOM context was pointing to.
> Why is it choosing a task in / instead of in /lxc/VM_A?
The memcg in the OOM context could have been freed and corrupted at
that point.
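To make that concrete, here is a rough sketch of the sequence I have in
mind. The per-task context below is an approximation of what the OOM
rework patches add to task_struct, not the exact code; the function
names are the ones that already show up in your traces:

/*
 * 1. A charge against /lxc/VM_A fails.  Instead of killing on the
 *    spot, the charge path records an OOM context in the task,
 *    roughly:
 *
 *        current->memcg_oom.memcg = <pointer to /lxc/VM_A>;
 *
 * 2. That context is supposed to be consumed at the end of the fault
 *    by mem_cgroup_oom_synchronize(), but here it never is, so it is
 *    neither acted upon nor cleared, and the memcg it points to can
 *    be torn down in the meantime.
 *
 * 3. Later, an unrelated allocation fails, task_in_memcg_oom() finds
 *    the stale context, and the OOM killer runs against whatever the
 *    pointer now refers to, which is how you get "Task in / killed as
 *    a result of limit of /lxc/VM_A" while /lxc/VM_A is well under
 *    its limit.
 */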
Can you try this patch on top of what you have right now?
---
mm/memcontrol.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba3051a..d60f560 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2706,6 +2706,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
if (unlikely(task_in_memcg_oom(current)))
goto bypass;
+ if (gfp_mask & __GFP_NOFAIL)
+ oom = false;
+
/*
* We always charge the cgroup the mm_struct belongs to.
* The mm_struct's mem_cgroup changes on task migration if the
@@ -2803,10 +2806,10 @@ done:
*ptr = memcg;
return 0;
nomem:
- *ptr = NULL;
- if (gfp_mask & __GFP_NOFAIL)
- return 0;
- return -ENOMEM;
+ if (!(gfp_mask & __GFP_NOFAIL)) {
+ *ptr = NULL;
+ return -ENOMEM;
+ }
bypass:
*ptr = root_mem_cgroup;
return -EINTR;
--
1.8.4
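For clarity, the net effect of the two hunks for a __GFP_NOFAIL charge
that cannot be satisfied is the following (this only restates the diff
above, it is not an additional change):

	/* never arm the memcg OOM machinery for nofail charges ... */
	if (gfp_mask & __GFP_NOFAIL)
		oom = false;
	...
nomem:
	/*
	 * ... and on failure, instead of pretending the charge
	 * succeeded while leaving *ptr NULL, fall through and charge
	 * against the root memcg:
	 */
	if (!(gfp_mask & __GFP_NOFAIL)) {
		*ptr = NULL;
		return -ENOMEM;
	}
bypass:
	*ptr = root_mem_cgroup;
	return -EINTR;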
* Re: strange oom behaviour on 3.10
2013-10-10 0:24 ` Johannes Weiner
@ 2013-10-10 12:30 ` William Dauchy
2013-10-10 20:47 ` William Dauchy
1 sibling, 0 replies; 4+ messages in thread
From: William Dauchy @ 2013-10-10 12:30 UTC (permalink / raw)
To: Johannes Weiner; +Cc: cgroups, linux-mm
Hi Johannes,
Thank you for your quick reply and the details you gave in answer to my questions.
On Thu, Oct 10, 2013 at 2:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Can you try this patch on top of what you have right now?
OK; give me some time to test it.
Regards,
--
William
* Re: strange oom behaviour on 3.10
2013-10-10 0:24 ` Johannes Weiner
2013-10-10 12:30 ` William Dauchy
@ 2013-10-10 20:47 ` William Dauchy
1 sibling, 0 replies; 4+ messages in thread
From: William Dauchy @ 2013-10-10 20:47 UTC (permalink / raw)
To: Johannes Weiner; +Cc: cgroups, linux-mm
Hi Johannes,
On Thu, Oct 10, 2013 at 2:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Can you try this patch on top of what you have right now?
>
> ---
> mm/memcontrol.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba3051a..d60f560 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2706,6 +2706,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> if (unlikely(task_in_memcg_oom(current)))
> goto bypass;
>
> + if (gfp_mask & __GFP_NOFAIL)
> + oom = false;
> +
> /*
> * We always charge the cgroup the mm_struct belongs to.
> * The mm_struct's mem_cgroup changes on task migration if the
> @@ -2803,10 +2806,10 @@ done:
> *ptr = memcg;
> return 0;
> nomem:
> - *ptr = NULL;
> - if (gfp_mask & __GFP_NOFAIL)
> - return 0;
> - return -ENOMEM;
> + if (!(gfp_mask & __GFP_NOFAIL)) {
> + *ptr = NULL;
> + return -ENOMEM;
> + }
> bypass:
> *ptr = root_mem_cgroup;
> return -EINTR;
Unfortunately, I'm getting the same result with your additional patch:
mysqld invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-1000
mysqld cpuset=VM_A mems_allowed=0-1
CPU: 15 PID: 4414 Comm: mysqld Not tainted 3.10 #1
Hardware name: Dell Inc. PowerEdge C8220/0TDN55, BIOS 1.1.19 02/25/2013
ffffffff81515f50 0000000000000000 ffffffff815135a5 0101881000000000
ffff88201ddd3800 ffffc9001d2ac040 0000000000000000 0000000000000000
ffffffff81d236f8 ffff88201ddd3800 ffffffff810b7698 0000000000000001
Call Trace:
[<ffffffff81515f50>] ? dump_stack+0xd/0x17
[<ffffffff815135a5>] ? dump_header+0x78/0x21a
[<ffffffff810b7698>] ? find_lock_task_mm+0x28/0x80
[<ffffffff81103bbb>] ? mem_cgroup_same_or_subtree+0x2b/0x50
[<ffffffff810b7b50>] ? oom_kill_process+0x270/0x400
[<ffffffff8104a6fc>] ? has_ns_capability_noaudit+0x4c/0x70
[<ffffffff81105d2e>] ? mem_cgroup_oom_synchronize+0x53e/0x560
[<ffffffff81105150>] ? mem_cgroup_charge_common+0xa0/0xa0
[<ffffffff810b837b>] ? pagefault_out_of_memory+0xb/0x80
[<ffffffff81028e27>] ? __do_page_fault+0x497/0x580
[<ffffffff81158d3e>] ? read_events+0x27e/0x2e0
[<ffffffff81062f20>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81065830>] ? update_rmtp+0x190/0x190
[<ffffffff8151aaa8>] ? page_fault+0x38/0x40
Task in / killed as a result of limit of /lxc/VM_A
memory: usage 53192kB, limit 262144kB, failcnt 99902
memory+swap: usage 53192kB, limit 524288kB, failcnt 0
kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
Memory cgroup stats for /lxc/VM_A: cache:18092KB rss:34988KB
rss_huge:14336KB mapped_file:100KB swap:0KB inactive_anon:4344KB
active_anon:48720KB inactive_file:4KB active_file:0KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 4359] 0 4359 4446 233 14 0 0 start
[ 4410] 5101 4410 63969 6404 56 0 -1000 mysqld
[ 4515] 5000 4515 89140 1490 123 0 0 php5-fpm
[ 4520] 5001 4520 24212 959 51 0 0 apache2
[24794] 0 24794 1023 80 8 0 0 sleep
[24795] 5001 24795 176565 2785 121 0 0 apache2
[31892] 5000 31892 89135 1474 118 0 0 php5-fpm
Memory cgroup out of memory: Kill process 31826 (php5-fpm) score 895
or sacrifice child
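Reading the trace, the stale OOM context now seems to be consumed only
at the very end of the fault path, which would explain the bogus
"Task in /" kill. A simplified sketch of the path as I read the
backtrace above (my interpretation, not exact code):

/*
 *   __do_page_fault()
 *     -> fault fails
 *     -> pagefault_out_of_memory()
 *          -> mem_cgroup_oom_synchronize()   // picks up the stale
 *                                            // per-task OOM context
 *               -> oom_kill_process()        // "Task in / killed as a
 *                                            //  result of limit of
 *                                            //  /lxc/VM_A"
 */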
Do you have any more ideas?
Regards,
--
William