Subject: Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
From: Tetsuo Handa
To: Jan Stancek , linux-mm@kvack.org
Cc: ltp@lists.linux.it
Date: Tue, 19 Jan 2016 19:29:36 +0900
Message-ID: <569E1010.2070806@I-love.SAKURA.ne.jp>
In-Reply-To: <569D06F8.4040209@redhat.com>

Jan Stancek wrote:
> I'm seeing system occasionally hanging after "oom01" testcase
> from LTP triggers OOM.
>
> Here's a console log obtained from v4.4-8606 (shows oom, followed
> by blocked task messages, followed by me triggering sysrq-t):
> http://jan.stancek.eu/tmp/oom_hangs/oom_hang_v4.4-8606.txt
> http://jan.stancek.eu/tmp/oom_hangs/config-v4.4-8606.txt

I think this console log reports an OOM livelock. dump_tasks() shows that
there are 2 oom01 tasks (10482 and 10528), but show_state() shows that there
are 8 oom01 tasks whose parent is 10482 (10528 10529 10530 10531 10532 10533
10534 10535). Thus, I guess the oom01 process called fork() and the child
process created 7 threads.

----------
[ 495.322098] [10482] 0 10482 2692 10 11 3 27 0 oom01
[ 495.341231] [10528] 0 10528 8667787 3731327 11002 28 1890661 0 oom01
[ 1736.414809] oom01 S ffff880453dd3e48 13008 10482 10479 0x00000080
[ 1737.025258] oom01 x ffff88044ff9bc98 13080 10528 10482 0x00000084
[ 1737.129553] oom01 R running task 11616 10529 10482 0x00000084
[ 1737.309520] oom01 R running task 11544 10530 10482 0x00000084
[ 1737.417212] oom01 R running task 11616 10531 10482 0x00000084
[ 1737.551904] oom01 R running task 11616 10532 10482 0x00000084
[ 1737.725174] oom01 R running task 11616 10533 10482 0x00000084
[ 1737.898452] oom01 R running task 11616 10534 10482 0x00000084
[ 1738.078431] oom01 R running task 11616 10535 10482 0x00000084
----------

10528 got both SIGKILL and TIF_MEMDIE at uptime = 495 and had exited by
uptime = 1737.

----------
[ 495.350845] Out of memory: Kill process 10528 (oom01) score 952 or sacrifice child
[ 495.359301] Killed process 10528 (oom01) total-vm:34671148kB, anon-rss:14925308kB, file-rss:0kB, shmem-rss:0kB
----------

Since 10529 to 10535 got only SIGKILL, all of them are looping inside
__alloc_pages_slowpath().

----------
[ 1737.129553] oom01 R running task 11616 10529 10482 0x00000084
[ 1737.172049] [] _cond_resched+0x1c/0x30
[ 1737.178072] [] shrink_slab.part.42+0x2cf/0x540
[ 1737.184873] [] ? rcu_read_lock_held+0x5e/0x70
[ 1737.191577] [] shrink_zone+0x30a/0x330
[ 1737.197602] [] do_try_to_free_pages+0x174/0x440
[ 1737.204499] [] try_to_free_pages+0x100/0x2c0
[ 1737.211107] [] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.309520] oom01 R running task 11544 10530 10482 0x00000084
[ 1737.352024] [] _cond_resched+0x1c/0x30
[ 1737.358049] [] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
[ 1737.417212] oom01 R running task 11616 10531 10482 0x00000084
[ 1737.459715] [] _cond_resched+0x1c/0x30
[ 1737.465739] [] shrink_slab.part.42+0x2cf/0x540
[ 1737.472540] [] ? rcu_read_lock_held+0x5e/0x70
[ 1737.479242] [] shrink_zone+0x30a/0x330
[ 1737.485266] [] do_try_to_free_pages+0x174/0x440
[ 1737.492162] [] try_to_free_pages+0x100/0x2c0
[ 1737.498768] [] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.551904] oom01 R running task 11616 10532 10482 0x00000084
[ 1737.594403] [] _cond_resched+0x1c/0x30
[ 1737.600428] [] shrink_slab.part.42+0x210/0x540
[ 1737.607228] [] shrink_zone+0x30a/0x330
[ 1737.613252] [] do_try_to_free_pages+0x174/0x440
[ 1737.620150] [] try_to_free_pages+0x100/0x2c0
[ 1737.626755] [] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.725174] oom01 R running task 11616 10533 10482 0x00000084
[ 1737.767682] [] _cond_resched+0x1c/0x30
[ 1737.773705] [] shrink_slab.part.42+0x210/0x540
[ 1737.780506] [] shrink_zone+0x30a/0x330
[ 1737.786532] [] do_try_to_free_pages+0x174/0x440
[ 1737.793428] [] try_to_free_pages+0x100/0x2c0
[ 1737.800037] [] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.898452] oom01 R running task 11616 10534 10482 0x00000084
[ 1737.940961] [] _cond_resched+0x1c/0x30
[ 1737.946985] [] shrink_slab.part.42+0x2cf/0x540
[ 1737.953785] [] ? rcu_read_lock_held+0x5e/0x70
[ 1737.960488] [] shrink_zone+0x30a/0x330
[ 1737.966512] [] do_try_to_free_pages+0x174/0x440
[ 1737.973408] [] try_to_free_pages+0x100/0x2c0
[ 1737.980015] [] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1738.078431] oom01 R running task 11616 10535 10482 0x00000084
[ 1738.120938] [] _cond_resched+0x1c/0x30
[ 1738.126961] [] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
----------

As a result, 10482 remains sleeping in wait().

----------
[ 1736.414809] oom01 S ffff880453dd3e48 13008 10482 10479 0x00000080
[ 1736.450320] [] schedule+0x3c/0x90
[ 1736.455859] [] do_wait+0x213/0x2f0
[ 1736.461495] [] SyS_wait4+0x80/0x100
----------

Also, 10519 shows a typical trace of a task looping inside
__alloc_pages_slowpath(), which can be observed when we hit an OOM livelock.
Since this allocation request includes neither __GFP_NOFAIL nor __GFP_FS,
out_of_memory() cannot be called.

----------
[ 1736.716427] kworker/2:1 R running task 12616 10519 2 0x00000088
[ 1736.724319] Workqueue: events_freezable_power_ disk_events_workfn
[ 1736.731128] 00000000ca948786 ffff88035997b8e8 0000000000000000 0000000000000000
[ 1736.739427] 0000000000000000 0000000000000000 ffff88035997b850 ffffffff811ec4ba
[ 1736.747725] ffff88035997b9d0 0000000000000000 0000000000000000 0001000000000000
[ 1736.756022] Call Trace:
[ 1736.758749] [] ? shrink_zone+0x30a/0x330
[ 1736.764969] [] ? do_try_to_free_pages+0x174/0x440
[ 1736.772060] [] ? try_to_free_pages+0x100/0x2c0
[ 1736.778861] [] ? __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1736.787408] [] ? __alloc_pages_nodemask+0x456/0x460
[ 1736.794690] [] ? __lock_is_held+0x54/0x70
[ 1736.801004] [] ? alloc_pages_current+0x97/0x1b0
[ 1736.807902] [] ? bio_copy_kern+0xc9/0x180
[ 1736.814217] [] ? blk_rq_map_kern+0x75/0x130
[ 1736.820726] [] ? scsi_execute+0x132/0x160
[ 1736.827041] [] ? scsi_execute_req_flags+0x8e/0xf0
[ 1736.834133] [] ? sr_check_events+0xb7/0x2a0 [sr_mod]
[ 1736.841514] [] ? cdrom_check_events+0x18/0x30 [cdrom]
[ 1736.848992] [] ? sr_block_check_events+0x2a/0x30 [sr_mod]
[ 1736.856859] [] ? disk_check_events+0x60/0x170
[ 1736.863561] [] ? disk_events_workfn+0x1c/0x20
[ 1736.870264] [] ? process_one_work+0x215/0x650
[ 1736.876967] [] ? process_one_work+0x181/0x650
[ 1736.883671] [] ? worker_thread+0x125/0x4a0
[ 1736.890082] [] ? process_one_work+0x650/0x650
[ 1736.896785] [] ? kthread+0x101/0x120
[ 1736.902615] [] ? trace_hardirqs_on_caller+0xf9/0x1c0
[ 1736.909997] [] ? kthread_create_on_node+0x250/0x250
[ 1736.917281] [] ? ret_from_fork+0x3f/0x70
[ 1736.923499] [] ? kthread_create_on_node+0x250/0x250
----------
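As a side note, here is a minimal sketch of the gating I am referring to,
written as a self-contained user-space model rather than actual kernel source
(the flag values and the helper name below are made up for illustration): as
far as I understand the v4.4-era slowpath, a request that carries neither
__GFP_FS nor __GFP_NOFAIL is never allowed to invoke the OOM killer and simply
keeps retrying reclaim, which is what the disk_events_workfn allocation above
(a GFP_NOIO-style request) is doing.

----------
/* Illustrative model only; not kernel code.  Flag values are made up. */
#include <stdbool.h>
#include <stdio.h>

#define FAKE_GFP_FS      (1u << 0)  /* stands in for __GFP_FS */
#define FAKE_GFP_NOFAIL  (1u << 1)  /* stands in for __GFP_NOFAIL */

/* Decide whether a failing allocation may call out_of_memory(). */
static bool may_call_oom_killer(unsigned int gfp_mask)
{
	if (gfp_mask & FAKE_GFP_NOFAIL)
		return true;   /* __GFP_NOFAIL requests may invoke the OOM killer */
	if (!(gfp_mask & FAKE_GFP_FS))
		return false;  /* GFP_NOIO/GFP_NOFS-style requests just keep looping */
	return true;
}

int main(void)
{
	/* The disk_events_workfn allocation above is a GFP_NOIO-style request. */
	printf("GFP_NOIO-style: may call OOM killer? %d\n",
	       may_call_oom_killer(0));
	printf("GFP_KERNEL-style: may call OOM killer? %d\n",
	       may_call_oom_killer(FAKE_GFP_FS));
	return 0;
}
----------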
What is strange is that 10529 to 10535 should be able to get TIF_MEMDIE and
exit __alloc_pages_slowpath(). There are three possibilities:

 (1) They are too unlucky to take the oom_lock mutex, which must be held
     before out_of_memory() can be called. But they should not keep failing
     to take that mutex for such a long time.

 (2) Their allocation requests include neither __GFP_NOFAIL nor __GFP_FS,
     which is required for calling out_of_memory().

 (3) Although I couldn't find evidence that mlock() and madvise() are related
     to this hangup, something else is preventing __GFP_NOFAIL or __GFP_FS
     allocation requests from calling out_of_memory().

If you can reproduce this with the kmallocwd patch at
https://lkml.kernel.org/r/201511250024.AAE78692.QVOtFFOSFOMLJH@I-love.SAKURA.ne.jp
applied, I think you will be able to tell which of these possibilities you
are hitting.
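For what it is worth, below is a minimal self-contained sketch of how I
understand these gates in the v4.4-era slowpath. This is a user-space model,
not kernel source; all type names, helper names and flag values are made up
for illustration. Possibility (1) corresponds to losing the oom_lock trylock
and possibility (2) to the gfp-flag check; only after passing both can
out_of_memory() select the already-killed thread and set TIF_MEMDIE so that
it can leave the retry loop.

----------
/* Illustrative user-space model only; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define FAKE_GFP_FS      (1u << 0)  /* stands in for __GFP_FS */
#define FAKE_GFP_NOFAIL  (1u << 1)  /* stands in for __GFP_NOFAIL */

struct fake_task {
	bool has_sigkill;     /* received SIGKILL when the OOM killer killed the group */
	bool has_tif_memdie;  /* selected as an OOM victim */
};

/* One pass through the allocation retry loop; true once the task may leave it. */
static bool retry_once(struct fake_task *t, unsigned int gfp_mask, bool won_oom_lock)
{
	if (t->has_tif_memdie)
		return true;   /* may use memory reserves and/or fail the allocation */
	if (!won_oom_lock)
		return false;  /* possibility (1): lost the oom_lock trylock, retry */
	if (!(gfp_mask & (FAKE_GFP_FS | FAKE_GFP_NOFAIL)))
		return false;  /* possibility (2): not allowed to call out_of_memory() */
	/* out_of_memory(): a task that already has SIGKILL pending is selected */
	if (t->has_sigkill)
		t->has_tif_memdie = true;
	return t->has_tif_memdie;
}

int main(void)
{
	struct fake_task t = { .has_sigkill = true, .has_tif_memdie = false };

	/* A GFP_KERNEL-style request that wins oom_lock escapes in one pass. */
	printf("__GFP_FS, lock won: %d\n", retry_once(&t, FAKE_GFP_FS, true));

	/* Without __GFP_FS/__GFP_NOFAIL the same task can loop forever. */
	t.has_tif_memdie = false;
	printf("no __GFP_FS, lock won: %d\n", retry_once(&t, 0, true));
	return 0;
}
----------

If none of 10529 to 10535 ever passes these two gates, they stay in the
reclaim/retry loop indefinitely, which would match the backtraces above.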