From: Konstantin Khlebnikov
Date: Mon, 9 Feb 2015 13:46:18 +0400
Subject: Re: BUG: stuck on mmap_sem in 3.18.6
In-Reply-To: <54D87FA8.60408@suse.cz>
References: <54D87FA8.60408@suse.cz>
To: Vlastimil Babka
Cc: "linux-mm@kvack.org", Linux Kernel Mailing List

On Mon, Feb 9, 2015 at 12:36 PM, Vlastimil Babka wrote:
> On 02/09/2015 08:14 AM, Konstantin Khlebnikov wrote:
>> Python was running under the ptrace-based sandbox "sydbox" used in an
>> Exherbo chroot. Kernel: 3.18.6 plus my patch "mm: prevent endless
>> growth of anon_vma hierarchy" (the patch seems stable).
>>
>> [ 4674.087780] INFO: task python:25873 blocked for more than 120 seconds.
>> [ 4674.087793]       Tainted: G U 3.18.6-zurg+ #158
>> [ 4674.087797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 4674.087801] python          D ffff88041e2d2000 14176 25873  25630 0x00000102
>> [ 4674.087817]  ffff880286247b68 0000000000000086 ffff8803d5fe6b40 0000000000012000
>> [ 4674.087824]  ffff880286247fd8 0000000000012000 ffff88040c16eb40 ffff8803d5fe6b40
>> [ 4674.087830]  0000000300000003 ffff8803d5fe6b40 ffff880362888e78 ffff880362888e60
>> [ 4674.087836] Call Trace:
>> [ 4674.087854]  [] schedule+0x29/0x70
>> [ 4674.087865]  [] rwsem_down_write_failed+0x1d5/0x2f0
>> [ 4674.087873]  [] call_rwsem_down_write_failed+0x13/0x20
>> [ 4674.087881]  [] ? down_write+0x31/0x50
>> [ 4674.087891]  [] do_coredump+0x144/0xee0
>> [ 4674.087900]  [] ? pick_next_task_fair+0x397/0x450
>> [ 4674.087909]  [] ? __switch_to+0x1d6/0x5f0
>> [ 4674.087915]  [] ? __schedule+0x3a6/0x880
>> [ 4674.087924]  [] ? klist_remove+0x40/0xd0
>> [ 4674.087932]  [] get_signal+0x298/0x6b0
>> [ 4674.087940]  [] do_signal+0x28/0xbb0
>> [ 4674.087946]  [] ? do_send_sig_info+0x5d/0x80
>> [ 4674.087955]  [] do_notify_resume+0x69/0xb0
>> [ 4674.087963]  [] int_signal+0x12/0x17
>>
>> Maybe this guy did something wrong?
>
> Well, he has do_coredump on the stack, so he did something wrong in
> userspace? But here he's just waiting on down_write. Unless there's
> some bug in do_coredump that would lock for read and then for write,
> without an unlock in between?

I mean khugepaged. This code looks really messy. Maybe it already has
mmap_sem locked for read and tries to lock it again:

[ 5153.460186] INFO: task khugepaged:262 blocked for more than 120 seconds.
[ 5153.460198]       Tainted: G U 3.18.6-zurg+ #158
[ 5153.460201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5153.460206] khugepaged      D ffff88041e292000 14496   262      2 0x00000000
[ 5153.460220]  ffff88040b99bcb0 0000000000000046 ffff88040b994a40 0000000000012000
[ 5153.460227]  ffff88040b99bfd8 0000000000012000 ffff88040c16e300 ffff88040b994a40
[ 5153.460233]  ffffffff810d5c1b ffff88040b994a40 ffff880362888e60 ffffffffffffffff
[ 5153.460240] Call Trace:
[ 5153.460255]  [] ? lock_timer_base.isra.41+0x2b/0x50
[ 5153.460264]  [] schedule+0x29/0x70
[ 5153.460272]  [] rwsem_down_read_failed+0xd5/0x120
[ 5153.460280]  [] call_rwsem_down_read_failed+0x14/0x30
[ 5153.460287]  [] ? down_read+0x24/0x30
[ 5153.460297]  [] khugepaged+0x381/0x13f0
[ 5153.460309]  [] ? abort_exclusive_wait+0xb0/0xb0
[ 5153.460316]  [] ? maybe_pmd_mkwrite+0x30/0x30
[ 5153.460325]  [] kthread+0xdb/0x100
[ 5153.460332]  [] ? kthread_create_on_node+0x170/0x170
[ 5153.460340]  [] ret_from_fork+0x7c/0xb0
[ 5153.460347]  [] ? kthread_create_on_node+0x170/0x170

>
>> Looks like mmap_sem is locked for read:
>
> So we have the python waiting for write, blocking all new readers
> (that's how read/write locks work, right?), but itself waiting for a
> prior reader to finish. The question is: who is/was the reader? You
> could search for the mmap_sem or mm address in the rest of the
> processes' stacks, and maybe you'll find him.
>

I haven't found anything suspicious. The kernel was built without any
debug options, so it's hard to tell who holds mmap_sem locked; maybe
that task has already exited.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
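For reference, the suspected lock ordering can be sketched with a toy,
single-threaded model of a queueing (fair) rwsem. This is only an
illustration of the semantics under discussion, not the kernel's rwsem
implementation; the class `RwSem` and its `down_read`/`down_write`
return-value convention are made up here. The point it demonstrates: once a
writer is queued behind existing readers, new read attempts also queue, so a
task that already holds the read side and takes down_read() again deadlocks:

```python
# Toy model of a fair rwsem: new readers queue behind a waiting writer.
# Illustration only -- NOT the kernel's rw_semaphore implementation.

class RwSem:
    def __init__(self):
        self.readers = 0          # tasks currently holding the read side
        self.writer = False       # a task holds the write side
        self.waiting_writers = 0  # writers queued behind current readers

    def down_read(self):
        """Return True if the read lock is taken, False if the caller
        would block (readers queue behind any waiting writer)."""
        if self.writer or self.waiting_writers:
            return False
        self.readers += 1
        return True

    def down_write(self):
        """Return True if the write lock is taken, False if the caller
        would block behind existing readers or a writer."""
        if self.writer or self.readers:
            self.waiting_writers += 1   # writer is now queued
            return False
        self.writer = True
        return True


mmap_sem = RwSem()

# Some reader (say khugepaged) takes mmap_sem for read.
assert mmap_sem.down_read() is True

# python's do_coredump() tries down_write() and queues behind the reader.
assert mmap_sem.down_write() is False

# If the original reader now calls down_read() again without releasing
# the lock first, it queues behind the waiting writer: deadlock.
assert mmap_sem.down_read() is False
print("second down_read() queues behind the waiting writer -> deadlock")
```

This matches the two stack traces above: python is stuck in
rwsem_down_write_failed and khugepaged in rwsem_down_read_failed, with both
waiting on whoever took the read side first and never released it.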