From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com [209.85.192.181])
	by kanga.kvack.org (Postfix) with ESMTP id A30556B0278
	for ; Mon, 7 Dec 2015 19:01:07 -0500 (EST)
Received: by pfu207 with SMTP id 207so1885904pfu.2
	for ; Mon, 07 Dec 2015 16:01:07 -0800 (PST)
Received: from ale.deltatee.com (ale.deltatee.com. [207.54.116.67])
	by mx.google.com with ESMTPS id a10si748314pat.195.2015.12.07.16.01.05
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Mon, 07 Dec 2015 16:01:05 -0800 (PST)
References: <20151010005522.17221.87557.stgit@dwillia2-desk3.jf.intel.com>
	<562AA15E.3010403@deltatee.com>
	<565F6A7A.4040302@deltatee.com>
	<566244CC.5080107@deltatee.com>
From: Logan Gunthorpe
Message-ID: <56661DBA.5000302@deltatee.com>
Date: Mon, 7 Dec 2015 17:00:58 -0700
MIME-Version: 1.0
In-Reply-To: <566244CC.5080107@deltatee.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [PATCH v2 00/20] get_user_pages() for dax mappings
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dan Williams
Cc: Stephen Bates, Linux MM, "linux-nvdimm@lists.01.org"

Hi Dan,

I've done a bit of digging and here's some more information:

* The crash occurs in ext4_end_io_unwritten when it tries to
  dereference bh->b_assoc_map, which is not necessarily NULL.

* That function is called by __dax_pmd_fault, as the argument
  complete_unwritten.

* Looking in __dax_pmd_fault, the bug occurs if we hit either of the
  first two 'goto fallback' lines. (In my case, it's hitting the first
  one.)

* After the fallback code, it goes back to 'out', then checks '&bh' for
  the unwritten flag. But bh hasn't been initialized yet and, on my
  setup, the unwritten flag happens to be set. So, it then calls
  complete_unwritten with a garbage bh and crashes.

If I move the memset(&bh) up in the code, before the goto fallbacks can
occur, I can fix the crash.
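
To make the failure mode above concrete, here is a minimal user-space sketch of the pattern. It is not the kernel code: struct fake_bh, BH_UNWRITTEN_BIT, fake_pmd_fault and the other names are invented for illustration, and the "garbage" stack contents are simulated with a 0xff fill so the behaviour is deterministic rather than undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BH_UNWRITTEN_BIT (1u << 0)	/* hypothetical flag bit */

struct fake_bh {			/* stand-in for struct buffer_head */
	unsigned int state;
	void *assoc_map;
};

static int completions;			/* calls made to the completion callback */

static void fake_complete_unwritten(struct fake_bh *bh)
{
	/* The real callback would dereference a pointer inside bh here. */
	(void)bh;
	completions++;
}

/*
 * early_fallback: take a 'goto fallback' before any real work is done.
 * init_first:     zero bh before any goto (the reordering described above).
 */
static int fake_pmd_fault(bool early_fallback, bool init_first)
{
	struct fake_bh bh;
	int result = 0;

	/* Simulate stale stack contents in a well-defined way. */
	memset(&bh, 0xff, sizeof(bh));

	if (init_first)
		memset(&bh, 0, sizeof(bh));

	if (early_fallback)
		goto fallback;

	if (!init_first)
		memset(&bh, 0, sizeof(bh));	/* too late for the early goto */

	/* normal mapping work would go here */

out:
	if (bh.state & BH_UNWRITTEN_BIT)	/* reads bh on every path */
		fake_complete_unwritten(&bh);
	return result;

fallback:
	result = 1;				/* stand-in for a fallback code */
	goto out;
}

/* Number of spurious completion calls seen on the early-fallback path. */
static int spurious_completions(bool init_first)
{
	completions = 0;
	fake_pmd_fault(true, init_first);
	return completions;
}

/* Usage: late initialisation triggers the callback, early does not. */
static void self_check(void)
{
	assert(spurious_completions(false) == 1);
	assert(spurious_completions(true) == 0);
}
```

With the late memset, the early fallback path reaches 'out' with stale contents and the callback runs on them. With the memset moved ahead of the first goto, the flag test sees zeroed state and the callback is skipped.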
I don't know if this is really the best way to fix the problem, though.

--

Unfortunately, fixing the above just uncovered another issue. Now the
MR de-registration seems to complete, but the task hangs when it tries
to munmap the memory. (Stack trace at the end of this email.)

It looks like i_mmap_lock_write is hanging in unlink_file_vma. I'm not
really sure how to go about debugging this lock issue. If you have any
steps I can try to get you more information, let me know. I'm also
happy to re-test if you have any other changes you'd like me to try.

Thanks,

Logan

> [ 240.520522] INFO: task client:1997 blocked for more than 120 seconds.
> [ 240.520638] Tainted: G O 4.4.0-rc3+donard2.5+ #87
> [ 240.520741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 240.520847] client D ffff88047fd14800 0 1997 1912 0x00000004
> [ 240.520856] ffff88026bc7b240 0000000000000000 ffff88026bd38000 ffff88026bd37d30
> [ 240.520861] fffffffeffffffff ffff88026bc7b240 00007f4297513000 ffff880473aba240
> [ 240.520866] ffffffff81422896 ffff880470b34e40 ffffffff814242f1 ffff880476deddc0
> [ 240.520871] Call Trace:
> [ 240.520886] [] ? schedule+0x6c/0x79
> [ 240.520893] [] ? rwsem_down_write_failed+0x285/0x2cb
> [ 240.520903] [] ? call_rwsem_down_write_failed+0x13/0x20
> [ 240.520907] [] ? call_rwsem_down_write_failed+0x13/0x20
> [ 240.520913] [] ? down_write+0x24/0x33
> [ 240.520923] [] ? unlink_file_vma+0x28/0x4b
> [ 240.520928] [] ? free_pgtables+0x3c/0xba
> [ 240.520933] [] ? unmap_region+0xa4/0xc1
> [ 240.520941] [] ? pick_next_task_fair+0x11b/0x347
> [ 240.520947] [] ? vma_gap_callbacks_propagate+0x16/0x2c
> [ 240.520951] [] ? vma_rb_erase+0x161/0x18f
> [ 240.520957] [] ? do_munmap+0x271/0x2e6
> [ 240.520962] [] ? vm_munmap+0x37/0x4f
> [ 240.520967] [] ? SyS_munmap+0x1a/0x1f
> [ 240.520971] [] ? entry_SYSCALL_64_fastpath+0x12/0x6a