Date: Fri, 13 Nov 2020 22:52:23 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, gechangwei@live.cn, ghe@suse.com, jlbec@evilplan.org, joseph.qi@linux.alibaba.com, junxiao.bi@oracle.com, linux-mm@kvack.org, mark@fasheh.com, mm-commits@vger.kernel.org, piaojun@huawei.com, stable@vger.kernel.org, torvalds@linux-foundation.org, wen.gang.wang@oracle.com
Subject: [patch 14/14] ocfs2: initialize ip_next_orphan
Message-ID: <20201114065223.JN1eernhY%akpm@linux-foundation.org>
In-Reply-To: <20201113225115.b24faebc85f710d5aff55aa7@linux-foundation.org>

From: Wengang Wang
Subject: ocfs2: initialize ip_next_orphan

Though the problem was found on an older 4.1.12 kernel, I think upstream
has the same issue.

On one node in the cluster, there is the following call trace:

# cat /proc/21473/stack
[] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
[] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
[] ocfs2_evict_inode+0x152/0x820 [ocfs2]
[] evict+0xae/0x1a0
[] iput+0x1c6/0x230
[] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
[] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
[] ocfs2_dir_foreach+0x29/0x30 [ocfs2]
[] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
[] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] kthread+0xcb/0xf0
[] ret_from_fork+0x61/0x90
[] 0xffffffffffffffff

The above stack is not reasonable: the final iput() should not happen in
ocfs2_orphan_filldir().  Looking at the code,

2067         /* Skip inodes which are already added to recover list, since dio may
2068          * happen concurrently with unlink/rename */
2069         if (OCFS2_I(iter)->ip_next_orphan) {
2070                 iput(iter);
2071                 return 0;
2072         }
2073

The logic assumes the inode is already in the recover list when it sees
that ip_next_orphan is non-NULL, so it skips this inode after dropping the
reference that was taken in ocfs2_iget().

However, if the inode were already in the recover list, it would hold
another reference, and the iput() at line 2070 would not be the final
iput() (the one dropping the last reference).  So I don't think the inode
is really in the recover list (there is no vmcore to confirm this).

Note that ocfs2_queue_orphans(), though it does not show up in the call
trace, holds the cluster lock on the orphan directory while looking up
unlinked inodes.  Evicting the on-disk inode can involve a lot of I/O,
which may take a long time to finish.  That means this node can hold the
cluster lock for a very long time, which can cause lock requests from
other nodes on the orphan directory to hang for a long time.

Looking further at ip_next_orphan, I found that it is not initialized when
a new ocfs2_inode_info structure is allocated.

This causes the reflink operations from some nodes to hang for a very long
time waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
---

 fs/ocfs2/super.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/ocfs2/super.c~ocfs2-initialize-ip_next_orphan
+++ a/fs/ocfs2/super.c
@@ -1713,6 +1713,7 @@ static void ocfs2_inode_init_once(void *
 
 	oi->ip_blkno = 0ULL;
 	oi->ip_clusters = 0;
+	oi->ip_next_orphan = NULL;
 
 	ocfs2_resv_init_once(&oi->ip_la_data_resv);
 
_