From: Hillf Danton
To: Jan Kara
Cc: LKML, Thomas Gleixner, Steven Rostedt, Sebastian Andrzej Siewior, linux-mm@kvack.org, Mel Gorman
Subject: Re: Crash with PREEMPT_RT on aarch64 machine
Date: Fri, 4 Nov 2022 16:06:37 +0800
Message-Id: <20221104080637.626-1-hdanton@sina.com>
In-Reply-To: <20221103115444.m2rjglbkubydidts@quack3>

On 3 Nov 2022 12:54:44 +0100 Jan Kara
> Hello,
>
> I was tracking down the following crash with 6.0 kernel with
> patch-6.0.5-rt14.patch applied:
>
> [ T6611] ------------[ cut here ]------------
> [ T6611] kernel BUG at fs/inode.c:625!
> [ T6611] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
> [ T6611] Modules linked in: xfs(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) arm_spe_pmu(E) mlx5_core(E) sunrpc(E) mlxfw(E) pci_hyperv_intf(E) nls_iso8859_1(E) acpi_ipmi(E) nls_cp437(E) ipmi_ssif(E) vfat(E) ipmi_devintf(E) tls(E) igb(E) psample(E) button(E) arm_cmn(E) arm_dmc620_pmu(E) ipmi_msghandler(E) fat(E) cppc_cpufreq(E) arm_dsu_pmu(E) fuse(E) ip_tables(E) x_tables(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) gf128mul(E) nvme(E) drm_kms_helper(E) sha2_ce(E) syscopyarea(E) sha256_arm64(E) sysfillrect(E) xhci_pci(E) sha1_ce(E) sysimgblt(E) nvme_core(E) xhci_pci_renesas(E) fb_sys_fops(E) nvme_common(E) drm_ttm_helper(E) sbsa_gwdt(E) t10_pi(E) ttm(E) xhci_hcd(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) usbcore(E) crc64(E) drm(E) usb_common(E) i2c_designware_platform(E) i2c_designware_core(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) xor_neon(E)
> [ T6611] raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
> [ T6611] CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1 4a18df02c109f1e703cf2ff86b77cf9cd9d5a188
> [ T6611] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F16f (SCP: 1.06.20210615) 07/01/2021
> [ T6611] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ T6611] pc : clear_inode+0xa0/0xc0
> [ T6611] lr : clear_inode+0x38/0xc0
> [ T6611] sp : ffff80000f4f3cd0
> [ T6611] x29: ffff80000f4f3cd0 x28: ffff07ff92142000 x27: 0000000000000000
> [ T6611] x26: ffff08012aef6058 x25: 0000000000000002 x24: ffffb657395e8000
> [ T6611] x23: ffffb65739072008 x22: ffffb656e0bed0a8 x21: ffff08012aef6190
> [ T6611] x20: ffff08012aef61f8 x19: ffff08012aef6058 x18: 0000000000000014
> [ T6611] x17: 00000000f0d86255 x16: ffffb65737dfdb00 x15: 0100000004000000
> [ T6611] x14: 644d000008090000 x13: 644d000008090000 x12: ffff80000f4f3b20
> [ T6611] x11: 0000000000000002 x10: ffff083f5ffbe1c0 x9 : ffffb657388284a4
> [ T6611] x8 : fffffffffffffffe x7 : ffff80000f4f3b20 x6 : ffff80000f4f3b20
> [ T6611] x5 : ffff08012aef6210 x4 : ffff08012aef6210 x3 : 0000000000000000
> [ T6611] x2 : ffff08012aef62d8 x1 : ffff07ff8fbbf690 x0 : ffff08012aef61a0
> [ T6611] Call trace:
> [ T6611] clear_inode+0xa0/0xc0
> [ T6611] evict+0x160/0x180
> [ T6611] iput+0x154/0x240
> [ T6611] do_unlinkat+0x184/0x300
> [ T6611] __arm64_sys_unlinkat+0x48/0xc0
> [ T6611] el0_svc_common.constprop.4+0xe4/0x2c0
> [ T6611] do_el0_svc+0xac/0x100
> [ T6611] el0_svc+0x78/0x200
> [ T6611] el0t_64_sync_handler+0x9c/0xc0
> [ T6611] el0t_64_sync+0x19c/0x1a0
> [ T6611] Code: d4210000 d503201f d4210000 d503201f (d4210000)
> [ T6611] ---[ end trace 0000000000000000 ]---
>
> The machine is aarch64 architecture, kernel config is attached. I have seen
> the crashes also with 5.14-rt kernel so it is not a new thing. The crash is
> triggered relatively reliably (on two different aarch64 machines) by our
> performance testing framework when running dbench benchmark against an XFS
> filesystem.
>
> Now originally I thought this is some problem with XFS or writeback code
> but after debugging this for some time I don't think that anymore.
> clear_inode() complains about inode->i_wb_list being non-empty. In fact
> looking at the list_head, I can see it is corrupted. In all the occurrences
> of the problem ->prev points back to the list_head itself but ->next points
> to some list_head that used to be part of the sb->s_inodes_wb list (or
> actually that list spliced in wait_sb_inodes() because I've seen a pointer to
> the stack as ->next pointer as well).
>
> This is not just some memory ordering issue with the check in
> clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
> clear_inode(), the problem still reproduces.
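To make the reported list state concrete, here is a user-space sketch (not kernel code) with minimal re-implementations of the kernel's list_head helpers. It assumes that wait_sb_inodes() splices sb->s_inodes_wb onto an on-stack list head, as the stack pointer seen in ->next suggests. entry_consistent() and half_reinitialised() are hypothetical helpers: the first models the neighbour check CONFIG_DEBUG_LIST performs, the second matches the reported "->prev points at itself, ->next does not" state.

```c
#include <stdbool.h>

/* User-space model of the kernel's struct list_head (circular, doubly linked). */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Append @e just before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *e, struct list_head *head)
{
	struct list_head *prev = head->prev;

	e->next = head;
	e->prev = prev;
	prev->next = e;
	head->prev = e;
}

/* Unlink @e and leave it pointing at itself (an "empty" entry). */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Move all entries of @list to the front of @head, then reinitialise @list. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next, *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
		INIT_LIST_HEAD(list);
	}
}

/* Hypothetical: both neighbours must point back at @e (a self-linked entry passes). */
static bool entry_consistent(const struct list_head *e)
{
	return e->next->prev == e && e->prev->next == e;
}

/* Hypothetical: the corrupted state from the report, ->prev == self but ->next != self. */
static bool half_reinitialised(const struct list_head *e)
{
	return e->prev == e && e->next != e;
}
```

After splicing, the last entry's ->next is the address of the local head, so a stale copy of that pointer in an inode's i_wb_list would show up as a stack address, consistent with what is described above. The model says nothing about how such a stale pointer could survive under the lock.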
>
> If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
> raw_spinlock_t, the problem disappears.
>
> Finally, I'd note that the list is modified from three places, which makes
> audit relatively simple: sb_mark_inode_writeback(),
> sb_clear_inode_writeback(), and wait_sb_inodes(). All these places hold
> sb->s_inode_wblist_lock when modifying the list. So at this point I'm at
> a loss as to what could be causing this. As unlikely as it seems to me,
> I've started wondering whether it is not some subtle issue with RT
> spinlocks on aarch64, possibly in combination with interrupts (because
> sb_clear_inode_writeback() may be called from an interrupt).
>
> Any ideas?

Feel free to collect debug info with the diff below, ONLY in your spare
cycles, given your relatively reliable reproducer.

Only food for thought.

Hillf

+++ b/fs/fs-writeback.c
@@ -1256,6 +1256,7 @@ void sb_mark_inode_writeback(struct inod
 	if (list_empty(&inode->i_wb_list)) {
 		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
 		if (list_empty(&inode->i_wb_list)) {
+			ihold(inode);
 			list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
 			trace_sb_mark_inode_writeback(inode);
 		}
@@ -1272,12 +1273,19 @@ void sb_clear_inode_writeback(struct ino
 	unsigned long flags;
 
 	if (!list_empty(&inode->i_wb_list)) {
+		int put = 0;
 		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
 		if (!list_empty(&inode->i_wb_list)) {
+			put = 1;
 			list_del_init(&inode->i_wb_list);
 			trace_sb_clear_inode_writeback(inode);
 		}
 		spin_unlock_irqrestore(&sb->s_inode_wblist_lock, flags);
+		if (put) {
+			ihold(inode);
+			iput(inode);
+			iput(inode);
+		}
 	}
 }
 
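The diff makes membership in sb->s_inodes_wb hold an inode reference, with the apparent intent that evict(), and hence clear_inode(), cannot be reached while the inode is still listed. The user-space sketch below models only that reference counting. Every name in it is hypothetical, the spinlock is omitted because the model is single-threaded, and model_ihold() imitates ihold()'s WARN_ON(atomic_inc_return(&inode->i_count) < 2) check by counting warnings instead of printing them.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct inode: only what this model needs. */
struct model_inode {
	int i_count;      /* reference count, like inode->i_count */
	bool on_wb_list;  /* stands in for !list_empty(&inode->i_wb_list) */
	bool evicted;     /* set when the count drops to zero */
	int warnings;     /* times the ihold()-style check fired */
};

/* Like ihold(): take a reference, flag it if no reference was held before. */
static void model_ihold(struct model_inode *inode)
{
	if (++inode->i_count < 2)
		inode->warnings++;
}

/* Like iput(): drop a reference, "evict" when it reaches zero. */
static void model_iput(struct model_inode *inode)
{
	if (--inode->i_count == 0)
		inode->evicted = true;
}

/* Models the patched sb_mark_inode_writeback(): the list pins the inode. */
static void model_mark_writeback(struct model_inode *inode)
{
	if (!inode->on_wb_list) {
		model_ihold(inode);
		inode->on_wb_list = true;
	}
}

/* Models the patched sb_clear_inode_writeback(): unlist, then unpin. */
static void model_clear_writeback(struct model_inode *inode)
{
	bool put = false;

	if (inode->on_wb_list) {
		inode->on_wb_list = false;
		put = true;
	}
	if (put) {
		model_ihold(inode); /* tripwire: warns if the list's pin was already gone */
		model_iput(inode);  /* drop the temporary reference */
		model_iput(inode);  /* drop the reference taken when the inode was listed */
	}
}
```

The ihold(); iput(); iput(); sequence nets out to a single reference drop; the extra ihold() appears to serve only as a tripwire that warns if the reference taken at mark time had already been lost.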