From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Tue, 21 May 2024 08:58:56 +0800
Subject: Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration
To: Marcin Wanat
Cc: Dave Chinner, Andrew Morton, "zhaoyang.huang", Alex Shi, "Kirill A. Shutemov", Hugh Dickins, Baolin Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org, steve.kang@unisoc.com
References: <20240412064353.133497-1-zhaoyang.huang@unisoc.com> <20240412143457.5c6c0ae8f6df0f647d7cf0be@linux-foundation.org>
On Tue, May 21, 2024 at 3:42 AM Marcin Wanat wrote:
>
> On 15.04.2024 03:50, Zhaoyang Huang wrote:
> > On Mon, Apr 15, 2024 at 8:09 AM Dave Chinner wrote:
> > > On Sat, Apr 13, 2024 at 10:01:27AM +0800, Zhaoyang Huang wrote:
> > > > loop Dave, since he once helped set up a reproducer in
> > > > https://lore.kernel.org/linux-mm/20221101071721.GV2703033@dread.disaster.area/
> > > > @Dave Chinner, I would like to ask for your kind help: could you
> > > > verify this patch in your environment, if convenient? Thanks a lot.
> > >
> > > I don't have the test environment from 18 months ago available any
> > > more. Also, I haven't seen this problem since that specific test
> > > environment tripped over the issue. Hence I don't have any way of
> > > confirming that the problem is fixed, either, because first I'd
> > > have to reproduce it...
> >
> > Thanks for the information. I noticed that you reported another soft
> > lockup related to xas_load in Nov. 2023. This patch is supposed to
> > help with that. With regard to the version timing, this commit is
> > actually a revert of the narrow-lru-locking commit
> > b6769834aac1d467fa1c71277d15688efcbb4d76, which was merged before
> > v5.15.
> >
> > To save your time, a brief description follows. IMO, b6769834aa
> > introduces a potential stall between freezing the folio's refcount
> > and storing it back to 2, which turns the xas_load ->
> > folio_try_get_rcu retry loop into a livelock whenever the lru_lock's
> > holder is stalled.
> > b6769834aa split_huge_page_to_list
> > -	spin_lock(lru_lock)
> > 	xas_split(&xas, folio, order)
> > 	folio_refcnt_freeze(folio, 1 + folio_nr_pages(folio))
> > +	spin_lock(lru_lock)
> > 	xas_store(&xas, offset++, head + i)
> > 	page_ref_add(head, 2)
> > 	spin_unlock(lru_lock)
> >
> > Sorry in advance if the above doesn't make sense; I am just a
> > developer who is also suffering from this bug and trying to fix it.
>
> I am experiencing a similar error on dozens of hosts, with stack traces
> that are all similar:
>
> [627163.727746] watchdog: BUG: soft lockup - CPU#77 stuck for 22s! [file_get:953301]
> [627163.727778] Modules linked in: xt_set ip_set_hash_net ip_set xt_CT xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink sr_mod cdrom rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel kvm irqbypass mlx5_ib rapl iTCO_wdt intel_cstate intel_pmc_bxt ib_uverbs iTCO_vendor_support dell_smbios dcdbas i2c_i801 intel_uncore uas ses mei_me ib_core dell_wmi_descriptor wmi_bmof pcspkr enclosure lpc_ich usb_storage i2c_smbus acpi_ipmi mei intel_pch_thermal ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter joydev tcp_bbr fuse xfs libcrc32c raid1 sd_mod sg mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni mgag200 polyval_generic drm_kms_helper mlxfw drm_shmem_helper ahci nvme mpt3sas tls libahci ghash_clmulni_intel nvme_core psample drm igb t10_pi raid_class pci_hyperv_intf dca libata scsi_transport_sas i2c_algo_bit wmi
> [627163.727841] CPU: 77 PID: 953301 Comm: file_get Kdump: loaded Tainted: G             L     6.6.30.el9 #2
> [627163.727844] Hardware name: Dell Inc. PowerEdge R740xd/08D89F, BIOS 2.21.2 02/19/2024
> [627163.727847] RIP: 0010:xas_descend+0x1b/0x70
> [627163.727857] Code: 57 10 48 89 07 48 c1 e8 20 48 89 57 08 c3 cc 0f b6 0e 48 8b 47 08 48 d3 e8 48 89 c1 83 e1 3f 89 c8 48 83 c0 04 48 8b 44 c6 08 <48> 89 77 18 48 89 c2 83 e2 03 48 83 fa 02 74 0a 88 4f 12 c3 48 83
> [627163.727859] RSP: 0018:ffffc90034a67978 EFLAGS: 00000206
> [627163.727861] RAX: ffff888e4f971242 RBX: ffffc90034a67a98 RCX: 0000000000000020
> [627163.727863] RDX: 0000000000000002 RSI: ffff88a454546d80 RDI: ffffc90034a67990
> [627163.727865] RBP: fffffffffffffffe R08: fffffffffffffffe R09: 0000000000008820
> [627163.727867] R10: 0000000000008820 R11: 0000000000000000 R12: ffffc90034a67a20
> [627163.727868] R13: ffffc90034a67a18 R14: ffffea00873e8000 R15: ffffc90034a67a18
> [627163.727870] FS:  00007fc5e503b740(0000) GS:ffff88bfefd80000(0000) knlGS:0000000000000000
> [627163.727871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [627163.727873] CR2: 000000005fb87b6e CR3: 00000022875e8006 CR4: 00000000007706e0
> [627163.727875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [627163.727876] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [627163.727878] PKRU: 55555554
> [627163.727879] Call Trace:
> [627163.727882]  <IRQ>
> [627163.727886]  ? watchdog_timer_fn+0x22a/0x2a0
> [627163.727892]  ? softlockup_fn+0x70/0x70
> [627163.727895]  ? __hrtimer_run_queues+0x10f/0x2a0
> [627163.727903]  ? hrtimer_interrupt+0x106/0x240
> [627163.727906]  ? __sysvec_apic_timer_interrupt+0x68/0x170
> [627163.727913]  ? sysvec_apic_timer_interrupt+0x9d/0xd0
> [627163.727917]  </IRQ>
> [627163.727918]  <TASK>
> [627163.727920]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [627163.727927]  ? xas_descend+0x1b/0x70
> [627163.727930]  xas_load+0x2c/0x40
> [627163.727933]  xas_find+0x161/0x1a0
> [627163.727937]  find_get_entries+0x77/0x1d0
> [627163.727944]  truncate_inode_pages_range+0x244/0x3f0
> [627163.727950]  truncate_pagecache+0x44/0x60
> [627163.727955]  xfs_setattr_size+0x168/0x490 [xfs]
> [627163.728074]  xfs_vn_setattr+0x78/0x140 [xfs]
> [627163.728153]  notify_change+0x34f/0x4f0
> [627163.728158]  ? _raw_spin_lock+0x13/0x30
> [627163.728165]  ? do_truncate+0x80/0xd0
> [627163.728169]  do_truncate+0x80/0xd0
> [627163.728172]  do_open+0x2ce/0x400
> [627163.728177]  path_openat+0x10d/0x280
> [627163.728181]  do_filp_open+0xb2/0x150
> [627163.728186]  ? check_heap_object+0x34/0x190
> [627163.728189]  ? __check_object_size.part.0+0x5a/0x130
> [627163.728194]  do_sys_openat2+0x92/0xc0
> [627163.728197]  __x64_sys_openat+0x53/0x90
> [627163.728200]  do_syscall_64+0x35/0x80
> [627163.728206]  entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> [627163.728210] RIP: 0033:0x7fc5e493e7fb
> [627163.728213] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
> [627163.728215] RSP: 002b:00007ffdd4e300e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> [627163.728218] RAX: ffffffffffffffda RBX: 00007ffdd4e30180 RCX: 00007fc5e493e7fb
> [627163.728220] RDX: 0000000000000241 RSI: 00007ffdd4e30180 RDI: 00000000ffffff9c
> [627163.728221] RBP: 00007ffdd4e30180 R08: 00007fc5e4600040 R09: 0000000000000001
> [627163.728223] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000241
> [627163.728224] R13: 0000000000000000 R14: 00007fc5e4662fa8 R15: 0000000000000000
> [627163.728227]  </TASK>
>
> I have around 50 hosts handling high I/O (each with 20Gbps+ uplinks
> and multiple NVMe drives), running RockyLinux 8/9. The stock RHEL
> kernel 8/9 is NOT affected, and the long-term kernel 5.15.X is NOT
> affected.
> However, with long-term kernels 6.1.XX and 6.6.XX (tested at least 10
> different versions), this lockup always appears after 2-30 days,
> similar to the report in the original thread. The more load (for
> example, copying a lot of local files while serving 20Gbps of
> traffic), the higher the chance that the bug will appear.
>
> I haven't been able to reproduce this during synthetic tests, but it
> always occurs in production on 6.1.X and 6.6.X within 2-30 days. If
> anyone can provide a patch, I can test it on multiple machines over
> the next few days.

Could you please try this one, which can be applied on 6.6 directly. Thank you!

> Regards,
> Marcin