Subject: Re: Softlockup when test shmem swapout-swapin and compaction
To: Baolin Wang, "linux-mm@kvack.org", Linux Kernel Mailing List
References: <28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com>
CC: Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang, Matthew Wilcox, Ryan Roberts, Andrew Morton, Zi Yan
From: Liu Shixin
Date: Thu, 27 Feb 2025 15:04:33 +0800

On 2025/2/26 15:22, Baolin Wang wrote:
> Add Zi.
>
> On 2025/2/26 15:03, Liu Shixin wrote:
>> Hi all,
>>
>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>
>> watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>> Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>> CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G L 6.14.0-rc4-next-20250225+ #2
>> Tainted: [L]=SOFTLOCKUP
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>> RIP: 0010:xas_load+0x5d/0xc0
>> Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>> RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>> RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>> RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>> RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>> R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>> FS: 00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> PKRU: 55555554
>> Call Trace:
>>
>> ? watchdog_timer_fn+0x1c9/0x250
>> ? __pfx_watchdog_timer_fn+0x10/0x10
>> ? __hrtimer_run_queues+0x10e/0x250
>> ? hrtimer_interrupt+0xfb/0x240
>> ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>> ? sysvec_apic_timer_interrupt+0x68/0x90
>>
>> ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>> ? xas_load+0x5d/0xc0
>> xas_find+0x153/0x1a0
>> find_get_entries+0x73/0x280
>> shmem_undo_range+0x1fc/0x640
>> shmem_evict_inode+0x109/0x270
>> evict+0x107/0x240
>> ? fsnotify_destroy_marks+0x25/0x180
>> ? _atomic_dec_and_lock+0x35/0x50
>> __dentry_kill+0x71/0x190
>> dput+0xd1/0x190
>> __fput+0x128/0x2a0
>> task_work_run+0x57/0x90
>> syscall_exit_to_user_mode+0x1cb/0x1e0
>> do_syscall_64+0x67/0x170
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7fd95776eb8b
>>
>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or
>> compaction, which is necessary for reproduction, although it is not clear why.
>>
>> To reproduce the problem, first set up a zram device as the swap backend, then run the
>> reproduction program. The reproduction program consists of three parts:
>> 1. A process that constantly changes the shmem large folio setting through these interfaces:
>> /sys/kernel/mm/transparent_hugepage/hugepages-/shmem_enabled
>> 2. A process that constantly runs echo 1 > /proc/sys/vm/compact_memory
>> 3. A process that constantly allocates, frees, swaps out and swaps in shmem large folios.
>>
>> I'm not sure whether the first process is necessary, but the second and third are. In addition,
>> I tried hacking compaction_alloc to return NULL, and the problem disappeared,
>> so I guess the problem is in migration.
>>
>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>> since I have confirmed that this problem still exists after merging that fix.
>
> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>
> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>

That patch was already included when I tested.