Message-ID: <7367bb73-3358-4925-ac9d-e2b90904d15a@linux.alibaba.com>
Date: Fri, 28 Feb 2025 11:39:21 +0800
Subject: Re: Softlockup when test shmem swapout-swapin and compaction
To: Zi Yan, Liu Shixin
Cc: linux-mm@kvack.org, Linux Kernel Mailing List, Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang, Matthew Wilcox, Ryan Roberts, Andrew Morton
References: <28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com> <696E1819-D7E3-42BD-B3F0-8B3AC67A8ADB@nvidia.com>
From: Baolin Wang
In-Reply-To: <696E1819-D7E3-42BD-B3F0-8B3AC67A8ADB@nvidia.com>

On 2025/2/28 07:43, Zi Yan wrote:
> On 27 Feb 2025, at 2:04, Liu Shixin wrote:
>
>> On 2025/2/26 15:22, Baolin Wang wrote:
>>> Add Zi.
>>>
>>> On 2025/2/26 15:03, Liu Shixin wrote:
>>>> Hi all,
>>>>
>>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>>
>>>> watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>> Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>> CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G L 6.14.0-rc4-next-20250225+ #2
>>>> Tainted: [L]=SOFTLOCKUP
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>> RIP: 0010:xas_load+0x5d/0xc0
>>>> Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>> RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>> RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>> RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>> RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>> R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>> FS: 00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> PKRU: 55555554
>>>> Call Trace:
>>>>
>>>> ? watchdog_timer_fn+0x1c9/0x250
>>>> ? __pfx_watchdog_timer_fn+0x10/0x10
>>>> ? __hrtimer_run_queues+0x10e/0x250
>>>> ? hrtimer_interrupt+0xfb/0x240
>>>> ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>> ? sysvec_apic_timer_interrupt+0x68/0x90
>>>>
>>>>
>>>> ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>> ? xas_load+0x5d/0xc0
>>>> xas_find+0x153/0x1a0
>>>> find_get_entries+0x73/0x280
>>>> shmem_undo_range+0x1fc/0x640
>>>> shmem_evict_inode+0x109/0x270
>>>> evict+0x107/0x240
>>>> ? fsnotify_destroy_marks+0x25/0x180
>>>> ? _atomic_dec_and_lock+0x35/0x50
>>>> __dentry_kill+0x71/0x190
>>>> dput+0xd1/0x190
>>>> __fput+0x128/0x2a0
>>>> task_work_run+0x57/0x90
>>>> syscall_exit_to_user_mode+0x1cb/0x1e0
>>>> do_syscall_64+0x67/0x170
>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> RIP: 0033:0x7fd95776eb8b
>>>>
>>>> If CONFIG_DEBUG_VM is enabled, we will meet VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>>> shmem_add_to_page_cache() too. It seems that the problem is related to memory migration or
>>>> compaction which is necessary for reproduction, although without a clear why.
>>>>
>>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>>> reproduction program. The reproduction program consists of three parts:
>>>> 1. A process constantly changes the status of shmem large folio by these interfaces:
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-/shmem_enabled
>>>> 2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>> 3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>>
>>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>>> so I guess the problem is in migration.
>>>>
>>>> The problem is different with https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>>> since I have confirmed this porblem still existed after merge the fixed patch.
>>>
>>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>>
>>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>>> .
>>>
>> Already include this patch when test.
>
> Hi Shixin,
>
> Can you try the diff below? It fixed my local repro.
>
> The issue is that after Baolin’s patch, shmem folios now use high-order
> entry, so the migration code should not update multiple xarray slots.

This did not start with my patches. Since shmem was converted to folios, the shmem mapping has stored large-order entries; before my patches, though, a shmem large folio was split during swap-out (my patches allow shmem large folios to be swapped without splitting).

> Hi Baolin,
>
> Is your patch affecting anonymous swapping out? If yes, we can remove

No.

> the for loop of updating xarray in __folio_migrate_mapping().

I think the issue was introduced by commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly"), which did not handle shmem folios correctly.

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..be77932596b3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -44,6 +44,7 @@
> #include
> #include
> #include
> +#include
>
> #include
>
> @@ -524,7 +525,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
> 			folio_set_swapcache(newfolio);
> 			newfolio->private = folio_get_private(folio);
> 		}
> -		entries = nr;
> +		/* shmem now uses high-order entry */
> +		if (folio->mapping && shmem_mapping(folio->mapping))

Nit: 'mapping' has already been checked at this point, so this can be simplified to 'shmem_mapping(mapping)'.

> +			entries = 1;
> +		else
> +			entries = nr;
> 	} else {
> 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> 		entries = 1;

Good catch. The fix looks good to me. Thanks.
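
For illustration only, here is a minimal user-space sketch of the slot-count decision in the diff, with the nit applied. It is a toy model under stated assumptions, not the kernel code: struct toy_folio, its fields, and entries_to_update() are hypothetical names introduced here, and the real logic lives in __folio_migrate_mapping().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few folio properties that matter here. */
struct toy_folio {
	bool swapbacked;	/* models folio_test_swapbacked(folio) */
	bool shmem_mapping;	/* models shmem_mapping(mapping) */
	long nr_pages;		/* models 'nr' in the diff */
};

/*
 * Number of xarray slots the migration code would rewrite for this folio.
 * Assumption: only the swap cache keeps one slot per page; shmem and the
 * regular page cache keep a single high-order entry per folio.
 */
static long entries_to_update(const struct toy_folio *folio)
{
	assert(folio->nr_pages >= 1);

	if (!folio->swapbacked)
		return 1;		/* page cache: one high-order entry */
	if (folio->shmem_mapping)
		return 1;		/* shmem: also one high-order entry */
	return folio->nr_pages;		/* swap cache: one slot per page */
}
```

In this model a 512-page shmem folio yields 1, while a 512-page anonymous folio in the swap cache yields 512. Rewriting 512 slots for the shmem case is what would overwrite neighbouring entries in the mapping.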