linux-mm.kvack.org archive mirror
* Softlockup when test shmem swapout-swapin and compaction
@ 2025-02-26  7:03 Liu Shixin
  2025-02-26  7:22 ` Baolin Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Shixin @ 2025-02-26  7:03 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, Linux Kernel Mailing List
  Cc: Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang,
	Lance Yang, Matthew Wilcox, Ryan Roberts, Andrew Morton

Hi all,

I found a softlockup when testing shmem large folio swapout-swapin and compaction:

 watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
 Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
 CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
 Tainted: [L]=SOFTLOCKUP
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:xas_load+0x5d/0xc0
 Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
 RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
 RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
 RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
 RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
 R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
 FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <IRQ>
  ? watchdog_timer_fn+0x1c9/0x250
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x10e/0x250
  ? hrtimer_interrupt+0xfb/0x240
  ? __sysvec_apic_timer_interrupt+0x4e/0xe0
  ? sysvec_apic_timer_interrupt+0x68/0x90
  </IRQ>
  <TASK>
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? xas_load+0x5d/0xc0
  xas_find+0x153/0x1a0
  find_get_entries+0x73/0x280
  shmem_undo_range+0x1fc/0x640
  shmem_evict_inode+0x109/0x270
  evict+0x107/0x240
  ? fsnotify_destroy_marks+0x25/0x180
  ? _atomic_dec_and_lock+0x35/0x50
  __dentry_kill+0x71/0x190
  dput+0xd1/0x190
  __fput+0x128/0x2a0
  task_work_run+0x57/0x90
  syscall_exit_to_user_mode+0x1cb/0x1e0
  do_syscall_64+0x67/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fd95776eb8b

If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
which is necessary for reproduction, although it is not yet clear why.

To reproduce the problem, we first need a zram device as the swap backend, and then run the
reproduction program, which consists of three parts:
 1. A process constantly changes the status of shmem large folio by these interfaces:
        /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
 2. A process constantly echo 1 > /proc/sys/vm/compact_memory
 3. A process constantly alloc/free/swapout/swapin shmem large folios.

I'm not sure whether the first process is necessary, but the second and third are. In addition,
I tried hacking compaction_alloc() to return NULL, and the problem disappeared, so I guess the
problem is in migration. A rough sketch of the third process is included below.
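
The actual test program is not attached; the following is only a rough illustration of the
third process (memfd_create(), MADV_PAGEOUT, the names and the 128MB size are simplifications
of mine, the real folio_swap program may differ):

/*
 * Sketch of the alloc/swapout/swapin loop. Assumes zram is configured as
 * the swap backend, shmem large folios are enabled via the
 * hugepages-<size>/shmem_enabled knobs, and MADV_PAGEOUT is able to push
 * these shmem pages out to swap.
 */
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE	(128UL << 20)	/* 128MB of shmem per iteration */

int main(void)
{
	for (;;) {
		int fd = memfd_create("folio_swap", 0);
		char *p;

		if (fd < 0 || ftruncate(fd, SIZE))
			return 1;
		p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		memset(p, 0xab, SIZE);		/* allocate shmem (large) folios */
		madvise(p, SIZE, MADV_PAGEOUT);	/* push them out to swap (zram) */
		memset(p, 0xcd, SIZE);		/* fault/swap them back in */

		munmap(p, SIZE);
		close(fd);			/* drop the shmem inode, freeing the memory */
	}
}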

The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
since I have confirmed this problem still exists after merging that fix.

Thanks,
Liu Shixin.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-26  7:03 Softlockup when test shmem swapout-swapin and compaction Liu Shixin
@ 2025-02-26  7:22 ` Baolin Wang
  2025-02-27  7:04   ` Liu Shixin
  2025-03-05 16:27   ` Zi Yan
  0 siblings, 2 replies; 10+ messages in thread
From: Baolin Wang @ 2025-02-26  7:22 UTC (permalink / raw)
  To: Liu Shixin, linux-mm, Linux Kernel Mailing List
  Cc: Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang,
	Lance Yang, Matthew Wilcox, Ryan Roberts, Andrew Morton, Zi Yan

Add Zi.

On 2025/2/26 15:03, Liu Shixin wrote:
> Hi all,
> 
> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
> 
>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>   Tainted: [L]=SOFTLOCKUP
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>   RIP: 0010:xas_load+0x5d/0xc0
>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   PKRU: 55555554
>   Call Trace:
>    <IRQ>
>    ? watchdog_timer_fn+0x1c9/0x250
>    ? __pfx_watchdog_timer_fn+0x10/0x10
>    ? __hrtimer_run_queues+0x10e/0x250
>    ? hrtimer_interrupt+0xfb/0x240
>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>    ? sysvec_apic_timer_interrupt+0x68/0x90
>    </IRQ>
>    <TASK>
>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>    ? xas_load+0x5d/0xc0
>    xas_find+0x153/0x1a0
>    find_get_entries+0x73/0x280
>    shmem_undo_range+0x1fc/0x640
>    shmem_evict_inode+0x109/0x270
>    evict+0x107/0x240
>    ? fsnotify_destroy_marks+0x25/0x180
>    ? _atomic_dec_and_lock+0x35/0x50
>    __dentry_kill+0x71/0x190
>    dput+0xd1/0x190
>    __fput+0x128/0x2a0
>    task_work_run+0x57/0x90
>    syscall_exit_to_user_mode+0x1cb/0x1e0
>    do_syscall_64+0x67/0x170
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>   RIP: 0033:0x7fd95776eb8b
> 
> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
> which is necessary for reproduction, although it is not yet clear why.
> 
> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
> reproduction program. The reproduction program consists of three parts:
>   1. A process constantly changes the status of shmem large folio by these interfaces:
>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
> 
> I'm not sure whether the first process is necessary but the second and third are. In addition,
> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
> so I guess the problem is in migration.
> 
> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
> since I have confirmed this problem still exists after merging that fix.

Could you check if your version includes Zi's fix[1]? Not sure if it's 
related to the shmem large folio split.

[1] 
https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-26  7:22 ` Baolin Wang
@ 2025-02-27  7:04   ` Liu Shixin
  2025-02-27 23:43     ` Zi Yan
  2025-03-05 16:27   ` Zi Yan
  1 sibling, 1 reply; 10+ messages in thread
From: Liu Shixin @ 2025-02-27  7:04 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, Linux Kernel Mailing List
  Cc: Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang,
	Lance Yang, Matthew Wilcox, Ryan Roberts, Andrew Morton, Zi Yan



On 2025/2/26 15:22, Baolin Wang wrote:
> Add Zi.
>
> On 2025/2/26 15:03, Liu Shixin wrote:
>> Hi all,
>>
>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>
>>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>   Tainted: [L]=SOFTLOCKUP
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>   RIP: 0010:xas_load+0x5d/0xc0
>>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   PKRU: 55555554
>>   Call Trace:
>>    <IRQ>
>>    ? watchdog_timer_fn+0x1c9/0x250
>>    ? __pfx_watchdog_timer_fn+0x10/0x10
>>    ? __hrtimer_run_queues+0x10e/0x250
>>    ? hrtimer_interrupt+0xfb/0x240
>>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>    ? sysvec_apic_timer_interrupt+0x68/0x90
>>    </IRQ>
>>    <TASK>
>>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>    ? xas_load+0x5d/0xc0
>>    xas_find+0x153/0x1a0
>>    find_get_entries+0x73/0x280
>>    shmem_undo_range+0x1fc/0x640
>>    shmem_evict_inode+0x109/0x270
>>    evict+0x107/0x240
>>    ? fsnotify_destroy_marks+0x25/0x180
>>    ? _atomic_dec_and_lock+0x35/0x50
>>    __dentry_kill+0x71/0x190
>>    dput+0xd1/0x190
>>    __fput+0x128/0x2a0
>>    task_work_run+0x57/0x90
>>    syscall_exit_to_user_mode+0x1cb/0x1e0
>>    do_syscall_64+0x67/0x170
>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>   RIP: 0033:0x7fd95776eb8b
>>
>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>> which is necessary for reproduction, although it is not yet clear why.
>>
>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>> reproduction program. The reproduction program consists of three parts:
>>   1. A process constantly changes the status of shmem large folio by these interfaces:
>>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>
>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>> so I guess the problem is in migration.
>>
>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>> since I have confirmed this problem still exists after merging that fix.
>
> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>
> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
> .
>
This patch was already included in my test.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-27  7:04   ` Liu Shixin
@ 2025-02-27 23:43     ` Zi Yan
  2025-02-28  1:52       ` Liu Shixin
  2025-02-28  3:39       ` Baolin Wang
  0 siblings, 2 replies; 10+ messages in thread
From: Zi Yan @ 2025-02-27 23:43 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Baolin Wang, linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton

On 27 Feb 2025, at 2:04, Liu Shixin wrote:

> On 2025/2/26 15:22, Baolin Wang wrote:
>> Add Zi.
>>
>> On 2025/2/26 15:03, Liu Shixin wrote:
>>> Hi all,
>>>
>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>
>>>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>>   Tainted: [L]=SOFTLOCKUP
>>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>   RIP: 0010:xas_load+0x5d/0xc0
>>>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>   PKRU: 55555554
>>>   Call Trace:
>>>    <IRQ>
>>>    ? watchdog_timer_fn+0x1c9/0x250
>>>    ? __pfx_watchdog_timer_fn+0x10/0x10
>>>    ? __hrtimer_run_queues+0x10e/0x250
>>>    ? hrtimer_interrupt+0xfb/0x240
>>>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>    ? sysvec_apic_timer_interrupt+0x68/0x90
>>>    </IRQ>
>>>    <TASK>
>>>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>    ? xas_load+0x5d/0xc0
>>>    xas_find+0x153/0x1a0
>>>    find_get_entries+0x73/0x280
>>>    shmem_undo_range+0x1fc/0x640
>>>    shmem_evict_inode+0x109/0x270
>>>    evict+0x107/0x240
>>>    ? fsnotify_destroy_marks+0x25/0x180
>>>    ? _atomic_dec_and_lock+0x35/0x50
>>>    __dentry_kill+0x71/0x190
>>>    dput+0xd1/0x190
>>>    __fput+0x128/0x2a0
>>>    task_work_run+0x57/0x90
>>>    syscall_exit_to_user_mode+0x1cb/0x1e0
>>>    do_syscall_64+0x67/0x170
>>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>   RIP: 0033:0x7fd95776eb8b
>>>
>>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>>> which is necessary for reproduction, although it is not yet clear why.
>>>
>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>> reproduction program. The reproduction program consists of three parts:
>>>   1. A process constantly changes the status of shmem large folio by these interfaces:
>>>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>
>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>> so I guess the problem is in migration.
>>>
>>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>> since I have confirmed this problem still exists after merging that fix.
>>
>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>
>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>> .
>>
> This patch was already included in my test.

Hi Shixin,

Can you try the diff below? It fixed my local repro.

The issue is that after Baolin’s patch, shmem folios now use high-order
entry, so the migration code should not update multiple xarray slots.

Hi Baolin,

Is your patch affecting anonymous swapping out? If yes, we can remove
the for loop of updating xarray in __folio_migrate_mapping().

diff --git a/mm/migrate.c b/mm/migrate.c
index 365c6daa8d1b..be77932596b3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -44,6 +44,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/memory-tiers.h>
 #include <linux/pagewalk.h>
+#include <linux/shmem_fs.h>

 #include <asm/tlbflush.h>

@@ -524,7 +525,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
-		entries = nr;
+		/* shmem now uses high-order entry */
+		if (folio->mapping && shmem_mapping(folio->mapping))
+			entries = 1;
+		else
+			entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
 		entries = 1;


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-27 23:43     ` Zi Yan
@ 2025-02-28  1:52       ` Liu Shixin
  2025-02-28  3:39       ` Baolin Wang
  1 sibling, 0 replies; 10+ messages in thread
From: Liu Shixin @ 2025-02-28  1:52 UTC (permalink / raw)
  To: Zi Yan
  Cc: Baolin Wang, linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton



On 2025/2/28 7:43, Zi Yan wrote:
> On 27 Feb 2025, at 2:04, Liu Shixin wrote:
>
>> On 2025/2/26 15:22, Baolin Wang wrote:
>>> Add Zi.
>>>
>>> On 2025/2/26 15:03, Liu Shixin wrote:
>>>> Hi all,
>>>>
>>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>>
>>>>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>>>   Tainted: [L]=SOFTLOCKUP
>>>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>>   RIP: 0010:xas_load+0x5d/0xc0
>>>>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>   PKRU: 55555554
>>>>   Call Trace:
>>>>    <IRQ>
>>>>    ? watchdog_timer_fn+0x1c9/0x250
>>>>    ? __pfx_watchdog_timer_fn+0x10/0x10
>>>>    ? __hrtimer_run_queues+0x10e/0x250
>>>>    ? hrtimer_interrupt+0xfb/0x240
>>>>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>>    ? sysvec_apic_timer_interrupt+0x68/0x90
>>>>    </IRQ>
>>>>    <TASK>
>>>>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>>    ? xas_load+0x5d/0xc0
>>>>    xas_find+0x153/0x1a0
>>>>    find_get_entries+0x73/0x280
>>>>    shmem_undo_range+0x1fc/0x640
>>>>    shmem_evict_inode+0x109/0x270
>>>>    evict+0x107/0x240
>>>>    ? fsnotify_destroy_marks+0x25/0x180
>>>>    ? _atomic_dec_and_lock+0x35/0x50
>>>>    __dentry_kill+0x71/0x190
>>>>    dput+0xd1/0x190
>>>>    __fput+0x128/0x2a0
>>>>    task_work_run+0x57/0x90
>>>>    syscall_exit_to_user_mode+0x1cb/0x1e0
>>>>    do_syscall_64+0x67/0x170
>>>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>   RIP: 0033:0x7fd95776eb8b
>>>>
>>>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>>>> which is necessary for reproduction, although it is not yet clear why.
>>>>
>>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>>> reproduction program. The reproduction program consists of three parts:
>>>>   1. A process constantly changes the status of shmem large folio by these interfaces:
>>>>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>>>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>>
>>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>>> so I guess the problem is in migration.
>>>>
>>>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>>> since I have confirmed this problem still exists after merging that fix.
>>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>>
>>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>>> .
>>>
>> This patch was already included in my test.
> Hi Shixin,
>
> Can you try the diff below? It fixed my local repro.
>
> The issue is that after Baolin’s patch, shmem folios now use high-order
> entry, so the migration code should not update multiple xarray slots.
Looks reasonable. I'll try it, thanks.
>
> Hi Baolin,
>
> Is your patch affecting anonymous swapping out? If yes, we can remove
> the for loop of updating xarray in __folio_migrate_mapping().
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..be77932596b3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -44,6 +44,7 @@
>  #include <linux/sched/sysctl.h>
>  #include <linux/memory-tiers.h>
>  #include <linux/pagewalk.h>
> +#include <linux/shmem_fs.h>
>
>  #include <asm/tlbflush.h>
>
> @@ -524,7 +525,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>  			folio_set_swapcache(newfolio);
>  			newfolio->private = folio_get_private(folio);
>  		}
> -		entries = nr;
> +		/* shmem now uses high-order entry */
> +		if (folio->mapping && shmem_mapping(folio->mapping))
> +			entries = 1;
> +		else
> +			entries = nr;
>  	} else {
>  		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>  		entries = 1;
>
>
> Best Regards,
> Yan, Zi
> .
>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-27 23:43     ` Zi Yan
  2025-02-28  1:52       ` Liu Shixin
@ 2025-02-28  3:39       ` Baolin Wang
  2025-02-28 15:19         ` Zi Yan
  1 sibling, 1 reply; 10+ messages in thread
From: Baolin Wang @ 2025-02-28  3:39 UTC (permalink / raw)
  To: Zi Yan, Liu Shixin
  Cc: linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton



On 2025/2/28 07:43, Zi Yan wrote:
> On 27 Feb 2025, at 2:04, Liu Shixin wrote:
> 
>> On 2025/2/26 15:22, Baolin Wang wrote:
>>> Add Zi.
>>>
>>> On 2025/2/26 15:03, Liu Shixin wrote:
>>>> Hi all,
>>>>
>>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>>
>>>>    watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>>    Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>>    CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>>>    Tainted: [L]=SOFTLOCKUP
>>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>>    RIP: 0010:xas_load+0x5d/0xc0
>>>>    Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>>    RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>>    RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>>    RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>>    RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>>    R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>>    R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>>    FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>    CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>    PKRU: 55555554
>>>>    Call Trace:
>>>>     <IRQ>
>>>>     ? watchdog_timer_fn+0x1c9/0x250
>>>>     ? __pfx_watchdog_timer_fn+0x10/0x10
>>>>     ? __hrtimer_run_queues+0x10e/0x250
>>>>     ? hrtimer_interrupt+0xfb/0x240
>>>>     ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>>     ? sysvec_apic_timer_interrupt+0x68/0x90
>>>>     </IRQ>
>>>>     <TASK>
>>>>     ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>>     ? xas_load+0x5d/0xc0
>>>>     xas_find+0x153/0x1a0
>>>>     find_get_entries+0x73/0x280
>>>>     shmem_undo_range+0x1fc/0x640
>>>>     shmem_evict_inode+0x109/0x270
>>>>     evict+0x107/0x240
>>>>     ? fsnotify_destroy_marks+0x25/0x180
>>>>     ? _atomic_dec_and_lock+0x35/0x50
>>>>     __dentry_kill+0x71/0x190
>>>>     dput+0xd1/0x190
>>>>     __fput+0x128/0x2a0
>>>>     task_work_run+0x57/0x90
>>>>     syscall_exit_to_user_mode+0x1cb/0x1e0
>>>>     do_syscall_64+0x67/0x170
>>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>    RIP: 0033:0x7fd95776eb8b
>>>>
>>>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>>>> which is necessary for reproduction, although it is not yet clear why.
>>>>
>>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>>> reproduction program. The reproduction program consists of three parts:
>>>>    1. A process constantly changes the status of shmem large folio by these interfaces:
>>>>           /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>>>    2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>>    3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>>
>>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>>> so I guess the problem is in migration.
>>>>
>>>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>>> since I have confirmed this problem still exists after merging that fix.
>>>
>>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>>
>>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>>> .
>>>
>> This patch was already included in my test.
> 
> Hi Shixin,
> 
> Can you try the diff below? It fixed my local repro.
> 
> The issue is that after Baolin’s patch, shmem folios now use high-order
> entry, so the migration code should not update multiple xarray slots.

This did not start with my patches. After shmem was converted to use folios, 
the shmem mapping already stored large-order entries; it is just that, during 
swap, the shmem large folio used to be split (whereas my patches allow shmem 
large folios to be swapped out without splitting).

> Hi Baolin,
> 
> Is your patch affecting anonymous swapping out? If yes, we can remove

No.

> the for loop of updating xarray in __folio_migrate_mapping().

I think the issue is introduced by commit fc346d0a70a1 ("mm: migrate 
high-order folios in swap cache correctly"), which did not handle shmem 
folio correctly.

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 365c6daa8d1b..be77932596b3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -44,6 +44,7 @@
>   #include <linux/sched/sysctl.h>
>   #include <linux/memory-tiers.h>
>   #include <linux/pagewalk.h>
> +#include <linux/shmem_fs.h>
> 
>   #include <asm/tlbflush.h>
> 
> @@ -524,7 +525,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>   			folio_set_swapcache(newfolio);
>   			newfolio->private = folio_get_private(folio);
>   		}
> -		entries = nr;
> +		/* shmem now uses high-order entry */
> +		if (folio->mapping && shmem_mapping(folio->mapping))

Nit: we've already checked the 'mapping', and we can simplify it to 
'shmem_mapping(mapping)'.

> +			entries = 1;
> +		else
> +			entries = nr;
>   	} else {
>   		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>   		entries = 1;

Good catch. The fix looks good to me. Thanks.
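
With that simplification applied, the hunk would read roughly as below (just an
illustration of the suggested change, the actual patch may differ):

		/* shmem now uses high-order entry */
		if (shmem_mapping(mapping))
			entries = 1;
		else
			entries = nr;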


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-28  3:39       ` Baolin Wang
@ 2025-02-28 15:19         ` Zi Yan
  0 siblings, 0 replies; 10+ messages in thread
From: Zi Yan @ 2025-02-28 15:19 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Liu Shixin, linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton

On 27 Feb 2025, at 22:39, Baolin Wang wrote:

> On 2025/2/28 07:43, Zi Yan wrote:
>> On 27 Feb 2025, at 2:04, Liu Shixin wrote:
>>
>>> On 2025/2/26 15:22, Baolin Wang wrote:
>>>> Add Zi.
>>>>
>>>> On 2025/2/26 15:03, Liu Shixin wrote:
>>>>> Hi all,
>>>>>
>>>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>>>
>>>>>    watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>>>    Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>>>    CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>>>>    Tainted: [L]=SOFTLOCKUP
>>>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>>>    RIP: 0010:xas_load+0x5d/0xc0
>>>>>    Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>>>    RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>>>    RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>>>    RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>>>    RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>>>    R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>>>    R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>>>    FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>>>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>    CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>>>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>    PKRU: 55555554
>>>>>    Call Trace:
>>>>>     <IRQ>
>>>>>     ? watchdog_timer_fn+0x1c9/0x250
>>>>>     ? __pfx_watchdog_timer_fn+0x10/0x10
>>>>>     ? __hrtimer_run_queues+0x10e/0x250
>>>>>     ? hrtimer_interrupt+0xfb/0x240
>>>>>     ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>>>     ? sysvec_apic_timer_interrupt+0x68/0x90
>>>>>     </IRQ>
>>>>>     <TASK>
>>>>>     ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>>>     ? xas_load+0x5d/0xc0
>>>>>     xas_find+0x153/0x1a0
>>>>>     find_get_entries+0x73/0x280
>>>>>     shmem_undo_range+0x1fc/0x640
>>>>>     shmem_evict_inode+0x109/0x270
>>>>>     evict+0x107/0x240
>>>>>     ? fsnotify_destroy_marks+0x25/0x180
>>>>>     ? _atomic_dec_and_lock+0x35/0x50
>>>>>     __dentry_kill+0x71/0x190
>>>>>     dput+0xd1/0x190
>>>>>     __fput+0x128/0x2a0
>>>>>     task_work_run+0x57/0x90
>>>>>     syscall_exit_to_user_mode+0x1cb/0x1e0
>>>>>     do_syscall_64+0x67/0x170
>>>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>    RIP: 0033:0x7fd95776eb8b
>>>>>
>>>>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>>>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>>>>> which is necessary for reproduction, although it is not yet clear why.
>>>>>
>>>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>>>> reproduction program. The reproduction program consists of three parts:
>>>>>    1. A process constantly changes the status of shmem large folio by these interfaces:
>>>>>           /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>>>>    2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>>>    3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>>>
>>>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>>>> so I guess the problem is in migration.
>>>>>
>>>>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>>>> since I have confirmed this problem still exists after merging that fix.
>>>>
>>>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>>>
>>>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>>>> .
>>>>
>>> This patch was already included in my test.
>>
>> Hi Shixin,
>>
>> Can you try the diff below? It fixed my local repro.
>>
>> The issue is that after Baolin’s patch, shmem folios now use high-order
>> entry, so the migration code should not update multiple xarray slots.
>
> This did not start with my patches. After shmem was converted to use folios, the shmem mapping already stored large-order entries; it is just that, during swap, the shmem large folio used to be split (whereas my patches allow shmem large folios to be swapped out without splitting).

Got it. Sorry for pointing to the wrong source.

>
>> Hi Baolin,
>>
>> Is your patch affecting anonymous swapping out? If yes, we can remove
>
> No.
>
>> the for loop of updating xarray in __folio_migrate_mapping().
>
> I think the issue is introduced by commit fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly"), which did not handle shmem folio correctly.

I agree. What a mess. So right now only the swap cache used by anonymous
memory does not use multi-index entries.

>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 365c6daa8d1b..be77932596b3 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -44,6 +44,7 @@
>>   #include <linux/sched/sysctl.h>
>>   #include <linux/memory-tiers.h>
>>   #include <linux/pagewalk.h>
>> +#include <linux/shmem_fs.h>
>>
>>   #include <asm/tlbflush.h>
>>
>> @@ -524,7 +525,11 @@ static int __folio_migrate_mapping(struct address_space *mapping,
>>   			folio_set_swapcache(newfolio);
>>   			newfolio->private = folio_get_private(folio);
>>   		}
>> -		entries = nr;
>> +		/* shmem now uses high-order entry */
>> +		if (folio->mapping && shmem_mapping(folio->mapping))
>
> Nit: we've already checked the 'mapping', and we can simplify it to 'shmem_mapping(mapping)'.

Sure.
>
>> +			entries = 1;
>> +		else
>> +			entries = nr;
>>   	} else {
>>   		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>   		entries = 1;
>
> Good catch. The fix looks good to me. Thanks.

Thank you for checking the patch. I will send a proper one out.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-02-26  7:22 ` Baolin Wang
  2025-02-27  7:04   ` Liu Shixin
@ 2025-03-05 16:27   ` Zi Yan
  2025-03-05 18:44     ` Zi Yan
  1 sibling, 1 reply; 10+ messages in thread
From: Zi Yan @ 2025-03-05 16:27 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Liu Shixin, linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton

On 26 Feb 2025, at 2:22, Baolin Wang wrote:

> Add Zi.
>
> On 2025/2/26 15:03, Liu Shixin wrote:
>> Hi all,
>>
>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>
>>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>   Tainted: [L]=SOFTLOCKUP
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>   RIP: 0010:xas_load+0x5d/0xc0
>>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   PKRU: 55555554
>>   Call Trace:
>>    <IRQ>
>>    ? watchdog_timer_fn+0x1c9/0x250
>>    ? __pfx_watchdog_timer_fn+0x10/0x10
>>    ? __hrtimer_run_queues+0x10e/0x250
>>    ? hrtimer_interrupt+0xfb/0x240
>>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>    ? sysvec_apic_timer_interrupt+0x68/0x90
>>    </IRQ>
>>    <TASK>
>>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>    ? xas_load+0x5d/0xc0
>>    xas_find+0x153/0x1a0
>>    find_get_entries+0x73/0x280
>>    shmem_undo_range+0x1fc/0x640
>>    shmem_evict_inode+0x109/0x270
>>    evict+0x107/0x240
>>    ? fsnotify_destroy_marks+0x25/0x180
>>    ? _atomic_dec_and_lock+0x35/0x50
>>    __dentry_kill+0x71/0x190
>>    dput+0xd1/0x190
>>    __fput+0x128/0x2a0
>>    task_work_run+0x57/0x90
>>    syscall_exit_to_user_mode+0x1cb/0x1e0
>>    do_syscall_64+0x67/0x170
>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>   RIP: 0033:0x7fd95776eb8b
>>
>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>> which is necessary for reproduction, although it is not yet clear why.
>>
>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>> reproduction program. The reproduction program consists of three parts:
>>   1. A process constantly changes the status of shmem large folio by these interfaces:
>>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>
>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>> so I guess the problem is in migration.
>>
>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>> since I have confirmed this problem still exists after merging that fix.
>
> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>
> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/

Hi Baolin,

With steps 2 and 3 above, I hit the error below on mm-everything-2025-03-05-03-54.
It hits VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index), folio).
mm-everything-2025-03-05-03-54 has neither my folio_split() nor my xas_try_split()
patches. Can you take a look at it?

What I did:

1. run “while true; do echo 1 | sudo tee /proc/sys/vm/compact_memory >/dev/null; done” to trigger compaction;
2. use zswap with “echo Y | sudo tee /sys/module/zswap/parameters/enabled”;
3. running shmem_aligned_swapin.c you gave me with 128MB SIZE and using
memset(shared_memory, 0xab, SIZE) at the end instead of just setting 0s
(this should not make a difference from your original program).


[   37.591314] page: refcount:2 mapcount:0 mapping:000000000c53d1a6 index:0x6687 pfn:0x169420
[   37.592552] memcg:ffff8881781c1800
[   37.593046] aops:shmem_aops ino:c0f
[   37.593586] flags: 0x2ffff000002013d(locked|referenced|uptodate|dirty|lru|active|swapbacked|node=0|zone=2|lastcpupid=0xffff)
[   37.595138] raw: 02ffff000002013d ffffea00058a6b08 ffffea0005a50408 ffff88817951cd68
[   37.596207] raw: 0000000000006687 0000000000000000 00000002ffffffff ffff8881781c1800
[   37.597305] page dumped because: VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index))
[   37.598464] ------------[ cut here ]------------
[   37.599119] kernel BUG at mm/filemap.c:2134!
[   37.599729] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   37.600494] CPU: 4 UID: 1000 PID: 1922 Comm: shmem_aligned_s Not tainted 6.14.0-rc3-mm-everything-2024-01-29-07-19-no-mglru+ #49
[   37.602267] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   37.603557] RIP: 0010:find_lock_entries+0x352/0x360
[   37.604269] Code: 44 24 10 48 8b 4c 24 18 eb b5 48 c7 c6 08 bb 99 82 4c 89 ff e8 cf e6 03 00 0f 0b 48 c7 c6 70 a3 9c 82 4c 89 ff e8 be e6 03 00 <0f> 0b e8 a7 69 ca 00 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 9
0
[   37.606838] RSP: 0018:ffffc900029bba68 EFLAGS: 00010246
[   37.607566] RAX: 000000000000004a RBX: fffffffffffffffe RCX: 0000000000000000
[   37.608550] RDX: 0000000000000000 RSI: ffff8882b5d1d900 RDI: ffff8882b5d1d900
[   37.609580] RBP: ffffc900029bba88 R08: 00000000ffffdfff R09: ffffffff82d238c8
[   37.610566] R10: ffffffff82c73920 R11: 6d75642065676170 R12: ffffc900029bbb48
[   37.611551] R13: ffffc900029bbc48 R14: ffff88817951cd68 R15: ffffea0005a50800
[   37.612535] FS:  00007f40107cc740(0000) GS:ffff8882b5d00000(0000) knlGS:0000000000000000
[   37.613683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.614483] CR2: 00007f40107bf000 CR3: 0000000101c00004 CR4: 0000000000370ef0
[   37.615472] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   37.616459] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   37.617481] Call Trace:
[   37.617838]  <TASK>
[   37.618183]  ? __die_body.cold+0x19/0x26
[   37.618746]  ? die+0x2a/0x50
[   37.619150]  ? do_trap+0xc6/0x110
[   37.619610]  ? do_error_trap+0x6a/0x90
[   37.620126]  ? find_lock_entries+0x352/0x360
[   37.620711]  ? exc_invalid_op+0x4c/0x60
[   37.621354]  ? find_lock_entries+0x352/0x360
[   37.622075]  ? asm_exc_invalid_op+0x16/0x20
[   37.622647]  ? find_lock_entries+0x352/0x360
[   37.623268]  ? find_lock_entries+0x352/0x360
[   37.623946]  shmem_undo_range+0xf8/0x6a0
[   37.624547]  shmem_evict_inode+0x117/0x2b0
[   37.625145]  evict+0x10b/0x290
[   37.625607]  ? fsnotify_grab_connector+0x42/0x80
[   37.626257]  ? fsnotify_destroy_marks+0x26/0x1a0
[   37.626903]  ? do_raw_spin_unlock+0x49/0xb0
[   37.627489]  __dentry_kill+0x71/0x1a0
[   37.628009]  dput+0xf8/0x1c0
[   37.628427]  __fput+0x143/0x2b0
[   37.628877]  task_work_run+0x58/0xa0
[   37.629428]  syscall_exit_to_user_mode+0x168/0x170
[   37.630099]  do_syscall_64+0x57/0x110
[   37.630619]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   37.631324] RIP: 0033:0x7f40108dc1f7
[   37.631844] Code: 73 01 c3 48 8b 0d 29 9c 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 9b 0d 00 f7 d8 64 89 01 4
8
[   37.634424] RSP: 002b:00007ffcad4bd468 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
[   37.635466] RAX: 0000000000000000 RBX: 00007ffcad4bd598 RCX: 00007f40108dc1f7
[   37.636450] RDX: 0000000008000000 RSI: 0000000008000000 RDI: 00007f40087c0000
[   37.637477] RBP: 00007ffcad4bd480 R08: 0000000000000064 R09: 00000000fffffffd
[   37.638460] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
[   37.639442] R13: 00007ffcad4bd5a8 R14: 00007f4010a10000 R15: 000055694d923dd8
[   37.640422]  </TASK>
[   37.640750] Modules linked in:
[   37.641249] ---[ end trace 0000000000000000 ]---


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-03-05 16:27   ` Zi Yan
@ 2025-03-05 18:44     ` Zi Yan
  2025-03-05 18:51       ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Zi Yan @ 2025-03-05 18:44 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Liu Shixin, linux-mm, Linux Kernel Mailing List, Barry Song,
	David Hildenbrand, Hugh Dickins, Kefeng Wang, Lance Yang,
	Matthew Wilcox, Ryan Roberts, Andrew Morton

On 5 Mar 2025, at 11:27, Zi Yan wrote:

> On 26 Feb 2025, at 2:22, Baolin Wang wrote:
>
>> Add Zi.
>>
>> On 2025/2/26 15:03, Liu Shixin wrote:
>>> Hi all,
>>>
>>> I found a softlockup when testing shmem large folio swapout-swapin and compaction:
>>>
>>>   watchdog: BUG: soft lockup - CPU#30 stuck for 179s! [folio_swap:4714]
>>>   Modules linked in: zram xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter ip_tantel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel kvm rapl cixt4 mbcache jbd2 sr_mod cdrom ata_generic ata_piix virtio_net net_failover ghash_clmulni_intel libata sha512_ssse3
>>>   CPU: 30 UID: 0 PID: 4714 Comm: folio_swap Kdump: loaded Tainted: G             L     6.14.0-rc4-next-20250225+ #2
>>>   Tainted: [L]=SOFTLOCKUP
>>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>>>   RIP: 0010:xas_load+0x5d/0xc0
>>>   Code: 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d
>>>   RSP: 0000:ffffadf142f1ba60 EFLAGS: 00000293
>>>   RAX: ffffe524cc4f6700 RBX: ffffadf142f1ba90 RCX: 0000000000000000
>>>   RDX: 0000000000000011 RSI: ffff9a3e058acb68 RDI: ffffadf142f1ba90
>>>   RBP: fffffffffffffffe R08: ffffadf142f1bb50 R09: 0000000000000392
>>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000011
>>>   R13: ffffadf142f1bb48 R14: ffff9a3e04e9c588 R15: 0000000000000000
>>>   FS:  00007fd957666740(0000) GS:ffff9a41ac0e5000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 00007fd922860000 CR3: 000000025c360001 CR4: 0000000000772ef0
>>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>   PKRU: 55555554
>>>   Call Trace:
>>>    <IRQ>
>>>    ? watchdog_timer_fn+0x1c9/0x250
>>>    ? __pfx_watchdog_timer_fn+0x10/0x10
>>>    ? __hrtimer_run_queues+0x10e/0x250
>>>    ? hrtimer_interrupt+0xfb/0x240
>>>    ? __sysvec_apic_timer_interrupt+0x4e/0xe0
>>>    ? sysvec_apic_timer_interrupt+0x68/0x90
>>>    </IRQ>
>>>    <TASK>
>>>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>    ? xas_load+0x5d/0xc0
>>>    xas_find+0x153/0x1a0
>>>    find_get_entries+0x73/0x280
>>>    shmem_undo_range+0x1fc/0x640
>>>    shmem_evict_inode+0x109/0x270
>>>    evict+0x107/0x240
>>>    ? fsnotify_destroy_marks+0x25/0x180
>>>    ? _atomic_dec_and_lock+0x35/0x50
>>>    __dentry_kill+0x71/0x190
>>>    dput+0xd1/0x190
>>>    __fput+0x128/0x2a0
>>>    task_work_run+0x57/0x90
>>>    syscall_exit_to_user_mode+0x1cb/0x1e0
>>>    do_syscall_64+0x67/0x170
>>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>   RIP: 0033:0x7fd95776eb8b
>>>
>>> If CONFIG_DEBUG_VM is enabled, we also hit VM_BUG_ON_FOLIO(!folio_test_locked(folio)) in
>>> shmem_add_to_page_cache(). The problem seems to be related to memory migration or compaction,
>>> which is necessary for reproduction, although it is not yet clear why.
>>>
>>> To reproduce the problem, we need firstly a zram device as swap backend, and then run the
>>> reproduction program. The reproduction program consists of three parts:
>>>   1. A process constantly changes the status of shmem large folio by these interfaces:
>>>          /sys/kernel/mm/transparent_hugepage/hugepages-<size>/shmem_enabled
>>>   2. A process constantly echo 1 > /proc/sys/vm/compact_memory
>>>   3. A process constantly alloc/free/swapout/swapin shmem large folios.
>>>
>>> I'm not sure whether the first process is necessary but the second and third are. In addition,
>>> I tried hacking to modify compaction_alloc to return NULL, and the problem disappeared,
>>> so I guess the problem is in migration.
>>>
>>> The problem is different from https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
>>> since I have confirmed this problem still exists after merging that fix.
>>
>> Could you check if your version includes Zi's fix[1]? Not sure if it's related to the shmem large folio split.
>>
>> [1] https://lore.kernel.org/all/AF487A7A-F685-485D-8D74-756C843D6F0A@nvidia.com/
>
> Hi Baolin,
>
> With step 2 and 3 above, I hit the error below on mm-everything-2025-03-05-03-54.
> It hits VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index), folio).
> mm-everything-2025-03-05-03-54 does not have my folio_split() nor xas_try_split()
> patches. Can you take a look at it?
>
> What I did:
>
> 1. run “while true; do echo 1 | sudo tee /proc/sys/vm/compact_memory >/dev/null; done” to trigger compaction;
> 2. use zswap with “echo Y | sudo tee /sys/module/zswap/parameters/enabled”;
> 3. running shmem_aligned_swapin.c you gave me with 128MB SIZE and using
> memset(shared_memory, 0xab, SIZE) at the end instead of just setting 0s
> (this should not make a difference from your original program).
>
>
> [   37.591314] page: refcount:2 mapcount:0 mapping:000000000c53d1a6 index:0x6687 pfn:0x169420
> [   37.592552] memcg:ffff8881781c1800
> [   37.593046] aops:shmem_aops ino:c0f
> [   37.593586] flags: 0x2ffff000002013d(locked|referenced|uptodate|dirty|lru|active|swapbacked|node=0|zone=2|lastcpupid=0xffff)
> [   37.595138] raw: 02ffff000002013d ffffea00058a6b08 ffffea0005a50408 ffff88817951cd68
> [   37.596207] raw: 0000000000006687 0000000000000000 00000002ffffffff ffff8881781c1800
> [   37.597305] page dumped because: VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index))
> [   37.598464] ------------[ cut here ]------------
> [   37.599119] kernel BUG at mm/filemap.c:2134!
> [   37.599729] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [   37.600494] CPU: 4 UID: 1000 PID: 1922 Comm: shmem_aligned_s Not tainted 6.14.0-rc3-mm-everything-2024-01-29-07-19-no-mglru+ #49
> [   37.602267] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [   37.603557] RIP: 0010:find_lock_entries+0x352/0x360
> [   37.604269] Code: 44 24 10 48 8b 4c 24 18 eb b5 48 c7 c6 08 bb 99 82 4c 89 ff e8 cf e6 03 00 0f 0b 48 c7 c6 70 a3 9c 82 4c 89 ff e8 be e6 03 00 <0f> 0b e8 a7 69 ca 00 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 9
> 0
> [   37.606838] RSP: 0018:ffffc900029bba68 EFLAGS: 00010246
> [   37.607566] RAX: 000000000000004a RBX: fffffffffffffffe RCX: 0000000000000000
> [   37.608550] RDX: 0000000000000000 RSI: ffff8882b5d1d900 RDI: ffff8882b5d1d900
> [   37.609580] RBP: ffffc900029bba88 R08: 00000000ffffdfff R09: ffffffff82d238c8
> [   37.610566] R10: ffffffff82c73920 R11: 6d75642065676170 R12: ffffc900029bbb48
> [   37.611551] R13: ffffc900029bbc48 R14: ffff88817951cd68 R15: ffffea0005a50800
> [   37.612535] FS:  00007f40107cc740(0000) GS:ffff8882b5d00000(0000) knlGS:0000000000000000
> [   37.613683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   37.614483] CR2: 00007f40107bf000 CR3: 0000000101c00004 CR4: 0000000000370ef0
> [   37.615472] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   37.616459] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   37.617481] Call Trace:
> [   37.617838]  <TASK>
> [   37.618183]  ? __die_body.cold+0x19/0x26
> [   37.618746]  ? die+0x2a/0x50
> [   37.619150]  ? do_trap+0xc6/0x110
> [   37.619610]  ? do_error_trap+0x6a/0x90
> [   37.620126]  ? find_lock_entries+0x352/0x360
> [   37.620711]  ? exc_invalid_op+0x4c/0x60
> [   37.621354]  ? find_lock_entries+0x352/0x360
> [   37.622075]  ? asm_exc_invalid_op+0x16/0x20
> [   37.622647]  ? find_lock_entries+0x352/0x360
> [   37.623268]  ? find_lock_entries+0x352/0x360
> [   37.623946]  shmem_undo_range+0xf8/0x6a0
> [   37.624547]  shmem_evict_inode+0x117/0x2b0
> [   37.625145]  evict+0x10b/0x290
> [   37.625607]  ? fsnotify_grab_connector+0x42/0x80
> [   37.626257]  ? fsnotify_destroy_marks+0x26/0x1a0
> [   37.626903]  ? do_raw_spin_unlock+0x49/0xb0
> [   37.627489]  __dentry_kill+0x71/0x1a0
> [   37.628009]  dput+0xf8/0x1c0
> [   37.628427]  __fput+0x143/0x2b0
> [   37.628877]  task_work_run+0x58/0xa0
> [   37.629428]  syscall_exit_to_user_mode+0x168/0x170
> [   37.630099]  do_syscall_64+0x57/0x110
> [   37.630619]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   37.631324] RIP: 0033:0x7f40108dc1f7
> [   37.631844] Code: 73 01 c3 48 8b 0d 29 9c 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 9b 0d 00 f7 d8 64 89 01 4
> 8
> [   37.634424] RSP: 002b:00007ffcad4bd468 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> [   37.635466] RAX: 0000000000000000 RBX: 00007ffcad4bd598 RCX: 00007f40108dc1f7
> [   37.636450] RDX: 0000000008000000 RSI: 0000000008000000 RDI: 00007f40087c0000
> [   37.637477] RBP: 00007ffcad4bd480 R08: 0000000000000064 R09: 00000000fffffffd
> [   37.638460] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
> [   37.639442] R13: 00007ffcad4bd5a8 R14: 00007f4010a10000 R15: 000055694d923dd8
> [   37.640422]  </TASK>
> [   37.640750] Modules linked in:
> [   37.641249] ---[ end trace 0000000000000000 ]---

OK, it is probably still caused by the __folio_migrate_mapping() bug, since
writing to the sibling entries of a multi-index entry breaks that multi-index entry.
Thanks to Matthew for doing the experiments.

I will send a proper fix based on Hugh’s version[1] and cc stable.


[1] https://lore.kernel.org/linux-mm/23d65532-859a-e88f-9c24-06a6c7ff4006@google.com/
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Softlockup when test shmem swapout-swapin and compaction
  2025-03-05 18:44     ` Zi Yan
@ 2025-03-05 18:51       ` Matthew Wilcox
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-03-05 18:51 UTC (permalink / raw)
  To: Zi Yan
  Cc: Baolin Wang, Liu Shixin, linux-mm, Linux Kernel Mailing List,
	Barry Song, David Hildenbrand, Hugh Dickins, Kefeng Wang,
	Lance Yang, Ryan Roberts, Andrew Morton

On Wed, Mar 05, 2025 at 01:44:29PM -0500, Zi Yan wrote:
> OK, it is probably still caused by the __folio_migrate_mapping() bug, since
> writing to sibling entries of a multi-index entry breaks the multi-index entry.
> Thank Matthew for doing the experiments.

Here's what I did:

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 0e865bab4a10..4f38db416ff6 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -645,6 +645,26 @@ static noinline void check_multi_store_3(struct xarray *xa, unsigned long index,

        xa_destroy(xa);
 }
+
+static noinline void check_multi_store_4(struct xarray *xa, unsigned long index,
+               unsigned int order)
+{
+       XA_STATE(xas, xa, index);
+       int i;
+
+       xa_store_order(xa, index, order, xa_mk_value(0), GFP_KERNEL);
+       xa_dump(xa);
+
+       xas_lock(&xas);
+       for (i = 0; i < (1 << order); i++) {
+               xas_store(&xas, xa_mk_value(index));
+               xas_next(&xas);
+       }
+       xas_unlock(&xas);
+       xa_dump(xa);
+
+       xa_destroy(xa);
+}
 #endif

 static noinline void check_multi_store(struct xarray *xa)
@@ -724,6 +744,7 @@ static noinline void check_multi_store(struct xarray *xa)
                check_multi_store_3(xa, 0, i);
                check_multi_store_3(xa, 1UL << i, i);
        }
+       check_multi_store_4(xa, 16, 4);
 #endif
 }


The xa_dump before shows sibling entries:

$ ./tools/testing/radix-tree/xarray
xarray: 0x55b2335be180x head 0x516004c75b82x flags 0 marks 0 0 0
0-63: node 0x516004c75b80x max 0 parent (nil)x shift 0 count 16 values 16 array 0x55b2335be180x list 0x516004c75b98x 0x516004c75b98x marks 0 0 0
16: value 0 (0x0) [0x1x]
17: sibling (slot 16)
18: sibling (slot 16)
19: sibling (slot 16)
[...]

And then after shows them turned into normal entries:

xarray: 0x55b2335be180x head 0x516004c75b82x flags 0 marks 0 0 0
0-63: node 0x516004c75b80x max 0 parent (nil)x shift 0 count 16 values 31 array 0x55b2335be180x list 0x516004c75b98x 0x516004c75b98x marks 0 0 0
16: value 16 (0x10) [0x21x]
17: value 16 (0x10) [0x21x]
18: value 16 (0x10) [0x21x]
19: value 16 (0x10) [0x21x]

so I understand why this took a long time to show up.  For most uses,
you can't tell the difference between these situations.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-03-05 18:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-26  7:03 Softlockup when test shmem swapout-swapin and compaction Liu Shixin
2025-02-26  7:22 ` Baolin Wang
2025-02-27  7:04   ` Liu Shixin
2025-02-27 23:43     ` Zi Yan
2025-02-28  1:52       ` Liu Shixin
2025-02-28  3:39       ` Baolin Wang
2025-02-28 15:19         ` Zi Yan
2025-03-05 16:27   ` Zi Yan
2025-03-05 18:44     ` Zi Yan
2025-03-05 18:51       ` Matthew Wilcox
