linux-mm.kvack.org archive mirror
* [BUG] soft lockup in filemap_get_read_batch
@ 2023-10-03 13:48 antal.nemes
  2023-10-03 22:58 ` Dave Chinner
  2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
  0 siblings, 2 replies; 5+ messages in thread
From: antal.nemes @ 2023-10-03 13:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Daniel Dao

Hi Matthew,

We have observed intermittent soft lockups on at least seven different hosts:
- six hosts ran 6.2.8.fc37-200
- one host ran 6.0.13.fc37-200

The list of affected hosts is growing.

Stack traces are all similar:

emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
warning kern kernel - - Call Trace:
warning kern kernel - -  <TASK>
warning kern kernel - -  xas_load+0x3d/0x50
warning kern kernel - -  filemap_get_read_batch+0x179/0x270
warning kern kernel - -  filemap_get_pages+0xa9/0x690
warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
warning kern kernel - -  filemap_read+0xd2/0x340
warning kern kernel - -  ? filemap_read+0x32f/0x340
warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
warning kern kernel - -  vfs_read+0x23c/0x310
warning kern kernel - -  ksys_read+0x6b/0xf0
warning kern kernel - -  do_syscall_64+0x5b/0x80
warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
warning kern kernel - -  ? do_syscall_64+0x67/0x80
warning kern kernel - -  ? do_syscall_64+0x67/0x80
warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
warning kern kernel - - RIP: 0033:0x7ff1e5b20b25
warning kern kernel - - Code: fe ff ff 50 48 8d 3d 0a c9 06 00 e8 25 ee 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 f5 4b 2a 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
warning kern kernel - - RSP: 002b:00007ffe1a5d8d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
warning kern kernel - - RAX: ffffffffffffffda RBX: 00000000035345c0 RCX: 00007ff1e5b20b25
warning kern kernel - - RDX: 0000000000002000 RSI: 00007ff1dc9c3080 RDI: 0000000000000032
warning kern kernel - - RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000000
warning kern kernel - - R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
warning kern kernel - - R13: 00007ff1dc9c3080 R14: 0000000000000000 R15: 0000000001452148
warning kern kernel - -  </TASK>

The lockup is always reported from a postgres process, with all data and
config on an XFS filesystem. Because this blocks a postgres process, the
lockup has a bunch of knock-on effects (invalid page errors, hung or
aborted transactions, tuple accumulation, etc.). All occurrences
eventually required a reboot to remedy.

The issue coincided with our rollout of the 6.x kernel. Previously we ran
Rocky Linux 8 with 4.18.* (a clone of the RHEL8 kernel), so I recognize
that this issue may not be new (AFAICT, livelocks have been sporadically
reported since the folio merge in 5.17).

The issue takes anywhere from 2 days to 30+ days after boot to
materialize, and lockups are reported for durations ranging from 1 min to
7 hours (the latter until the host was manually rebooted). This is
followed by a period of relatively high load averages (~2 * #cpus) but
low CPU usage. Memory usage was < 70%, so it does not appear to be a
high-psi condition.

We are unable to reproduce the issue at will (e.g. by load/stress testing),
but the affected hosts have had multiple occurrences across reboots, so we
should be able to observe the effects of any patches over a longer span.

From what I can tell, this appears to be similar to what was reported in
https://lore.kernel.org/linux-kernel/CA+wXwBS7YTHUmxGP3JrhcKMnYQJcd6=7HE+E1v-guk01L2K3Zw@mail.gmail.com/
and 
https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/

> > We also have a deadlock reading a very specific file on this host. We managed to
> > do a kdump on this host and extracted out the state of the mapping.
>
> This is almost certainly a different bug, but also XArray related, so
> I'll keep looking at this one.

I am not sure whether the deadlock that Daniel observed matches our stack
trace. Assuming it does, has there been any follow-up on this?

We tried the patch from https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31, but the
soft lockup recurred with the same signature.

Is there anything we can do to further aid in troubleshooting? If this is a folio
lock issue, would it be possible to trace where the lock was taken?
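
To illustrate the kind of instrumentation we could carry locally, here is
an untested kprobe-module sketch (__folio_lock as the probe point is just
our guess, and this assumes it is not inlined on our builds):

	#include <linux/module.h>
	#include <linux/kprobes.h>
	#include <linux/sched.h>

	/* Untested sketch: log the task and stack of every slow-path folio
	 * lock acquisition, so a later lockup can be correlated with the
	 * most recent holders. */
	static int folio_lock_pre(struct kprobe *p, struct pt_regs *regs)
	{
		pr_info("__folio_lock by %s[%d]\n",
			current->comm, task_pid_nr(current));
		dump_stack();
		return 0;
	}

	static struct kprobe kp = {
		.symbol_name	= "__folio_lock",
		.pre_handler	= folio_lock_pre,
	};

	static int __init folio_trace_init(void)
	{
		return register_kprobe(&kp);
	}

	static void __exit folio_trace_exit(void)
	{
		unregister_kprobe(&kp);
	}

	module_init(folio_trace_init);
	module_exit(folio_trace_exit);
	MODULE_LICENSE("GPL");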

Best regards,
Antal




* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
@ 2023-10-03 22:58 ` Dave Chinner
  2023-10-04  8:36   ` Antal Nemes
  2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2023-10-03 22:58 UTC (permalink / raw)
  To: antal.nemes; +Cc: Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> Hi Matthew,
> 
> We have observed intermittent soft lockups on at least seven different hosts:
> - six hosts ran 6.2.8.fc37-200
> - one host ran 6.0.13.fc37-200
> 
> The list of affected hosts is growing.
> 
> Stack traces are all similar:
> 
> emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> warning kern kernel - - Call Trace:
> warning kern kernel - -  <TASK>
> warning kern kernel - -  xas_load+0x3d/0x50
> warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> warning kern kernel - -  filemap_get_pages+0xa9/0x690
> warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> warning kern kernel - -  filemap_read+0xd2/0x340
> warning kern kernel - -  ? filemap_read+0x32f/0x340
> warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> warning kern kernel - -  vfs_read+0x23c/0x310
> warning kern kernel - -  ksys_read+0x6b/0xf0
> warning kern kernel - -  do_syscall_64+0x5b/0x80
> warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> warning kern kernel - -  ? do_syscall_64+0x67/0x80
> warning kern kernel - -  ? do_syscall_64+0x67/0x80
> warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
from xa_load()").

Should already be backported to the latest stable kernels.
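
For context: the bug was xas_descend() handing a sibling entry back to
xa_load() while a multi-index entry was being split, so the caller could
spin on stale state. From memory, the shape of the fixed descent is
roughly this (check the actual commit for the real code):

	static void *xas_descend(struct xa_state *xas, struct xa_node *node)
	{
		unsigned int offset = get_offset(xas->xa_index, node);
		void *entry = xa_entry(xas->xa, node, offset);

		xas->xa_node = node;
		/* was an "if": loop until the sibling chain is resolved */
		while (xa_is_sibling(entry)) {
			offset = xa_to_sibling(entry);
			entry = xa_entry(xas->xa, node, offset);
			/* never hand a racing split's internal node back
			 * to the caller; make it retry instead */
			if (node->shift && xa_is_node(entry))
				entry = XA_RETRY_ENTRY;
		}

		xas->xa_offset = offset;
		return entry;
	}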

-Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-03 22:58 ` Dave Chinner
@ 2023-10-04  8:36   ` Antal Nemes
  2023-10-11 13:20     ` Antal Nemes
  0 siblings, 1 reply; 5+ messages in thread
From: Antal Nemes @ 2023-10-04  8:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Wed, Oct 04, 2023 at 09:58:04AM +1100, Dave Chinner wrote:
> On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> > Hi Matthew,
> > 
> > We have observed intermittent soft lockups on at least seven different hosts:
> > - six hosts ran 6.2.8.fc37-200
> > - one host ran 6.0.13.fc37-200
> > 
> > The list of affected hosts is growing.
> > 
> > Stack traces are all similar:
> > 
> > emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> > warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> > warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> > warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> > warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> > warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> > warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> > warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> > warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> > warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> > warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> > warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> > warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> > warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> > warning kern kernel - - Call Trace:
> > warning kern kernel - -  <TASK>
> > warning kern kernel - -  xas_load+0x3d/0x50
> > warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> > warning kern kernel - -  filemap_get_pages+0xa9/0x690
> > warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > warning kern kernel - -  filemap_read+0xd2/0x340
> > warning kern kernel - -  ? filemap_read+0x32f/0x340
> > warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> > warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> > warning kern kernel - -  vfs_read+0x23c/0x310
> > warning kern kernel - -  ksys_read+0x6b/0xf0
> > warning kern kernel - -  do_syscall_64+0x5b/0x80
> > warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> > warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
> from xa_load()").
> 
> Should already be backported to the latest stable kernels.

The commit seems to be the same as the patch referenced in 
https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 

We have been running 6.2.8 with this patch, but the soft lockup still occurred.

From https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/
it looks like there could be a different issue at play (locked folio with null 
mapping)?



* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-04  8:36   ` Antal Nemes
@ 2023-10-11 13:20     ` Antal Nemes
  0 siblings, 0 replies; 5+ messages in thread
From: Antal Nemes @ 2023-10-11 13:20 UTC (permalink / raw)
  To: Antal Nemes
  Cc: Dave Chinner, Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Wed, Oct 04, 2023 at 10:36:33AM +0200, Antal Nemes wrote:
> On Wed, Oct 04, 2023 at 09:58:04AM +1100, Dave Chinner wrote:
> > On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> > > Hi Matthew,
> > > 
> > > We have observed intermittent soft lockups on at least seven different hosts:
> > > - six hosts ran 6.2.8.fc37-200
> > > - one host ran 6.0.13.fc37-200
> > > 
> > > The list of affected hosts is growing.
> > > 
> > > Stack traces are all similar:
> > > 
> > > emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> > > warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> > > warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> > > warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> > > warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> > > warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> > > warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> > > warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> > > warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> > > warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> > > warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> > > warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> > > warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> > > warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> > > warning kern kernel - - Call Trace:
> > > warning kern kernel - -  <TASK>
> > > warning kern kernel - -  xas_load+0x3d/0x50
> > > warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> > > warning kern kernel - -  filemap_get_pages+0xa9/0x690
> > > warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > warning kern kernel - -  filemap_read+0xd2/0x340
> > > warning kern kernel - -  ? filemap_read+0x32f/0x340
> > > warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> > > warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> > > warning kern kernel - -  vfs_read+0x23c/0x310
> > > warning kern kernel - -  ksys_read+0x6b/0xf0
> > > warning kern kernel - -  do_syscall_64+0x5b/0x80
> > > warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> > > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > > warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> > > warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > 
> > Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
> > from xa_load()").
> > 
> > Should already be backported to the latest stable kernels.
> 
> The commit seems to be the same as the patch referenced in 
> https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 
> 
> We have been running 6.2.8 with this patch, but the soft lockup still occurred.
> 
> From https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/
> it looks like there could be a different issue at play (locked folio with null 
> mapping)?
>

Daniel successfully worked around this issue by reverting 
6795801366da0cd3d99e27c37f020a8f16714886 (xfs: Support large folios).

We will follow suit for the time being.
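(Concretely, something like "git revert 6795801366da" on top of our tree
before rebuilding, assuming the revert still applies cleanly to 6.2.y.)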




* [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration
  2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
  2023-10-03 22:58 ` Dave Chinner
@ 2024-04-16  9:31 ` zhaoyang.huang
  1 sibling, 0 replies; 5+ messages in thread
From: zhaoyang.huang @ 2024-04-16  9:31 UTC (permalink / raw)
  To: antal.nemes; +Cc: dqminh, linux-fsdevel, linux-mm, steve.kang, huangzhaoyang

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

The livelock in [1] has been reported multiple times since v5.15, where a
zero-ref folio is repeatedly found in the page cache by find_get_entry. A
possible timing sequence is proposed in [2]; briefly, the lockless xarray
operation can be harmed by an invalid folio remaining in slot[offset].
This commit protects the xa split work (folio_ref_freeze and
__split_huge_page) under lruvec->lru_lock to close the race window.

[1]
[167789.800297] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167726.780305] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167726.780319] (detected by 3, t=17256977 jiffies, g=19883597, q=2397394)
[167726.780325] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800308] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167789.800322] (detected by 3, t=17272732 jiffies, g=19883597, q=2397470)
[167789.800328] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800339] Call trace:
[167789.800342]  dump_backtrace.cfi_jt+0x0/0x8
[167789.800355]  show_stack+0x1c/0x2c
[167789.800363]  sched_show_task+0x1ac/0x27c
[167789.800370]  print_other_cpu_stall+0x314/0x4dc
[167789.800377]  check_cpu_stall+0x1c4/0x36c
[167789.800382]  rcu_sched_clock_irq+0xe8/0x388
[167789.800389]  update_process_times+0xa0/0xe0
[167789.800396]  tick_sched_timer+0x7c/0xd4
[167789.800404]  __run_hrtimer+0xd8/0x30c
[167789.800408]  hrtimer_interrupt+0x1e4/0x2d0
[167789.800414]  arch_timer_handler_phys+0x5c/0xa0
[167789.800423]  handle_percpu_devid_irq+0xbc/0x318
[167789.800430]  handle_domain_irq+0x7c/0xf0
[167789.800437]  gic_handle_irq+0x54/0x12c
[167789.800445]  call_on_irq_stack+0x40/0x70
[167789.800451]  do_interrupt_handler+0x44/0xa0
[167789.800457]  el1_interrupt+0x34/0x64
[167789.800464]  el1h_64_irq_handler+0x1c/0x2c
[167789.800470]  el1h_64_irq+0x7c/0x80
[167789.800474]  xas_find+0xb4/0x28c
[167789.800481]  find_get_entry+0x3c/0x178
[167789.800487]  find_lock_entries+0x98/0x2f8
[167789.800492]  __invalidate_mapping_pages.llvm.3657204692649320853+0xc8/0x224
[167789.800500]  invalidate_mapping_pages+0x18/0x28
[167789.800506]  inode_lru_isolate+0x140/0x2a4
[167789.800512]  __list_lru_walk_one+0xd8/0x204
[167789.800519]  list_lru_walk_one+0x64/0x90
[167789.800524]  prune_icache_sb+0x54/0xe0
[167789.800529]  super_cache_scan+0x160/0x1ec
[167789.800535]  do_shrink_slab+0x20c/0x5c0
[167789.800541]  shrink_slab+0xf0/0x20c
[167789.800546]  shrink_node_memcgs+0x98/0x320
[167789.800553]  shrink_node+0xe8/0x45c
[167789.800557]  balance_pgdat+0x464/0x814
[167789.800563]  kswapd+0xfc/0x23c
[167789.800567]  kthread+0x164/0x1c8
[167789.800573]  ret_from_fork+0x10/0x20

[2]
Thread_isolate:
1. alloc_contig_range->isolate_migratepages_block isolates a range of
pages onto cc->migratepages via pfn
       (the folio has refcount 1 + n: alloc_pages, page_cache)

2. alloc_contig_range->migrate_pages->folio_ref_freeze(folio, 1 +
extra_pins) sets folio->refcnt to 0

3. alloc_contig_range->migrate_pages->xas_split splits the folio across
the slots, one folio per slot from slot[offset] to slot[offset + sibs]

4. alloc_contig_range->migrate_pages->__split_huge_page->folio_lruvec_lock
stalls, which leaves the folio unable to have its refcnt set back to 2

5. Thread_kswapd enters the livelock via the chain below:
      rcu_read_lock();
   retry:
      find_get_entry
          folio = xas_find()
          if (!folio_try_get_rcu(folio)) {
              xas_reset();
              goto retry;
          }
      rcu_read_unlock();

5'. Thread_holdlock, as the lruvec->lru_lock holder, could be stalled on
the same core as Thread_kswapd.
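
In outline, the patch below takes lruvec->lru_lock before the refcount
freeze, so the splitting thread can no longer stall in folio_lruvec_lock()
while a zero-ref folio is still visible in the xarray. A condensed sketch
of the intended ordering (a paraphrase of the diff below, not literal
kernel code):

	/* sketch only; see the diff for the real change */
	local_irq_disable();
	lruvec = folio_lruvec_lock(folio);	/* taken before the freeze */
	xas_lock(&xas);
	if (folio_ref_freeze(folio, 1 + extra_pins)) {
		/* the split now runs entirely under lruvec->lru_lock, so
		 * we cannot block here with a frozen folio left in a slot */
		__split_huge_page(page, list, end, new_order);
	}
	unlock_page_lruvec(lruvec);
	local_irq_enable();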

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/huge_memory.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..418e8d03480a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2891,7 +2891,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 {
 	struct folio *folio = page_folio(page);
 	struct page *head = &folio->page;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = folio_lruvec(folio);
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
 	int i, nr_dropped = 0;
@@ -2908,8 +2908,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
-	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
-	lruvec = folio_lruvec_lock(folio);
 
 	ClearPageHasHWPoisoned(head);
 
@@ -2942,7 +2940,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 		folio_set_order(new_folio, new_order);
 	}
-	unlock_page_lruvec(lruvec);
 	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, order, new_order);
@@ -2961,7 +2958,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		folio_ref_add(folio, 1 + new_nr);
 		xa_unlock(&folio->mapping->i_pages);
 	}
-	local_irq_enable();
 
 	if (nr_dropped)
 		shmem_uncharge(folio->mapping->host, nr_dropped);
@@ -3048,6 +3044,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	int extra_pins, ret;
 	pgoff_t end;
 	bool is_hzp;
+	struct lruvec *lruvec;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
@@ -3159,6 +3156,14 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 
 	/* block interrupt reentry in xa_lock and spinlock */
 	local_irq_disable();
+
+	/*
+	 * Take the lruvec lock before freezing the folio, to prevent a folio
+	 * with refcnt == 0 from remaining in the page cache, which could
+	 * lead find_get_entry into a livelock while iterating the xarray.
+	 */
+	lruvec = folio_lruvec_lock(folio);
+
 	if (mapping) {
 		/*
 		 * Check if the folio is present in page cache.
@@ -3203,12 +3208,16 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		}
 
 		__split_huge_page(page, list, end, new_order);
+		unlock_page_lruvec(lruvec);
+		local_irq_enable();
 		ret = 0;
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:
 		if (mapping)
 			xas_unlock(&xas);
+
+		unlock_page_lruvec(lruvec);
 		local_irq_enable();
 		remap_page(folio, folio_nr_pages(folio));
 		ret = -EAGAIN;
-- 
2.25.1




end of thread

Thread overview: 5+ messages
2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
2023-10-03 22:58 ` Dave Chinner
2023-10-04  8:36   ` Antal Nemes
2023-10-11 13:20     ` Antal Nemes
2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
