On Tue, 2018-02-06 at 12:12 +0500, Mikhail Gavrilov wrote:
> On 6 February 2018 at 11:08, Dave Chinner wrote:
> > You collected a trace of something, but didn't supply any of the
> > other storage and fs config stuff that was mentioned in that link.
>
> Sorry.
> Anyway, this information was in the attached dmesg.
>
> > > Can't you now look into it?
> >
> > I don't see a filesystem problem from the log you've posted, I see
> > some slow IO a minute after boot, then a lockdep false positive
> > about 40mins in, but the system then reports GPU memory allocation
> > problems and hardware MCEs for the next 4-5 hours before the GPU
> > appears to stop working.
>
> Is a lockdep warning about 40 minutes in normal? I don't think so.
>
> > [....]
> >
> > > [ 4.687255] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
> >
> > I'm guessing that you have an ssd with ext4 and two 4TB drives.
> > And ext4 is on the SSD?
>
> Yes, right.
>
> > > [ 5.778628] EXT4-fs (sda1): re-mounted. Opts: (null)
> > .....
> > > [ 7.918812] XFS (sdb): Mounting V5 Filesystem
> > > [ 8.123854] XFS (sdb): Starting recovery (logdev: internal)
> >
> > And there's an XFS filesystem on one drive...
>
> Yep.
>
> > > [ 77.459679] sysrq: SysRq : Show Blocked State
> > > [ 77.459693] task PC stack pid father
> > > [ 77.459947] tracker-store D12296 2469 1847 0x00000000
> > > [ 77.459957] Call Trace:
> > > [ 77.459963] __schedule+0x2dc/0xba0
> > > [ 77.459966] ? _raw_spin_unlock_irq+0x2c/0x40
> > > [ 77.459970] schedule+0x33/0x90
> > > [ 77.459974] io_schedule+0x16/0x40
> > > [ 77.459978] generic_file_read_iter+0x3b8/0xe10
> > > [ 77.459986] ? page_cache_tree_insert+0x140/0x140
> > > [ 77.460026] xfs_file_buffered_aio_read+0x6e/0x1a0 [xfs]
> > > [ 77.460054] xfs_file_read_iter+0x68/0xc0 [xfs]
> > > [ 77.460058] __vfs_read+0xf1/0x160
> > > [ 77.460065] vfs_read+0xa3/0x150
> > > [ 77.460069] SyS_pread64+0x98/0xc0
> > > [ 77.460074] entry_SYSCALL_64_fastpath+0x1f/0x96
> >
> > That's waiting on a read IO - no indication of anything being wrong
> > here....
> >
> > > [ 2095.241660] TaskSchedulerFo (16168) used greatest stack depth: 10232 bytes left
> > >
> > > [ 2173.204790] ======================================================
> > > [ 2173.204791] WARNING: possible circular locking dependency detected
> > > [ 2173.204793] 4.15.0-rc4-amd-vega+ #8 Not tainted
> > > [ 2173.204794] ------------------------------------------------------
> > > [ 2173.204795] gnome-shell/1971 is trying to acquire lock:
> > > [ 2173.204796] (sb_internal){.+.+}, at: [<00000000221fd49d>] xfs_trans_alloc+0xec/0x130 [xfs]
> > > [ 2173.204832]
> > > but task is already holding lock:
> > > [ 2173.204833] (fs_reclaim){+.+.}, at: [<00000000bdc32871>] fs_reclaim_acquire.part.74+0x5/0x30
> > > [ 2173.204837]
> > > which lock already depends on the new lock.
> >
> > And here we go again on another lockdep memory-reclaim false positive
> > whack-a-mole game.
>
> This is where the interface starts lagging.
>
> >
> > > [ 2173.204838]
> > > the existing dependency chain (in reverse order) is:
> > > [ 2173.204839]
> > > -> #1 (fs_reclaim){+.+.}:
> > > [ 2173.204843] fs_reclaim_acquire.part.74+0x29/0x30
> > > [ 2173.204844] fs_reclaim_acquire+0x19/0x20
> > > [ 2173.204846] kmem_cache_alloc+0x33/0x300
> > > [ 2173.204870] kmem_zone_alloc+0x6c/0xf0 [xfs]
> > > [ 2173.204891] xfs_trans_alloc+0x6b/0x130 [xfs]
> > > [ 2173.204912] xfs_efi_recover+0x11c/0x1c0 [xfs]
> > > [ 2173.204932] xlog_recover_process_efi+0x41/0x60 [xfs]
> > > [ 2173.204951] xlog_recover_process_intents.isra.40+0x138/0x270 [xfs]
> > > [ 2173.204969] xlog_recover_finish+0x23/0xb0 [xfs]
> > > [ 2173.204987] xfs_log_mount_finish+0x61/0xe0 [xfs]
> > > [ 2173.205005] xfs_mountfs+0x657/0xa60 [xfs]
> > > [ 2173.205022] xfs_fs_fill_super+0x4aa/0x630 [xfs]
> > > [ 2173.205024] mount_bdev+0x184/0x1c0
> > > [ 2173.205042] xfs_fs_mount+0x15/0x20 [xfs]
> > > [ 2173.205043] mount_fs+0x32/0x150
> > > [ 2173.205045] vfs_kern_mount.part.25+0x5d/0x160
> > > [ 2173.205046] do_mount+0x65d/0xde0
> > > [ 2173.205047] SyS_mount+0x98/0xe0
> > > [ 2173.205049] do_syscall_64+0x6c/0x220
> > > [ 2173.205052] return_from_SYSCALL_64+0x0/0x75
> > > [ 2173.205053]
> > > -> #0 (sb_internal){.+.+}:
> > > [ 2173.205056] lock_acquire+0xa3/0x1f0
> > > [ 2173.205058] __sb_start_write+0x11c/0x190
> > > [ 2173.205075] xfs_trans_alloc+0xec/0x130 [xfs]
> > > [ 2173.205091] xfs_free_eofblocks+0x12a/0x1e0 [xfs]
> > > [ 2173.205108] xfs_inactive+0xf0/0x110 [xfs]
> > > [ 2173.205125] xfs_fs_destroy_inode+0xbb/0x2d0 [xfs]
> > > [ 2173.205127] destroy_inode+0x3b/0x60
> > > [ 2173.205128] evict+0x13e/0x1a0
> > > [ 2173.205129] dispose_list+0x56/0x80
> > > [ 2173.205131] prune_icache_sb+0x5a/0x80
> > > [ 2173.205132] super_cache_scan+0x137/0x1b0
> > > [ 2173.205134] shrink_slab.part.47+0x1fb/0x590
> > > [ 2173.205135] shrink_slab+0x29/0x30
> > > [ 2173.205136] shrink_node+0x11e/0x2f0
> > > [ 2173.205137] do_try_to_free_pages+0xd0/0x350
> > > [ 2173.205138] try_to_free_pages+0x136/0x340
> > > [ 2173.205140] __alloc_pages_slowpath+0x487/0x1150
> > > [ 2173.205141] __alloc_pages_nodemask+0x3a8/0x430
> > > [ 2173.205143] dma_generic_alloc_coherent+0x91/0x160
> > > [ 2173.205146] x86_swiotlb_alloc_coherent+0x25/0x50
> > > [ 2173.205150] ttm_dma_pool_get_pages+0x230/0x630 [ttm]
> >
> > OK, new symptom of the ages old problem with using lockdep for
> > annotating things that are not locks. In this case, it's both
> > memory reclaim and filesystem freeze annotations that are colliding
> > with an XFS function that can be called above and below memory
> > allocation and producing a false positive.
> >
> > i.e. it's perfectly safe for us to call xfs_trans_alloc() in the
> > manner we are from memory reclaim because we're not in a GFP_NOFS or
> > PF_MEMALLOC_NOFS context.
> >
> > And it's also perfectly safe for us to call xfs_trans_alloc from log
> > recovery at mount time like we are because the filesystem cannot be
> > frozen before a mount is complete and hence sb_internal ordering is
> > completely irrelevant at that point.
> >
> > So it's a false positive, and I don't think there's anything we can
> > do to prevent it because using __GFP_NOLOCKDEP in xfs_trans_alloc()
> > will mean lockdep will not warn when we have a real deadlock due to
> > transaction nesting in memory reclaim contexts.....
> >
> > From here, there's nothing filesystem related in the logs:
>
> But this is where I feel the system hang.
>
> > [.....]
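
Side note, to make sure I understand Dave's point: the PF_MEMALLOC_NOFS context he
mentions is what memalloc_nofs_save()/memalloc_nofs_restore() set up, and every
allocation made inside such a region is implicitly treated as GFP_NOFS, so reclaim
cannot recurse back into the filesystem. A minimal sketch of that, with a
hypothetical helper name, not the actual XFS code:

  #include <linux/sched/mm.h>
  #include <linux/slab.h>

  /*
   * Hypothetical helper: mark the current task PF_MEMALLOC_NOFS so any
   * allocation in this region is implicitly GFP_NOFS and cannot recurse
   * into filesystem reclaim.
   */
  static void *alloc_without_fs_reclaim(size_t size)
  {
          unsigned int nofs_flags;
          void *p;

          nofs_flags = memalloc_nofs_save();      /* sets PF_MEMALLOC_NOFS */
          p = kmalloc(size, GFP_KERNEL);          /* treated as GFP_NOFS here */
          memalloc_nofs_restore(nofs_flags);
          return p;
  }

  /*
   * The alternative Dave rejects: tagging the allocation with
   * __GFP_NOLOCKDEP opts it out of lockdep's fs_reclaim tracking
   * entirely, which would also hide real reclaim-recursion deadlocks:
   *
   *      p = kmalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP);
   */

If that reading is right, the report above is lockdep connecting two contexts that
can never actually nest, which is why Dave calls it a false positive.
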
> >
> > > [ 2229.274826] swiotlb: coherent allocation failed for device 0000:07:00.0 size=2097152
> >
> > You are getting gpu memory allocation failures....
> >
> > > [ 2234.832320] amdgpu 0000:07:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > > [ 2234.832325] swiotlb: coherent allocation failed for device 0000:07:00.0 size=2097152
> >
> > repeatedly, until ....
> >
> > > [ 2938.815747] mce: [Hardware Error]: Machine check events logged
> >
> > your hardware starts throwing errors at the CPU.
>
> These are false-positive events due to a Haswell CPU under virtualization:
> https://bugs.launchpad.net/qemu/+bug/1307225
> I really don't know why Intel still hasn't fixed it.
>
> >
> > > [ 2999.259697] kworker/dying (16220) used greatest stack depth: 9808 bytes left
> > > [ 3151.714448] perf: interrupt took too long (2521 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> > > [ 5331.990934] TCP: request_sock_TCP: Possible SYN flooding on port 8201. Sending cookies. Check SNMP counters.
> > > [ 5331.991837] TCP: request_sock_TCP: Possible SYN flooding on port 9208. Sending cookies. Check SNMP counters.
> > > [ 5334.781978] TCP: request_sock_TCP: Possible SYN flooding on port 7171. Sending cookies. Check SNMP counters.
> >
> > other bad things are happening to your machine....
>
> What does that mean?
>
> >
> > > [ 5354.636542] sbis3plugin[29294]: segfault at 8 ip 000000001c84acaf sp 0000000038851b8c error 4 in
> > > libQt5Core.so[7f87ee665000+5a3000]
> > > [ 5794.612947] perf: interrupt took too long (3152 > 3151), lowering kernel.perf_event_max_sample_rate to 63000
> > > [ 6242.114852] amdgpu 0000:07:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > > [ 6242.114857] swiotlb: coherent allocation failed for device 0000:07:00.0 size=2097152
> >
> > Random userspace segfaults and more gpu memory allocation failures
>
> The sbis3plugin segfault has already been reported, so no need to pay attention to it.
>
> >
> > > [ 9663.267767] mce: [Hardware Error]: Machine check events logged
> > > [10322.649619] mce: [Hardware Error]: Machine check events logged
> > > [10557.294312] amdgpu 0000:07:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > > [10557.294317] swiotlb: coherent allocation failed for device 0000:07:00.0 size=2097152
> >
> > more hardware and gpu memory allocation failures
> >
> > [more gpu memalloc failures]
> >
> > > [13609.734065] mce: [Hardware Error]: Machine check events logged
> > > [13920.399283] mce: [Hardware Error]: Machine check events logged
>
> I would not pay attention to "mce: [Hardware Error]: Machine check
> events logged" because this is a well-known problem with a virtualized
> Haswell CPU.
> >
> > [14116.872461] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1653418, last
> > emitted seq=1653420
> > [14116.872466] [drm] No hardware hang detected. Did some blocks stall?
>
> This is also a well-known problem with the latest AMD Vega GPU:
> https://bugs.freedesktop.org/show_bug.cgi?id=104001
>
> > And finally after 4+ hours hardware errors and the GPU times out and
> > drm is confused....
> >
> > So there doesn't appear to be any filesystem problem here, just a
> > heavily loaded system under memory pressure....
>
> This is very strange because the machine has plenty of physical
> memory (32GB) and only half of it was used.
>
> --
> Best Regards,
> Mike Gavrilov.
>

Yet another hang:

Trace report: https://dumps.sy24.ru/1/trace_report.txt.bz2 (9.4 MB)

dmesg:

[ 369.374381] INFO: task TaskSchedulerFo:5624 blocked for more than 120 seconds.
[ 369.374391] Not tainted 4.15.0-rc4-amd-vega+ #9
[ 369.374393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.374395] TaskSchedulerFo D11688 5624 3825 0x00000000
[ 369.374400] Call Trace:
[ 369.374407] __schedule+0x2dc/0xba0
[ 369.374410] ? __lock_acquire+0x2d4/0x1350
[ 369.374415] ? __down+0x84/0x110
[ 369.374417] schedule+0x33/0x90
[ 369.374419] schedule_timeout+0x25a/0x5b0
[ 369.374423] ? mark_held_locks+0x5f/0x90
[ 369.374425] ? _raw_spin_unlock_irq+0x2c/0x40
[ 369.374426] ? __down+0x84/0x110
[ 369.374429] ? trace_hardirqs_on_caller+0xf4/0x190
[ 369.374431] ? __down+0x84/0x110
[ 369.374433] __down+0xac/0x110
[ 369.374466] ? _xfs_buf_find+0x263/0xac0 [xfs]
[ 369.374470] down+0x41/0x50
[ 369.374472] ? down+0x41/0x50
[ 369.374490] xfs_buf_lock+0x4e/0x270 [xfs]
[ 369.374507] _xfs_buf_find+0x263/0xac0 [xfs]
[ 369.374528] xfs_buf_get_map+0x29/0x490 [xfs]
[ 369.374545] xfs_buf_read_map+0x2b/0x300 [xfs]
[ 369.374567] xfs_trans_read_buf_map+0xc4/0x5d0 [xfs]
[ 369.374585] xfs_read_agi+0xaa/0x200 [xfs]
[ 369.374605] xfs_iunlink+0x4d/0x150 [xfs]
[ 369.374609] ? current_time+0x32/0x70
[ 369.374629] xfs_droplink+0x54/0x60 [xfs]
[ 369.374654] xfs_rename+0xb15/0xd10 [xfs]
[ 369.374680] xfs_vn_rename+0xd3/0x140 [xfs]
[ 369.374687] vfs_rename+0x476/0x960
[ 369.374695] SyS_rename+0x33f/0x390
[ 369.374704] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 369.374707] RIP: 0033:0x7f01cf705137
[ 369.374708] RSP: 002b:00007f01873e5608 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[ 369.374710] RAX: ffffffffffffffda RBX: 0000000000000119 RCX: 00007f01cf705137
[ 369.374711] RDX: 00007f01873e56dc RSI: 00003a5cd3540850 RDI: 00003a5cd7ea8000
[ 369.374713] RBP: 00007f01873e6340 R08: 0000000000000000 R09: 00007f01873e54e0
[ 369.374714] R10: 00007f01873e55f0 R11: 0000000000000202 R12: 00007f01873e6218
[ 369.374715] R13: 00007f01873e6358 R14: 0000000000000000 R15: 00003a5cd8416000
[ 369.374725] INFO: task disk_cache:0:3971 blocked for more than 120 seconds.
[ 369.374727] Not tainted 4.15.0-rc4-amd-vega+ #9
[ 369.374729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.374731] disk_cache:0 D12432 3971 3903 0x00000000
[ 369.374735] Call Trace:
[ 369.374738] __schedule+0x2dc/0xba0
[ 369.374743] ? wait_for_completion+0x10e/0x1a0
[ 369.374745] schedule+0x33/0x90
[ 369.374747] schedule_timeout+0x25a/0x5b0
[ 369.374751] ? mark_held_locks+0x5f/0x90
[ 369.374753] ? _raw_spin_unlock_irq+0x2c/0x40
[ 369.374755] ? wait_for_completion+0x10e/0x1a0
[ 369.374757] ? trace_hardirqs_on_caller+0xf4/0x190
[ 369.374760] ? wait_for_completion+0x10e/0x1a0
[ 369.374762] wait_for_completion+0x136/0x1a0
[ 369.374765] ? wake_up_q+0x80/0x80
[ 369.374782] ? _xfs_buf_read+0x23/0x30 [xfs]
[ 369.374798] xfs_buf_submit_wait+0xb2/0x530 [xfs]
[ 369.374814] _xfs_buf_read+0x23/0x30 [xfs]
[ 369.374828] xfs_buf_read_map+0x14b/0x300 [xfs]
[ 369.374847] ? xfs_trans_read_buf_map+0xc4/0x5d0 [xfs]
[ 369.374867] xfs_trans_read_buf_map+0xc4/0x5d0 [xfs]
[ 369.374883] xfs_da_read_buf+0xca/0x110 [xfs]
[ 369.374901] xfs_dir3_data_read+0x23/0x60 [xfs]
[ 369.374916] xfs_dir2_leaf_addname+0x335/0x8b0 [xfs]
[ 369.374936] xfs_dir_createname+0x17e/0x1d0 [xfs]
[ 369.374956] xfs_create+0x6ad/0x840 [xfs]
[ 369.374981] xfs_generic_create+0x1fa/0x2d0 [xfs]
[ 369.375000] xfs_vn_mknod+0x14/0x20 [xfs]
[ 369.375016] xfs_vn_create+0x13/0x20 [xfs]
[ 369.375018] lookup_open+0x5ea/0x7c0
[ 369.375025] ? __wake_up_common_lock+0x65/0xc0
[ 369.375032] path_openat+0x318/0xc80
[ 369.375039] do_filp_open+0x9b/0x110
[ 369.375047] ? _raw_spin_unlock+0x27/0x40
[ 369.375053] do_sys_open+0x1ba/0x250
[ 369.375055] ? do_sys_open+0x1ba/0x250
[ 369.375059] SyS_openat+0x14/0x20
[ 369.375062] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 369.375063] RIP: 0033:0x7f616bf1b080
[ 369.375064] RSP: 002b:00007f614bd56930 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 369.375067] RAX: ffffffffffffffda RBX: 00003d8825112800 RCX: 00007f616bf1b080
[ 369.375068] RDX: 0000000000080041 RSI: 00003d8824da6070 RDI: ffffffffffffff9c
[ 369.375069] RBP: 0000000000000022 R08: 0000000000000000 R09: 0000000000000050
[ 369.375070] R10: 00000000000001a4 R11: 0000000000000293 R12: 00007f614bd569c8
[ 369.375071] R13: 0000000000000008 R14: 00003d8824da6150 R15: 0000000000000008
[ 369.375206] Showing all locks held in the system:
[ 369.375215] 5 locks held by kworker/2:1/60:
[ 369.375221] #0: ((wq_completion)"xfs-eofblocks/%s"mp->m_fsname){+.+.}, at: [<00000000731c4c52>] process_one_work+0x1b9/0x680
[ 369.375230] #1: ((work_completion)(&(&mp->m_eofblocks_work)->work)){+.+.}, at: [<00000000731c4c52>] process_one_work+0x1b9/0x680
[ 369.375236] #2: (&sb->s_type->i_mutex_key#20){++++}, at: [<00000000d49e2308>] xfs_ilock_nowait+0x12d/0x270 [xfs]
[ 369.375258] #3: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.375281] #4: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.375301] 1 lock held by khungtaskd/67:
[ 369.375302] #0: (tasklist_lock){.+.+}, at: [<000000006840dd64>] debug_show_all_locks+0x3d/0x1a0
[ 369.375314] 3 locks held by kworker/u16:5/148:
[ 369.375315] #0: ((wq_completion)"writeback"){+.+.}, at: [<00000000731c4c52>] process_one_work+0x1b9/0x680
[ 369.375321] #1: ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: [<00000000731c4c52>] process_one_work+0x1b9/0x680
[ 369.375327] #2: (&type->s_umount_key#63){++++}, at: [<0000000022e51a82>] trylock_super+0x1b/0x50
[ 369.375392] 4 locks held by gnome-shell/1970:
[ 369.375393] #0: (&mm->mmap_sem){++++}, at: [<00000000642ae303>] vm_mmap_pgoff+0xa1/0x120
[ 369.375401] #1: (sb_writers#17){.+.+}, at: [<00000000626e98dc>] touch_atime+0x64/0xd0
[ 369.375408] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.375430] #3: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.375453] 4 locks held by pool/6879:
[ 369.375454] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.375462] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.375470] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.375491] #3: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.375520] 8 locks held by dconf-service/2129:
[ 369.375521] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.375538] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.375546] #2: (&sb->s_type->i_mutex_key#20){++++}, at: [<00000000926eb288>] lock_two_nondirectories+0x6d/0x80
[ 369.375553] #3: (&sb->s_type->i_mutex_key#20/4){+.+.}, at: [<0000000032f8e229>] lock_two_nondirectories+0x56/0x80
[ 369.375571] #4: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.375607] #5: (&xfs_dir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.375639] #6: (&xfs_nondir_ilock_class){++++}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.375671] #7: (&xfs_nondir_ilock_class){++++}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.375741] 1 lock held by tracker-store/2481:
[ 369.375743] #0: (&sb->s_type->i_mutex_key#20){++++}, at: [<000000009a06b5ff>] xfs_ilock+0x1a6/0x210 [xfs]
[ 369.375822] 8 locks held by TaskSchedulerBa/3894:
[ 369.375824] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.375835] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.375848] #2: (&inode->i_rwsem){++++}, at: [<00000000926eb288>] lock_two_nondirectories+0x6d/0x80
[ 369.375858] #3: (&inode->i_rwsem/4){+.+.}, at: [<0000000032f8e229>] lock_two_nondirectories+0x56/0x80
[ 369.375869] #4: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.375905] #5: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.375936] #6: (&xfs_dir_ilock_class){++++}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.375967] #7: (&xfs_nondir_ilock_class){++++}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.375997] 6 locks held by TaskSchedulerFo/3896:
[ 369.375999] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376010] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<000000001fe370fd>] do_unlinkat+0x129/0x300
[ 369.376023] #2: (&inode->i_rwsem){++++}, at: [<00000000d6a8d3d3>] vfs_unlink+0x50/0x1c0
[ 369.376033] #3: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376068] #4: (&xfs_dir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376098] #5: (&xfs_nondir_ilock_class){++++}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.376130] 2 locks held by TaskSchedulerFo/3897:
[ 369.376132] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376143] #1: (&type->i_mutex_dir_key#7){++++}, at: [<000000000a1a7597>] path_openat+0x2fe/0xc80
[ 369.376155] 4 locks held by TaskSchedulerFo/3898:
[ 369.376157] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376179] #1: (&type->i_mutex_dir_key#7){++++}, at: [<000000000a1a7597>] path_openat+0x2fe/0xc80
[ 369.376191] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376226] #3: (&xfs_dir_ilock_class/5){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376261] 3 locks held by TaskSchedulerFo/4004:
[ 369.376263] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376274] #1: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376310] #2: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376341] 3 locks held by TaskSchedulerFo/4214:
[ 369.376343] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376353] #1: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376388] #2: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376420] 8 locks held by TaskSchedulerFo/5624:
[ 369.376421] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376433] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.376446] #2: (&sb->s_type->i_mutex_key#20){++++}, at: [<00000000926eb288>] lock_two_nondirectories+0x6d/0x80
[ 369.376457] #3:
(&sb->s_type->i_mutex_key#20/4){+.+.}, at: [<0000000032f8e229>] lock_two_nondirectories+0x56/0x80
[ 369.376470] #4: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376504] #5: (&xfs_dir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376535] #6: (&xfs_nondir_ilock_class/2){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376568] #7: (&xfs_nondir_ilock_class/3){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376600] 2 locks held by TaskSchedulerFo/5625:
[ 369.376602] #0: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376636] #1: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376667] 2 locks held by TaskSchedulerFo/5627:
[ 369.376669] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376680] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<000000001fe370fd>] do_unlinkat+0x129/0x300
[ 369.376695] 5 locks held by disk_cache:0/3971:
[ 369.376697] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376708] #1: (&type->i_mutex_dir_key#7){++++}, at: [<000000000a1a7597>] path_openat+0x2fe/0xc80
[ 369.376719] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376754] #3: (&xfs_dir_ilock_class/5){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376786] #4: (&(&ip->i_lock)->mr_lock){+.+.}, at: [<000000005543d627>] xfs_ilock_nowait+0x194/0x270 [xfs]
[ 369.376824] 1 lock held by firefox/4007:
[ 369.376826] #0: (&type->i_mutex_dir_key#7){++++}, at: [<00000000487923d9>] path_openat+0x6d6/0xc80
[ 369.376840] 5 locks held by Cache2 I/O/4896:
[ 369.376842] #0: (sb_writers#17){.+.+}, at: [<0000000090328571>] do_sys_ftruncate.constprop.17+0xdf/0x110
[ 369.376855] #1: (&sb->s_type->i_mutex_key#20){++++}, at: [<000000001bfdce57>] do_truncate+0x65/0xc0
[ 369.376866] #2: (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<00000000493cf182>] xfs_ilock+0x156/0x210 [xfs]
[ 369.376896] #3: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.376931] #4: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.376964] 4 locks held by Classif~ Update/5798:
[ 369.376966] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.376977] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<0000000083a49cad>] filename_create+0x83/0x160
[ 369.376990] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.377025] #3: (&xfs_dir_ilock_class/5){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.377059] 4 locks held by StreamTrans #29/6033:
[ 369.377060] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377071] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.377084] #2: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.377119] #3: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.377150] 3 locks held by QuotaManager IO/6194:
[ 369.377152] #0: (&f->f_pos_lock){+.+.}, at: [<00000000a655448c>] __fdget_pos+0x4c/0x60
[ 369.377179] #1: (&type->i_mutex_dir_key#7){++++}, at: [<000000009c036bbe>] iterate_dir+0x53/0x1a0
[ 369.377199] #2: (&xfs_dir_ilock_class){++++}, at: [<00000000276bf747>] xfs_ilock+0xe6/0x210 [xfs]
[ 369.377242] 2 locks held by StreamTrans #35/6237:
[ 369.377244] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377255] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.377268] 3 locks held by DOM Worker/6246:
[ 369.377270] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377281] #1: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.377317] #2: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.377349] 2 locks held by StreamTrans #42/6259:
[ 369.377350] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377361] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<00000000d0bc23a2>] lock_rename+0xda/0x100
[ 369.377377] 3 locks held by StreamTrans #48/6956:
[ 369.377378] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377389] #1: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.377424] #2: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.377472] 1 lock held by pool/4595:
[ 369.377474] #0: (&type->i_mutex_dir_key#7){++++}, at: [<00000000534237e8>] lookup_slow+0xe5/0x220
[ 369.377509] 1 lock held by worker/5171:
[ 369.377511] #0: (&sb->s_type->i_mutex_key#20){++++}, at: [<000000009a06b5ff>] xfs_ilock+0x1a6/0x210 [xfs]
[ 369.377558] 1 lock held by CPU 0/KVM/5172:
[ 369.377561] #0: (&vcpu->mutex){+.+.}, at: [<00000000109f3ea1>] vcpu_load+0x1c/0x60 [kvm]
[ 369.377601] 1 lock held by gitkraken/5407:
[ 369.377604] #0: (&type->i_mutex_dir_key#7){++++}, at: [<00000000534237e8>] lookup_slow+0xe5/0x220
[ 369.377624] 3 locks held by gitkraken/5495:
[ 369.377626] #0: (&f->f_pos_lock){+.+.}, at: [<00000000a655448c>] __fdget_pos+0x4c/0x60
[ 369.377642] #1: (&type->i_mutex_dir_key#7){++++}, at: [<000000009c036bbe>] iterate_dir+0x53/0x1a0
[ 369.377660] #2: (&xfs_dir_ilock_class){++++}, at: [<00000000276bf747>] xfs_ilock+0xe6/0x210 [xfs]
[ 369.377742] 1 lock held by trace-cmd/6122:
[ 369.377743] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377756] 1 lock held by trace-cmd/6123:
[ 369.377757] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377768] 1 lock held by trace-cmd/6124:
[ 369.377770] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377781] 1 lock held by trace-cmd/6125:
[ 369.377783] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377795] 1 lock held by trace-cmd/6126:
[ 369.377797] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377808] 1 lock held by trace-cmd/6127:
[ 369.377809] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377821] 1 lock held by trace-cmd/6128:
[ 369.377822] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377833] 1 lock held by trace-cmd/6129:
[ 369.377835] #0: (&pipe->mutex/1){+.+.}, at: [<0000000005d368c0>] pipe_lock+0x1f/0x30
[ 369.377851] 2 locks held by nautilus/6272:
[ 369.377853] #0: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.377890] #1: (&xfs_nondir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.377933] 6 locks held by rm/6958:
[ 369.377935] #0: (sb_writers#17){.+.+}, at: [<00000000e08ea99d>] mnt_want_write+0x24/0x50
[ 369.377946] #1: (&type->i_mutex_dir_key#7/1){+.+.}, at: [<000000001fe370fd>] do_unlinkat+0x129/0x300
[ 369.377960] #2: (&sb->s_type->i_mutex_key#20){++++}, at: [<00000000d6a8d3d3>] vfs_unlink+0x50/0x1c0
[ 369.377971] #3: (sb_internal#2){.+.+}, at: [<000000009149be51>] xfs_trans_alloc+0xec/0x130 [xfs]
[ 369.378006] #4: (&xfs_dir_ilock_class){++++}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.378038] #5: (&xfs_nondir_ilock_class/1){+.+.}, at: [<000000009f144141>] xfs_ilock+0x16e/0x210 [xfs]
[ 369.378073] =============================================

Is this again a false positive? If it is not a filesystem problem, why are processes blocked for so long?

Expected behaviour: processes should not block even under high I/O load. I agree that they will run more slowly than usual, but they should not get stuck in one place as we see here. And if it is not caused by the filesystem, I would like to know what limits are being hit here.
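
A note on my own question above, as far as I understand the mechanism: "blocked for
more than 120 seconds" means the task has been sleeping uninterruptibly (D state)
without being scheduled for hung_task_timeout_secs, and khungtaskd then prints its
stack. The traces show xfs_buf_lock() going through down(), i.e. an ordinary kernel
semaphore, so a toy illustration of that kind of sleep would look like this
(hypothetical names, illustrative only, not the XFS code):

  #include <linux/semaphore.h>

  /* Hypothetical stand-in for the buffer lock seen in the traces above. */
  static DEFINE_SEMAPHORE(demo_buf_lock);

  static void demo_wait_like_xfs_buf_lock(void)
  {
          /*
           * down() sleeps in TASK_UNINTERRUPTIBLE until the current holder
           * calls up().  A task stuck here cannot be interrupted by signals,
           * and if it does not get to run for hung_task_timeout_secs (120s
           * by default), khungtaskd emits the "blocked for more than 120
           * seconds" report quoted above.
           */
          down(&demo_buf_lock);

          /* ... use the buffer ... */

          up(&demo_buf_lock);
  }

So if that is correct, the real question is what is holding the buffer that
xfs_buf_lock() is queued on for two minutes, not whether the wait itself is allowed.
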