From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 90D606B0032 for ; Wed, 25 Feb 2015 09:31:32 -0500 (EST) Received: by pabrd3 with SMTP id rd3so5687900pab.1 for ; Wed, 25 Feb 2015 06:31:32 -0800 (PST) Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [2001:e42:101:1:202:181:97:72]) by mx.google.com with ESMTPS id e9si1937886pas.9.2015.02.25.06.31.30 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 25 Feb 2015 06:31:31 -0800 (PST) Subject: Re: How to handle TIF_MEMDIE stalls? From: Tetsuo Handa References: <201502172057.GCD09362.FtHQMVSLJOFFOO@I-love.SAKURA.ne.jp> <201502242020.IDI64912.tOOQSVJFOFLHMF@I-love.SAKURA.ne.jp> <20150224152033.GA3782@thunk.org> <20150224210244.GA13666@dastard> In-Reply-To: <20150224210244.GA13666@dastard> Message-Id: <201502252331.IEJ78629.OOOFSLFMHQtFVJ@I-love.SAKURA.ne.jp> Date: Wed, 25 Feb 2015 23:31:17 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-mm@kvack.org List-ID: To: david@fromorbit.com Cc: tytso@mit.edu, rientjes@google.com, hannes@cmpxchg.org, mhocko@suse.cz, dchinner@redhat.com, linux-mm@kvack.org, oleg@redhat.com, akpm@linux-foundation.org, mgorman@suse.de, torvalds@linux-foundation.org, fernando_b1@lab.ntt.co.jp Dave Chinner wrote: > This exact discussion is already underway. > > My initial proposal: > > http://oss.sgi.com/archives/xfs/2015-02/msg00314.html > > Why mempools don't work but transaction based reservations will: > > http://oss.sgi.com/archives/xfs/2015-02/msg00339.html > > Reservation needs to be an accounting mechanisms, not preallocation: > > http://oss.sgi.com/archives/xfs/2015-02/msg00456.html > http://oss.sgi.com/archives/xfs/2015-02/msg00457.html > http://oss.sgi.com/archives/xfs/2015-02/msg00458.html > > And that's where the discussion currently sits. I got two problems (one is stall at io_schedule(), the other is kernel panic due to xfs's assertion failure) using Linux 3.19. I guess those problems are caused by not retrying !GFP_FS allocations under OOM. Will those problems go away by using transaction based reservations? And if yes, are they simple enough to backport to vendor's kernels? (From http://I-love.SAKURA.ne.jp/tmp/serial-20150225-1.txt.xz ) ---------- [ 1225.773411] kworker/3:0H D ffff88007cadb4f8 11632 27 2 0x00000000 [ 1225.776911] ffff88007cadb4f8 ffff88007cadb508 ffff88007cac6740 0000000000014080 [ 1225.780670] ffffffff8101cd19 ffff88007cadbfd8 0000000000014080 ffff88007c28b740 [ 1225.784431] ffff88007cac6740 ffff88007cadb540 ffff88007f8d4998 ffff88007cadb540 [ 1225.788766] Call Trace: [ 1225.789988] [] ? read_tsc+0x9/0x10 [ 1225.792444] [] ? xfs_iunpin_wait+0x19/0x20 [ 1225.795228] [] io_schedule+0xa0/0x130 [ 1225.797802] [] __xfs_iunpin_wait+0xe9/0x140 [ 1225.800621] [] ? autoremove_wake_function+0x40/0x40 [ 1225.803770] [] xfs_iunpin_wait+0x19/0x20 [ 1225.806471] [] xfs_reclaim_inode+0x7c/0x360 [ 1225.809283] [] xfs_reclaim_inodes_ag+0x257/0x370 [ 1225.812308] [] ? radix_tree_gang_lookup_tag+0x89/0xd0 [ 1225.815532] [] ? list_lru_walk_node+0x148/0x190 [ 1225.817951] [] xfs_reclaim_inodes_nr+0x33/0x40 [ 1225.819373] [] xfs_fs_free_cached_objects+0x15/0x20 [ 1225.820898] [] super_cache_scan+0x169/0x170 [ 1225.822245] [] shrink_node_slabs+0x1d6/0x370 [ 1225.823588] [] shrink_zone+0x20a/0x240 [ 1225.824830] [] do_try_to_free_pages+0x16c/0x460 [ 1225.826230] [] try_to_free_pages+0xba/0x150 [ 1225.827570] [] __alloc_pages_nodemask+0x5b2/0x9d0 [ 1225.829030] [] kmem_getpages+0x8c/0x200 [ 1225.830277] [] fallback_alloc+0x17b/0x230 [ 1225.831561] [] ____cache_alloc_node+0x18b/0x1c0 [ 1225.833061] [] kmem_cache_alloc+0x330/0x5c0 [ 1225.834435] [] ? ida_pre_get+0x69/0x100 [ 1225.835719] [] ida_pre_get+0x69/0x100 [ 1225.836963] [] ida_simple_get+0x42/0xf0 [ 1225.838248] [] create_worker+0x31/0x1c0 [ 1225.839519] [] worker_thread+0x3d1/0x4d0 [ 1225.840800] [] ? rescuer_thread+0x3a0/0x3a0 [ 1225.842123] [] kthread+0xd2/0xf0 [ 1225.843234] [] ? perf_trace_xen_mmu_ptep_modify_prot+0x90/0xf0 [ 1225.844978] [] ? kthread_create_on_node+0x180/0x180 [ 1225.846481] [] ret_from_fork+0x7c/0xb0 [ 1225.847718] [] ? kthread_create_on_node+0x180/0x180 [ 1225.849279] kswapd0 D ffff88007708f998 11552 45 2 0x00000000 [ 1225.850977] ffff88007708f998 0000000000000000 ffff88007c28b740 0000000000014080 [ 1225.852798] 0000000000000003 ffff88007708ffd8 0000000000014080 ffff880077ff2740 [ 1225.854575] ffff88007c28b740 0000000000000000 ffff88007948e3a8 ffff88007948e3ac [ 1225.856358] Call Trace: [ 1225.856928] [] schedule_preempt_disabled+0x29/0x70 [ 1225.858384] [] __mutex_lock_slowpath+0x95/0x100 [ 1225.859799] [] mutex_lock+0x23/0x37 [ 1225.860983] [] xfs_reclaim_inodes_ag+0x2cc/0x370 [ 1225.862403] [] ? __enqueue_entity+0x78/0x80 [ 1225.863742] [] ? enqueue_entity+0x237/0x8f0 [ 1225.865100] [] ? radix_tree_gang_lookup_tag+0x89/0xd0 [ 1225.866659] [] ? list_lru_walk_node+0x148/0x190 [ 1225.868106] [] xfs_reclaim_inodes_nr+0x33/0x40 [ 1225.869522] [] xfs_fs_free_cached_objects+0x15/0x20 [ 1225.871015] [] super_cache_scan+0x169/0x170 [ 1225.872338] [] shrink_node_slabs+0x1d6/0x370 [ 1225.873679] [] shrink_zone+0x20a/0x240 [ 1225.874920] [] kswapd+0x4fd/0x9c0 [ 1225.876049] [] ? mem_cgroup_shrink_node_zone+0x140/0x140 [ 1225.877654] [] kthread+0xd2/0xf0 [ 1225.878762] [] ? perf_trace_xen_mmu_ptep_modify_prot+0x90/0xf0 [ 1225.880495] [] ? kthread_create_on_node+0x180/0x180 [ 1225.881996] [] ret_from_fork+0x7c/0xb0 [ 1225.883336] [] ? kthread_create_on_node+0x180/0x180 ---------- (From http://I-love.SAKURA.ne.jp/tmp/serial-20150225-2.txt.xz + http://I-love.SAKURA.ne.jp/tmp/crash-20150225-2.log.xz ) ---------- [ 189.586204] Out of memory: Kill process 3701 (a.out) score 834 or sacrifice child [ 189.586205] Killed process 3701 (a.out) total-vm:2167392kB, anon-rss:1465820kB, file-rss:4kB [ 189.586210] Kill process 3702 (a.out) sharing same memory [ 189.586211] Kill process 3714 (a.out) sharing same memory [ 189.586212] Kill process 3748 (a.out) sharing same memory [ 189.586213] Kill process 3755 (a.out) sharing same memory [ 189.593470] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(mp), file: fs/xfs/xfs_inode.c, line: 1701 [ 189.593491] ------------[ cut here ]------------ [ 189.593492] kernel BUG at fs/xfs/xfs_message.c:106! [ 189.593493] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 189.593511] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd dm_mirror dm_region_hash dm_log microcode dm_mod ppdev parport_pc pcspkr vmw_balloon serio_raw vmw_vmci parport shpchp i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput ata_generic pata_acpi sd_mod ata_piix mptspi libata scsi_transport_spi e1000 mptscsih mptbase floppy [ 189.593512] CPU: 1 PID: 3755 Comm: a.out Not tainted 3.19.0 #42 [ 189.593512] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 189.593513] task: ffff88007a848740 ti: ffff88005c064000 task.ti: ffff88005c064000 [ 189.593517] RIP: 0010:[] [] assfail+0x22/0x30 [ 189.593517] RSP: 0000:ffff88005c067af8 EFLAGS: 00010292 [ 189.593518] RAX: 0000000000000054 RBX: ffff880079349c00 RCX: 0000000000000050 [ 189.593518] RDX: 0000000000005050 RSI: 0000000000000282 RDI: 0000000000000282 [ 189.593519] RBP: ffff88005c067af8 R08: 0000000000000282 R09: 0000000000000000 [ 189.593519] R10: ffffffff81ec95c8 R11: 656c696166206e6f R12: ffff88005ee92800 [ 189.593519] R13: 00000000fffffff4 R14: ffffffff81838140 R15: ffff880064505390 [ 189.593520] FS: 00007f62d93e0740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000 [ 189.593521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 189.593521] CR2: 00007fb901282763 CR3: 0000000077b00000 CR4: 00000000000407e0 [ 189.593562] Stack: [ 189.593564] ffff88005c067b38 ffffffff812ab2d7 ffff880079349e48 ffff88007a6feef0 [ 189.593564] ffff88005c067b38 ffff880079349c00 0000000000000001 ffff880079349db8 [ 189.593565] ffff88005c067b58 ffffffff812acb98 ffff880079349db8 ffff880079349c00 [ 189.593565] Call Trace: [ 189.593568] [] xfs_inactive_truncate+0x67/0x150 [ 189.593569] [] xfs_inactive+0x1c8/0x1f0 [ 189.593570] [] xfs_fs_evict_inode+0x86/0xd0 [ 189.593572] [] evict+0xb8/0x190 [ 189.593574] [] iput+0xf5/0x180 [ 189.593575] [] __dentry_kill+0x188/0x1f0 [ 189.593576] [] dput+0xa5/0x170 [ 189.593577] [] __fput+0x16d/0x1e0 [ 189.593578] [] ____fput+0xe/0x10 [ 189.593580] [] task_work_run+0xaf/0xf0 [ 189.593582] [] do_exit+0x2d8/0xbe0 [ 189.593583] [] ? recalc_sigpending+0x1f/0x60 [ 189.593584] [] do_group_exit+0x3f/0xa0 [ 189.593585] [] get_signal+0x1d2/0x6f0 [ 189.593588] [] do_signal+0x28/0x720 [ 189.593589] [] ? __sb_end_write+0x35/0x70 [ 189.593591] [] ? vfs_write+0x172/0x1f0 [ 189.593592] [] do_notify_resume+0x4c/0x90 [ 189.593594] [] int_signal+0x12/0x17 [ 189.593602] Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 f1 41 89 d0 48 c7 c6 48 8b 97 81 48 89 fa 31 c0 48 89 e5 31 ff e8 de fb ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 [ 189.593603] RIP [] assfail+0x22/0x30 [ 189.593604] RSP ---------- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org