Chris,

Thanks for the reply. You have been a great help. Do you know if these changes were implemented any farther back than 3.2? I wouldn't feel comfortable running a release candidate kernel in a production environment.

Thanks again.

On Fri, Dec 9, 2011 at 6:55 AM, Christoph Hellwig wrote:
> On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> > I am looking for assistance on XFS which is why I have joined this
> > mailing list. I'm receiving a stack overflow on our file server. The
> > server is running Scientific Linux 6.1 with the following kernel,
> > 2.6.32-131.21.1.el6.x86_64.
> >
> > This is causing random reboots which is more annoying than anything. I
> > found a couple of links in the archives but wasn't quite sure how to
> > apply this patch. I can provide whatever information necessary in order
> > for assistance in troubleshooting.
>
> It's really mostly an issue with the VM page reclaim and writeback
> code. The kernel still has the old balance dirty pages code which calls
> into writeback code from the stack of the write system call, which
> already comes from NFSD with massive amounts of stack used. Then
> the writeback code calls into XFS to write data out, then you get the
> full XFS btree code, which then ends up in kmalloc and memory reclaim.
>
> You probably have only a third of the stack actually used by XFS, the
> rest is from NFSD/writeback code and page reclaim. I don't think any
> of this is easily fixable in a 2.6.32 codebase. Current mainline 3.2-rc
> now has the I/O-less balance dirty pages which will basically split the
> stack footprint in half, but it's an invasive change to the writeback
> code that isn't easily backportable.
>
> > Dec 6 20:27:55 localhost kernel: ------------[ cut here ]------------
> > Dec 6 20:27:55 localhost kernel: WARNING: at arch/x86/kernel/irq_64.c:47 handle_irq+0x8f/0xa0() (Not tainted)
> > Dec 6 20:27:55 localhost kernel: Hardware name: X8DTH-i/6/iF/6F
> > Dec 6 20:27:55 localhost kernel: do_IRQ: nfsd near stack overflow (cur:ffff880622208000,sp:ffff880622208160)
> > Dec 6 20:27:55 localhost kernel: Modules linked in: mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs dm_mirror dm_region_hash dm_log ses enclosure ixgbe mdio microcode igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 megaraid_sas(U) sd_mod crc_t10dif ahci dm_mod [last unloaded: scsi_wait_scan]
> > Dec 6 20:27:55 localhost kernel: Pid: 2898, comm: nfsd Not tainted 2.6.32-131.21.1.el6.x86_64 #1
> > Dec 6 20:27:55 localhost kernel: Call Trace:
> > Dec 6 20:27:55 localhost kernel: [] ? warn_slowpath_common+0x87/0xc0
> > Dec 6 20:27:55 localhost kernel: [] ? __do_softirq+0x11a/0x1d0
> > Dec 6 20:27:55 localhost kernel: [] ? warn_slowpath_fmt+0x46/0x50
> > Dec 6 20:27:55 localhost kernel: [] ? call_softirq+0x1c/0x30
> > Dec 6 20:27:55 localhost kernel: [] ? handle_irq+0x8f/0xa0
> > Dec 6 20:27:55 localhost kernel: [] ? do_IRQ+0x6c/0xf0
> > Dec 6 20:27:55 localhost kernel: [] ? ret_from_intr+0x0/0x11
> > Dec 6 20:27:55 localhost kernel: [] ? kmem_cache_free+0xbf/0x2b0
> > Dec 6 20:27:55 localhost kernel: [] ? free_buffer_head+0x22/0x50
> > Dec 6 20:27:55 localhost kernel: [] ? try_to_free_buffers+0x79/0xc0
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_vm_releasepage+0xbc/0x130 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? try_to_release_page+0x30/0x60
> > Dec 6 20:27:55 localhost kernel: [] ? shrink_page_list.clone.0+0x4f1/0x5c0
> > Dec 6 20:27:55 localhost kernel: [] ? shrink_inactive_list+0x2f8/0x740
> > Dec 6 20:27:55 localhost kernel: [] ? free_pcppages_bulk+0x2b6/0x390
> > Dec 6 20:27:55 localhost kernel: [] ? shrink_zone+0x38f/0x520
> > Dec 6 20:27:55 localhost kernel: [] ? __mem_cgroup_uncharge_common+0x198/0x270
> > Dec 6 20:27:55 localhost kernel: [] ? zone_reclaim+0x354/0x410
> > Dec 6 20:27:55 localhost kernel: [] ? isolate_pages_global+0x0/0x380
> > Dec 6 20:27:55 localhost kernel: [] ? get_page_from_freelist+0x694/0x820
> > Dec 6 20:27:55 localhost kernel: [] ? shrink_inactive_list+0x4f2/0x740
> > Dec 6 20:27:55 localhost kernel: [] ? __alloc_pages_nodemask+0x111/0x8b0
> > Dec 6 20:27:55 localhost kernel: [] ? find_get_page+0x1e/0xa0
> > Dec 6 20:27:55 localhost kernel: [] ? find_lock_page+0x37/0x80
> > Dec 6 20:27:55 localhost kernel: [] ? alloc_pages_current+0xaa/0x110
> > Dec 6 20:27:55 localhost kernel: [] ? __page_cache_alloc+0x87/0x90
> > Dec 6 20:27:55 localhost kernel: [] ? find_or_create_page+0x4f/0xb0
> > Dec 6 20:27:55 localhost kernel: [] ? _xfs_buf_lookup_pages+0x145/0x360 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? _xfs_buf_initialize+0xcb/0x140 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_buf_get+0x77/0x1b0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_buf_read+0x2c/0x100 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_trans_read_buf+0x219/0x440 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_btree_read_buf_block+0x5e/0xc0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_btree_ptr_offset+0x4c/0x90 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_btree_lookup+0xbf/0x470 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_alloc_ag_vextent_near+0x98a/0xb70 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_trans_log_buf+0x9d/0xe0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bmap_add_extent_delay_real+0xe54/0x18d0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? kmem_zone_alloc+0x9a/0xe0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_trans_mod_dquot_byino+0x79/0xd0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bmap_add_extent+0x3ff/0x420 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bmbt_init_cursor+0x4a/0x150 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bmapi+0xb14/0x11a0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? down_write+0x16/0x40
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? generic_make_request+0x21e/0x5b0
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_iomap+0x389/0x440 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? __mark_inode_dirty+0x6c/0x160
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_map_blocks+0x2d/0x40 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_page_state_convert+0x2f8/0x750 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? radix_tree_gang_lookup_tag_slot+0x95/0xe0
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_vm_writepage+0x86/0x170 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? __writepage+0x17/0x40
> > Dec 6 20:27:55 localhost kernel: [] ? write_cache_pages+0x1c9/0x4a0
> > Dec 6 20:27:55 localhost kernel: [] ? __writepage+0x0/0x40
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_iflush+0x203/0x210 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_bdwrite+0x5f/0xa0 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_trans_unlocked_item+0x39/0x60 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? generic_writepages+0x24/0x30
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_vm_writepages+0x5e/0x80 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? do_writepages+0x21/0x40
> > Dec 6 20:27:55 localhost kernel: [] ? writeback_single_inode+0xdd/0x2c0
> > Dec 6 20:27:55 localhost kernel: [] ? writeback_sb_inodes+0xce/0x180
> > Dec 6 20:27:55 localhost kernel: [] ? writeback_inodes_wb+0xab/0x1b0
> > Dec 6 20:27:55 localhost kernel: [] ? balance_dirty_pages+0x21e/0x4d0
> > Dec 6 20:27:55 localhost kernel: [] ? mark_buffer_dirty+0x61/0xa0
> > Dec 6 20:27:55 localhost kernel: [] ? balance_dirty_pages_ratelimited_nr+0x64/0x70
> > Dec 6 20:27:55 localhost kernel: [] ? generic_file_buffered_write+0x1c3/0x2a0
> > Dec 6 20:27:55 localhost kernel: [] ? current_fs_time+0x27/0x30
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_write+0x76f/0xb70 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? memcpy_toiovec+0x55/0x80
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_file_aio_write+0x0/0x70 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? xfs_file_aio_write+0x61/0x70 [xfs]
> > Dec 6 20:27:55 localhost kernel: [] ? do_sync_readv_writev+0xfb/0x140
> > Dec 6 20:27:55 localhost kernel: [] ? d_obtain_alias+0x4d/0x160
> > Dec 6 20:27:55 localhost kernel: [] ? autoremove_wake_function+0x0/0x40
> > Dec 6 20:27:55 localhost kernel: [] ? security_task_setgroups+0x16/0x20
> > Dec 6 20:27:55 localhost kernel: [] ? security_file_permission+0x16/0x20
> > Dec 6 20:27:55 localhost kernel: [] ? do_readv_writev+0xcf/0x1f0
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? vfs_writev+0x46/0x60
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd_vfs_write+0x107/0x430 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? dentry_open+0x52/0xc0
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd_open+0x13e/0x210 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd_write+0xe7/0x100 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd3_proc_write+0xaf/0x140 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd_dispatch+0xfe/0x240 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? svc_process_common+0x344/0x640 [sunrpc]
> > Dec 6 20:27:55 localhost kernel: [] ? default_wake_function+0x0/0x20
> > Dec 6 20:27:55 localhost kernel: [] ? svc_process+0x110/0x160 [sunrpc]
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd+0xc2/0x160 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? nfsd+0x0/0x160 [nfsd]
> > Dec 6 20:27:55 localhost kernel: [] ? kthread+0x96/0xa0
> > Dec 6 20:27:55 localhost kernel: [] ? child_rip+0xa/0x20
> > Dec 6 20:27:55 localhost kernel: [] ? kthread+0x0/0xa0
> > Dec 6 20:27:55 localhost kernel: [] ? child_rip+0x0/0x20
> > Dec 6 20:27:55 localhost kernel: ---[ end trace e8b62253d4084e2b ]---
> >
> > --
> > Ryan C. England
> > Corvid Technologies
> > office: 704-799-6944 x158
> > cell: 980-521-2297
>
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
> ---end quoted text---
>

--
Ryan C. England
Corvid Technologies
office: 704-799-6944 x158
cell: 980-521-2297
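
P.S. For anyone reading the trace later: the cur/sp pair in the "do_IRQ: nfsd near stack overflow" warning above gives a rough idea of how little stack was left when the interrupt arrived. Below is a minimal sketch in Python, not anything taken from the kernel tree. It assumes that `cur` is the lowest address of the task's stack area, that the stack grows downward, and that this 2.6.32 x86_64 kernel uses 8 KiB stacks; the 8 KiB figure and the helper name `stack_headroom` are my assumptions, not something the log states.

```python
import re

# Assumption: 8 KiB kernel stacks (THREAD_SIZE) on this x86_64 kernel.
THREAD_SIZE = 8192


def stack_headroom(warning, thread_size=THREAD_SIZE):
    """Return (bytes_free, bytes_used) from a 'near stack overflow' line.

    Hypothetical helper: treats 'cur' as the base (lowest address) of the
    stack area and 'sp' as the stack pointer at interrupt time.
    """
    match = re.search(r"cur:([0-9a-fA-F]+),sp:([0-9a-fA-F]+)", warning)
    if match is None:
        raise ValueError("no cur/sp pair found in warning line")
    cur = int(match.group(1), 16)
    sp = int(match.group(2), 16)
    free = sp - cur            # distance from stack pointer to the base
    used = thread_size - free  # everything above sp is already in use
    return free, used


line = ("do_IRQ: nfsd near stack overflow "
        "(cur:ffff880622208000,sp:ffff880622208160)")
free, used = stack_headroom(line)
print("free: %d bytes, used: %d of %d bytes" % (free, used, THREAD_SIZE))
```

Under those assumptions only a few hundred bytes remained above the stack base, and part of that region is not usable stack anyway, which seems consistent with the explanation that the NFSD, writeback, XFS and page reclaim frames together consume nearly the whole stack.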