From: Shawn Bohrer <sbohrer@rgmadvisors.com>
To: xfs@oss.sgi.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: deadlock on vmap_area_lock
Date: Wed, 1 May 2013 09:43:41 -0500 [thread overview]
Message-ID: <20130501144341.GA2404@BohrerMBP.rgmadvisors.com> (raw)
I've got two compute clusters with around 350 machines each which are
running kernels based off of 3.1.9 (Yes I realize this is ancient by
todays standards). All of the machines run a 'find' command once an
hour on one of the mounted XFS filesystems. Occasionally these find
commands get stuck requiring a reboot of the system. I took a peek
today and see this with perf:
72.22% find [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--98.84%-- vm_map_ram
| _xfs_buf_map_pages
| xfs_buf_get
| xfs_buf_read
| xfs_trans_read_buf
| xfs_da_do_buf
| xfs_da_read_buf
| xfs_dir2_block_getdents
| xfs_readdir
| xfs_file_readdir
| vfs_readdir
| sys_getdents
| system_call_fastpath
| __getdents64
|
|--1.12%-- _xfs_buf_map_pages
| xfs_buf_get
| xfs_buf_read
| xfs_trans_read_buf
| xfs_da_do_buf
| xfs_da_read_buf
| xfs_dir2_block_getdents
| xfs_readdir
| xfs_file_readdir
| vfs_readdir
| sys_getdents
| system_call_fastpath
| __getdents64
--0.04%-- [...]
Looking at the code my best guess is that we are spinning on
vmap_area_lock, but I could be wrong. This is the only process
spinning on the machine so I'm assuming either another process has
blocked while holding the lock, or perhaps this find process has tried
to acquire the vmap_area_lock twice?
I've skimmed through the change logs between 3.1 and 3.9 but nothing
stood out as fix for this bug. Does this ring a bell with anyone? If
I have a machine that is currently in one of these stuck states does
anyone have any tips to identifying the processes currently holding
the lock?
Additionally as I mentioned before I have two clusters of roughly
equal size though one cluster hits this issue more frequently. On
that cluster with approximately 350 machines we get about 10 stuck
machines a month. The other cluster has about 450 machines but we
only get about 1 or 2 stuck machines a month. Both clusters run the
same find command every hour, but the workloads on the machines are
different. The cluster that hits the issue more frequently tends to
run more memory intensive jobs.
I'm open to building some debug kernels to help track this down,
though I can't upgrade all of the machines in one shot so it may take
a while to reproduce. I'm happy to provide any other information if
people have questions.
Thanks,
Shawn
--
---------------------------------------------------------------
This email, along with any attachments, is confidential. If you
believe you received this message in error, please contact the
sender immediately and delete all copies of the message.
Thank you.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next reply other threads:[~2013-05-01 14:43 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-01 14:43 Shawn Bohrer [this message]
2013-05-01 15:57 ` David Rientjes
2013-05-01 16:44 ` Shawn Bohrer
2013-05-01 17:01 ` David Rientjes
2013-05-01 22:03 ` Dave Chinner
2013-05-02 16:21 ` Shawn Bohrer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130501144341.GA2404@BohrerMBP.rgmadvisors.com \
--to=sbohrer@rgmadvisors.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox