From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Russ Meyerriecks <rmeyerriecks@digium.com>,
sruffell@digium.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Greg KH <greg@kroah.com>
Subject: Re: [PATCH] mm/dmapool.c: Do not create/destroy sysfs file while holding pools_lock
Date: Tue, 01 Mar 2011 20:35:53 -0800 [thread overview]
Message-ID: <m1wrki3zrq.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <20110301170117.258e06e2.akpm@linux-foundation.org> (Andrew Morton's message of "Tue, 1 Mar 2011 17:01:17 -0800")
Andrew Morton <akpm@linux-foundation.org> writes:
> On Mon, 28 Feb 2011 16:41:24 -0600
> Russ Meyerriecks <rmeyerriecks@digium.com> wrote:
>
>> From: Shaun Ruffell <sruffell@digium.com>
>>
>> Eliminates a circular lock dependency reported by lockdep. When reading the
>> "pools" file from a PCI device via sysfs, the s_active lock is acquired before
>> the pools_lock. When unloading the driver and destroying the pool, pools_lock
>> is acquired before the s_active lock.
>>
>> cat/12016 is trying to acquire lock:
>> (pools_lock){+.+.+.}, at: [<c04ef113>] show_pools+0x43/0x140
>>
>> but task is already holding lock:
>> (s_active#82){++++.+}, at: [<c0554e1b>] sysfs_read_file+0xab/0x160
>>
>> which lock already depends on the new lock.
>
> sysfs_dirent_init_lockdep() and the 6992f53349 ("sysfs: Use one lockdep
> class per sysfs attribute") which added it are rather scary.
>
> The alleged bug appears to be due to taking pools_lock outside
> device_create_file() (which takes magical sysfs PseudoVirtualLocks)
> versus show_pools(), which takes pools_lock but is called from inside
> magical sysfs PseudoVirtualLocks.
>
> I don't know if this is actually a real bug or not. Probably not, as
> this device_create_file() does not match the reasons for 6992f53349:
> "There is a sysfs idiom where writing to one sysfs file causes the
> addition or removal of other sysfs files". But that's a guess.
device_create_file is arguable But this also happens with
device_remove_file, and that is exactly the deadlock scenario I added
the lockdep annotation to catch. So the patch clearly does not fix the
issue.
Eric
>> --- a/mm/dmapool.c
>> +++ b/mm/dmapool.c
>> @@ -174,21 +174,28 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
>> init_waitqueue_head(&retval->waitq);
>>
>> if (dev) {
>> - int ret;
>> + int first_pool;
>>
>> mutex_lock(&pools_lock);
>> if (list_empty(&dev->dma_pools))
>> - ret = device_create_file(dev, &dev_attr_pools);
>> + first_pool = 1;
>> else
>> - ret = 0;
>> + first_pool = 0;
>> /* note: not currently insisting "name" be unique */
>> - if (!ret)
>> - list_add(&retval->pools, &dev->dma_pools);
>> - else {
>> - kfree(retval);
>> - retval = NULL;
>> - }
>> + list_add(&retval->pools, &dev->dma_pools);
>> mutex_unlock(&pools_lock);
>> +
>> + if (first_pool) {
>> + int ret;
>> + ret = device_create_file(dev, &dev_attr_pools);
>> + if (ret) {
>> + mutex_lock(&pools_lock);
>> + list_del(&retval->pools);
>> + mutex_unlock(&pools_lock);
>> + kfree(retval);
>> + retval = NULL;
>> + }
>> + }
>
> Not a good fix, IMO. The problem is that if two CPUs concurrently call
> dma_pool_create(), the first CPU will spend time creating the sysfs
> file. Meanwhile, the second CPU will whizz straight back to its
> caller. The caller now thinks that the sysfs file has been created and
> returns to userspace, which immediately tries to read the sysfs file.
> But the first CPU hasn't finished creating it yet. Userspace fails.
>
> One way of fixing this would be to create another singleton lock:
>
>
> {
> static DEFINE_MUTEX(pools_sysfs_lock);
> static bool pools_sysfs_done;
>
> mutex_lock(&pools_sysfs_lock);
> if (pools_sysfs_done == false) {
> create_sysfs_stuff();
> pools_sysfs_done = true;
> }
> mutex_unlock(&pools_sysfs_lock);
> }
>
> That's not terribly pretty.
Or possibly use module_init style magic. Where use module
initialization and remove to trigger creation and deletion of the sysfs.
Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-03-02 4:36 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-28 22:41 Russ Meyerriecks
2011-03-02 1:01 ` Andrew Morton
2011-03-02 4:35 ` Eric W. Biederman [this message]
2011-03-02 5:23 ` Shaun Ruffell
2011-03-02 5:17 ` Shaun Ruffell
2011-03-02 7:24 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1wrki3zrq.fsf@fess.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=greg@kroah.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rmeyerriecks@digium.com \
--cc=sruffell@digium.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox