Re: [RFC 1/2] mm: additional page lock debugging

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Sasha Levin <sasha.levin@oracle.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 1/2] mm: additional page lock debugging
Date: Mon, 30 Dec 2013 11:33:47 -0500	[thread overview]
Message-ID: <52C1A06B.4070605@oracle.com> (raw)
In-Reply-To: <20131230114317.GA8117@node.dhcp.inet.fi>

On 12/30/2013 06:43 AM, Kirill A. Shutemov wrote:
> On Sat, Dec 28, 2013 at 08:45:03PM -0500, Sasha Levin wrote:
>> We've recently stumbled on several issues with the page lock which
>> triggered BUG_ON()s.
>>
>> While working on them, it was clear that due to the complexity of
>> locking its pretty hard to figure out if something is supposed
>> to be locked or not, and if we encountered a race it was quite a
>> pain narrowing it down.
>>
>> This is an attempt at solving this situation. This patch adds simple
>> asserts to catch cases where someone is trying to lock the page lock
>> while it's already locked, and cases to catch someone unlocking the
>> lock without it being held.
>>
>> My initial patch attempted to use lockdep to get further coverege,
>> but that attempt uncovered the amount of issues triggered and made
>> it impossible to debug the lockdep integration without clearing out
>> a large portion of existing bugs.
>>
>> This patch adds a new option since it will horribly break any system
>> booting with it due to the amount of issues that it uncovers. This is
>> more of a "call for help" to other mm/ hackers to help clean it up.
>>
>> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
>> ---
>>   include/linux/pagemap.h | 11 +++++++++++
>>   lib/Kconfig.debug       |  9 +++++++++
>>   mm/filemap.c            |  4 +++-
>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 1710d1b..da24939 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>>   	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>>   }
>>
>> +#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
>> +#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
>> +#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
>> +#else
>> +#define VM_ASSERT_LOCKED(page) do { } while (0)
>> +#define VM_ASSERT_UNLOCKED(page) do { } while (0)
>> +#endif
>> +
>>   extern void __lock_page(struct page *page);
>>   extern int __lock_page_killable(struct page *page);
>>   extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>> @@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
>>
>>   static inline void __set_page_locked(struct page *page)
>>   {
>> +	VM_ASSERT_UNLOCKED(page);
>>   	__set_bit(PG_locked, &page->flags);
>>   }
>>
>>   static inline void __clear_page_locked(struct page *page)
>>   {
>> +	VM_ASSERT_LOCKED(page);
>>   	__clear_bit(PG_locked, &page->flags);
>>   }
>>
>>   static inline int trylock_page(struct page *page)
>>   {
>> +	VM_ASSERT_UNLOCKED(page);
>
> This is not correct. It's perfectly fine if the page is locked here: it's
> why trylock needed.
>
> IIUC, what we want to catch is the case when the page has already locked
> by the task.

Frankly, we shouldn't have trylock_page() at all.

Look at page_referenced() for example. Instead of assuming that it has to be
called with page lock held, it's trying to acquire the lock and to free it only
if it's the one that allocated it.

Why isn't there a VM_BUG_ON() there to test whether the page is locked, and let
the callers handle that?

>
> I don't think it's reasonable to re-implement this functionality. We
> really need to hook up lockdep.

The issue with adding lockdep straight away is that we need to deal with
long held page locks somehow nicely. Unlike regular locks, these may be
held for a rather long while, triggering really long locking chains which
lockdep doesn't really like.

Many places lock a long list of pages in bulk - we could allow that with
nesting, but then you lose your ability to detect trivial deadlocks.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-12-30 16:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-29  1:45 Sasha Levin
2013-12-29  1:45 ` [RFC 2/2] mm: additional checks to page flag set/clear Sasha Levin
2013-12-30 12:12   ` Kirill A. Shutemov
2013-12-30 11:43 ` [RFC 1/2] mm: additional page lock debugging Kirill A. Shutemov
2013-12-30 16:33   ` Sasha Levin [this message]
2013-12-30 22:48     ` Kirill A. Shutemov
2013-12-31  3:22       ` Sasha Levin
2013-12-31 16:26         ` Peter Zijlstra
2013-12-31 16:42           ` Sasha Levin
2014-01-06 10:04             ` Peter Zijlstra
2014-02-10 23:19               ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52C1A06B.4070605@oracle.com \
    --to=sasha.levin@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox