Subject: Re: A possible bug: Calling mutex_lock while holding spinlock
From: axie <axie@amd.com>
Date: Tue, 8 Aug 2017 12:51:15 -0400
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Andrew Morton
Cc: Alex Deucher, "Writer, Tim", linux-mm@kvack.org, "Xie, AlexBin"

Hi Kirill,

Here is the result from the user: "This patch does appear to fix the issue."

Thanks,

Alex (Bin) Xie


On 2017-08-04 10:03 AM, axie wrote:
> Hi Kirill,
>
> Thanks for the patch. I have sent the patch to the user asking whether
> he can give it a try.
>
> Regards,
>
> Alex (Bin) Xie
>
> On 2017-08-04 09:49 AM, Kirill A. Shutemov wrote:
>> On Thu, Aug 03, 2017 at 03:39:02PM -0700, Andrew Morton wrote:
>>> (cc Kirill)
>>>
>>> On Thu, 3 Aug 2017 12:35:28 -0400 axie <axie@amd.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> I got a report yesterday with "BUG: sleeping function called from
>>>> invalid context at kernel/locking/mutex.c".
>>>>
>>>> I checked the relevant functions. page_vma_mapped_walk() acquires a
>>>> spinlock; later, in the MMU notifier, amdgpu_mn_invalidate_page()
>>>> calls mutex_lock(), which triggers the "bug".
>>>>
>>>> page_vma_mapped_walk() was introduced recently by you in commits
>>>> c7ab0d2fdc840266b39db94538f74207ec2afbf6 and
>>>> ace71a19cec5eb430207c3269d8a2683f0574306.
>>>>
>>>> How would you advise we proceed with this bug? Change
>>>> page_vma_mapped_walk() not to take the spinlock? Change
>>>> amdgpu_mn_invalidate_page() to use a spinlock instead? Or
>>>> something else?
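
For reference, a minimal sketch of the locking pattern being reported
(hypothetical stand-in code, not the actual amdgpu or mm path):
mutex_lock() may sleep, and sleeping is illegal while a spinlock is
held, which is exactly what the splat complains about.

#include <linux/spinlock.h>
#include <linux/mutex.h>

static DEFINE_SPINLOCK(example_ptl);	/* stand-in for the page table lock */
static DEFINE_MUTEX(example_mn_mutex);	/* stand-in for the notifier's mutex */

static void invalid_context_example(void)
{
	spin_lock(&example_ptl);	/* atomic context: sleeping now illegal */
	mutex_lock(&example_mn_mutex);	/* may sleep -> triggers the splat */
	mutex_unlock(&example_mn_mutex);
	spin_unlock(&example_ptl);
}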

>>> hm, as far as I can tell this was an unintended side-effect of
>>> c7ab0d2fd ("mm: convert try_to_unmap_one() to use
>>> page_vma_mapped_walk()"). Before that patch,
>>> mmu_notifier_invalidate_page() was not called under page_table_lock.
>>> After that patch, mmu_notifier_invalidate_page() is called under
>>> page_table_lock.
>>>
>>> Perhaps Kirill can suggest a fix?
>> Sorry for this.
>>
>> What about the patch below?

>> From f48dbcdd0ed83dee9a157062b7ca1e2915172678 Mon Sep 17 00:00:00 2001
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Date: Fri, 4 Aug 2017 16:37:26 +0300
>> Subject: [PATCH] rmap: do not call mmu_notifier_invalidate_page() under ptl
>>
>> MMU notifiers can sleep, but in page_mkclean_one() we call
>> mmu_notifier_invalidate_page() under page table lock.
>>
>> Let's instead use mmu_notifier_invalidate_range() outside
>> page_vma_mapped_walk() loop.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
>> ---
>>  mm/rmap.c | 21 +++++++++++++--------
>>  1 file changed, 13 insertions(+), 8 deletions(-)

>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index ced14f1af6dc..b4b711a82c01 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -852,10 +852,10 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  		.flags = PVMW_SYNC,
>>  	};
>>  	int *cleaned = arg;
>> +	bool invalidation_needed = false;
>> 
>>  	while (page_vma_mapped_walk(&pvmw)) {
>>  		int ret = 0;
>> -		address = pvmw.address;
>>  		if (pvmw.pte) {
>>  			pte_t entry;
>>  			pte_t *pte = pvmw.pte;
>> @@ -863,11 +863,11 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  			if (!pte_dirty(*pte) && !pte_write(*pte))
>>  				continue;
>> 
>> -			flush_cache_page(vma, address, pte_pfn(*pte));
>> -			entry = ptep_clear_flush(vma, address, pte);
>> +			flush_cache_page(vma, pvmw.address, pte_pfn(*pte));
>> +			entry = ptep_clear_flush(vma, pvmw.address, pte);
>>  			entry = pte_wrprotect(entry);
>>  			entry = pte_mkclean(entry);
>> -			set_pte_at(vma->vm_mm, address, pte, entry);
>> +			set_pte_at(vma->vm_mm, pvmw.address, pte, entry);
>>  			ret = 1;
>>  		} else {
>>  #ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
>> @@ -877,11 +877,11 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  			if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
>>  				continue;
>> 
>> -			flush_cache_page(vma, address, page_to_pfn(page));
>> -			entry = pmdp_huge_clear_flush(vma, address, pmd);
>> +			flush_cache_page(vma, pvmw.address, page_to_pfn(page));
>> +			entry = pmdp_huge_clear_flush(vma, pvmw.address, pmd);
>>  			entry = pmd_wrprotect(entry);
>>  			entry = pmd_mkclean(entry);
>> -			set_pmd_at(vma->vm_mm, address, pmd, entry);
>> +			set_pmd_at(vma->vm_mm, pvmw.address, pmd, entry);
>>  			ret = 1;
>>  #else
>>  			/* unexpected pmd-mapped page? */
>> @@ -890,11 +890,16 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  		}
>> 
>>  		if (ret) {
>> -			mmu_notifier_invalidate_page(vma->vm_mm, address);
>>  			(*cleaned)++;
>> +			invalidation_needed = true;
>>  		}
>>  	}
>> 
>> +	if (invalidation_needed) {
>> +		mmu_notifier_invalidate_range(vma->vm_mm, address,
>> +				address + (1UL << compound_order(page)));
>> +	}
>> +
>>  	return true;
>>  }
>> 
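
For clarity, a sketch of the resulting pattern (illustrative only, with
hypothetical names, not the mm/rmap.c code above): the walk merely
records, under the page table lock, that an invalidation is needed, and
the ranged notification runs once after the lock is dropped, where
sleeping is allowed. Note that mmu_notifier_invalidate_range() takes
byte addresses, so the sketch sizes the range as
PAGE_SIZE << compound_order(page) to cover the whole (possibly
compound) page.

#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/*
 * Deferred invalidation: nothing that can sleep happens under the page
 * table lock; the caller sets invalidation_needed there and invokes
 * this helper only after the lock has been released.
 */
static void notify_after_walk(struct mm_struct *mm, struct page *page,
			      unsigned long address, bool invalidation_needed)
{
	if (invalidation_needed)
		mmu_notifier_invalidate_range(mm, address,
				address + (PAGE_SIZE << compound_order(page)));
}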

