From: Hugh Dickins <hughd@google.com>
To: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel-team@fedoraproject.org
Subject: Re: exit_mmap() BUG_ON triggering since 3.1
Date: Wed, 15 Feb 2012 18:14:12 -0800 (PST) [thread overview]
Message-ID: <alpine.LSU.2.00.1202151801020.19691@eggly.anvils> (raw)
In-Reply-To: <20120215183317.GA26977@redhat.com>
On Wed, 15 Feb 2012, Dave Jones wrote:
> We've had three reports against the Fedora kernel recently where
> a process exits, and we're tripping up the
>
> BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
>
> in exit_mmap()
>
> It started happening with 3.1, but still occurs on 3.2
> (no 3.3rc reports yet, but it's not getting much testing).
>
> https://bugzilla.redhat.com/show_bug.cgi?id=786632
> https://bugzilla.redhat.com/show_bug.cgi?id=787527
> https://bugzilla.redhat.com/show_bug.cgi?id=790546
>
> I don't see anything special in common between the loaded modules.
>
> anyone?
My suspicion was that it would be related to Transparent HugePages:
they do complicate the pagetable story. And I think I have found a
potential culprit. I don't know if nr_ptes is the only loser from
these two split_huge_pages calls, but assuming it is...
[PATCH] mm: fix BUG on mm->nr_ptes
mm->nr_ptes had unusual locking: down_read mmap_sem plus page_table_lock
when incrementing, down_write mmap_sem (or mm_users 0) when decrementing;
whereas THP is careful to increment and decrement it under page_table_lock.
Now most of those paths in THP also hold mmap_sem for read or write (with
appropriate checks on mm_users), but two do not: when split_huge_page()
is called by hwpoison_user_mappings(), and when called by add_to_swap().
It's conceivable that the latter case is responsible for the exit_mmap()
BUG_ON mm->nr_ptes that has been reported on Fedora.
THP's understanding of the locking seems reasonable, so take that lock
to update it in free_pgd_range(): try to avoid retaking it repeatedly
by passing the count up from levels below - free_pgtables() already
does its best to combine calls across neighbouring vmas.
Or should we try harder to avoid the extra locking: test mm_users?
#ifdef on THP? Or consider the accuracy of this count not worth
extra locking, and just scrap the BUG_ON now?
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/memory.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)
--- 3.3-rc3/mm/memory.c 2012-01-31 14:51:15.100021868 -0800
+++ linux/mm/memory.c 2012-02-15 17:01:46.588649490 -0800
@@ -419,22 +419,23 @@ void pmd_clear_bad(pmd_t *pmd)
* Note: this doesn't free the actual pages themselves. That
* has been handled earlier when unmapping all the memory regions.
*/
-static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
+static long free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long addr)
{
pgtable_t token = pmd_pgtable(*pmd);
pmd_clear(pmd);
pte_free_tlb(tlb, token, addr);
- tlb->mm->nr_ptes--;
+ return 1;
}
-static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
+static inline long free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
{
pmd_t *pmd;
unsigned long next;
unsigned long start;
+ long nr_ptes = 0;
start = addr;
pmd = pmd_offset(pud, addr);
@@ -442,32 +443,35 @@ static inline void free_pmd_range(struct
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd))
continue;
- free_pte_range(tlb, pmd, addr);
+ nr_ptes += free_pte_range(tlb, pmd, addr);
} while (pmd++, addr = next, addr != end);
start &= PUD_MASK;
if (start < floor)
- return;
+ goto out;
if (ceiling) {
ceiling &= PUD_MASK;
if (!ceiling)
- return;
+ goto out;
}
if (end - 1 > ceiling - 1)
- return;
+ goto out;
pmd = pmd_offset(pud, start);
pud_clear(pud);
pmd_free_tlb(tlb, pmd, start);
+out:
+ return nr_ptes;
}
-static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
+static inline long free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
{
pud_t *pud;
unsigned long next;
unsigned long start;
+ long nr_ptes = 0;
start = addr;
pud = pud_offset(pgd, addr);
@@ -475,23 +479,25 @@ static inline void free_pud_range(struct
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- free_pmd_range(tlb, pud, addr, next, floor, ceiling);
+ nr_ptes += free_pmd_range(tlb, pud, addr, next, floor, ceiling);
} while (pud++, addr = next, addr != end);
start &= PGDIR_MASK;
if (start < floor)
- return;
+ goto out;
if (ceiling) {
ceiling &= PGDIR_MASK;
if (!ceiling)
- return;
+ goto out;
}
if (end - 1 > ceiling - 1)
- return;
+ goto out;
pud = pud_offset(pgd, start);
pgd_clear(pgd);
pud_free_tlb(tlb, pud, start);
+out:
+ return nr_ptes;
}
/*
@@ -505,6 +511,7 @@ void free_pgd_range(struct mmu_gather *t
{
pgd_t *pgd;
unsigned long next;
+ long nr_ptes = 0;
/*
* The next few lines have given us lots of grief...
@@ -553,8 +560,15 @@ void free_pgd_range(struct mmu_gather *t
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- free_pud_range(tlb, pgd, addr, next, floor, ceiling);
+ nr_ptes += free_pud_range(tlb, pgd, addr, next, floor, ceiling);
} while (pgd++, addr = next, addr != end);
+
+ if (nr_ptes) {
+ struct mm_struct *mm = tlb->mm;
+ spin_lock(&mm->page_table_lock);
+ mm->nr_ptes -= nr_ptes;
+ spin_unlock(&mm->page_table_lock);
+ }
}
void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-02-16 2:14 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-15 18:33 Dave Jones
2012-02-16 2:14 ` Hugh Dickins [this message]
2012-02-16 2:22 ` Roland Dreier
2012-02-16 2:48 ` Hugh Dickins
2012-02-16 7:07 ` Andrea Arcangeli
2012-02-16 9:53 ` Hugh Dickins
2012-02-16 21:42 ` Andrea Arcangeli
2012-03-02 22:53 ` Hugh Dickins
2012-03-02 22:58 ` Andrew Morton
2012-03-02 23:09 ` Hugh Dickins
2012-03-05 19:59 ` Josh Boyer
2012-03-05 20:07 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.00.1202151801020.19691@eggly.anvils \
--to=hughd@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=davej@redhat.com \
--cc=kernel-team@fedoraproject.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox