linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hugh@veritas.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Ramiro Voicu <Ramiro.Voicu@cern.ch>,
	linux-mm@kvack.org, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bugme-new] [Bug 7645] New: Kernel BUG at mm/memory.c:1124
Date: Sat, 9 Dec 2006 04:34:16 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0612090427180.3684@blonde.wat.veritas.com> (raw)
In-Reply-To: <20061208155200.0e2794a1.akpm@osdl.org>

On Fri, 8 Dec 2006, Andrew Morton wrote:
> On Thu, 7 Dec 2006 21:22:57 +0000 (GMT)
> Hugh Dickins <hugh@veritas.com> wrote:
> > 
> > Please try the simple patch below: I expect it to fix your problem.
> > Whether it's the right patch, I'm not quite sure: we do commonly use
> > zap_page_range and zeromap_page_range with mmap_sem held for write,
> > but perhaps we'd want to avoid such serialization in this case?
> 
> Ramiro, have you had a chance to test this yet?

Here's a bigger but better patch: if you wouldn't mind,
please try this one instead, Ramiro - thanks.


Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range:
kernel bugzilla 7645.  Right: read_zero_pagealigned uses down_read of
mmap_sem, but another thread's racing read of /dev/zero, or a normal
fault, can easily set that pte again, in between zap_page_range and
zeromap_page_range getting there.  It's been wrong ever since 2.4.3.

The simple fix is to use down_write instead, but that would serialize
reads of /dev/zero more than at present: perhaps some app would be
badly affected.  So instead let zeromap_page_range return the error
instead of BUG_ON, and read_zero_pagealigned break to the slower
clear_user loop in that case - there's no need to optimize for it.

Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other
user of zeromap_page_range), though it really isn't interesting there.
And since mmap_zero wants -EAGAIN for out-of-memory, the zeromaps
better return that than -ENOMEM.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 drivers/char/mem.c |   12 ++++++++----
 mm/memory.c        |   32 +++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 15 deletions(-)

--- 2.6.19/drivers/char/mem.c	2006-11-29 21:57:37.000000000 +0000
+++ linux/drivers/char/mem.c	2006-12-08 14:09:51.000000000 +0000
@@ -646,7 +646,8 @@ static inline size_t read_zero_pagealign
 			count = size;
 
 		zap_page_range(vma, addr, count, NULL);
-        	zeromap_page_range(vma, addr, count, PAGE_COPY);
+        	if (zeromap_page_range(vma, addr, count, PAGE_COPY))
+			break;
 
 		size -= count;
 		buf += count;
@@ -713,11 +714,14 @@ out:
 
 static int mmap_zero(struct file * file, struct vm_area_struct * vma)
 {
+	int err;
+
 	if (vma->vm_flags & VM_SHARED)
 		return shmem_zero_setup(vma);
-	if (zeromap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start, vma->vm_page_prot))
-		return -EAGAIN;
-	return 0;
+	err = zeromap_page_range(vma, vma->vm_start,
+			vma->vm_end - vma->vm_start, vma->vm_page_prot);
+	BUG_ON(err == -EEXIST);
+	return err;
 }
 #else /* CONFIG_MMU */
 static ssize_t read_zero(struct file * file, char * buf, 
--- 2.6.19/mm/memory.c	2006-11-29 21:57:37.000000000 +0000
+++ linux/mm/memory.c	2006-12-08 14:09:51.000000000 +0000
@@ -1110,23 +1110,29 @@ static int zeromap_pte_range(struct mm_s
 {
 	pte_t *pte;
 	spinlock_t *ptl;
+	int err = 0;
 
 	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
-		return -ENOMEM;
+		return -EAGAIN;
 	arch_enter_lazy_mmu_mode();
 	do {
 		struct page *page = ZERO_PAGE(addr);
 		pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
+
+		if (unlikely(!pte_none(*pte))) {
+			err = -EEXIST;
+			pte++;
+			break;
+		}
 		page_cache_get(page);
 		page_add_file_rmap(page);
 		inc_mm_counter(mm, file_rss);
-		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, zero_pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
-	return 0;
+	return err;
 }
 
 static inline int zeromap_pmd_range(struct mm_struct *mm, pud_t *pud,
@@ -1134,16 +1140,18 @@ static inline int zeromap_pmd_range(stru
 {
 	pmd_t *pmd;
 	unsigned long next;
+	int err;
 
 	pmd = pmd_alloc(mm, pud, addr);
 	if (!pmd)
-		return -ENOMEM;
+		return -EAGAIN;
 	do {
 		next = pmd_addr_end(addr, end);
-		if (zeromap_pte_range(mm, pmd, addr, next, prot))
-			return -ENOMEM;
+		err = zeromap_pte_range(mm, pmd, addr, next, prot);
+		if (err)
+			break;
 	} while (pmd++, addr = next, addr != end);
-	return 0;
+	return err;
 }
 
 static inline int zeromap_pud_range(struct mm_struct *mm, pgd_t *pgd,
@@ -1151,16 +1159,18 @@ static inline int zeromap_pud_range(stru
 {
 	pud_t *pud;
 	unsigned long next;
+	int err;
 
 	pud = pud_alloc(mm, pgd, addr);
 	if (!pud)
-		return -ENOMEM;
+		return -EAGAIN;
 	do {
 		next = pud_addr_end(addr, end);
-		if (zeromap_pmd_range(mm, pud, addr, next, prot))
-			return -ENOMEM;
+		err = zeromap_pmd_range(mm, pud, addr, next, prot);
+		if (err)
+			break;
 	} while (pud++, addr = next, addr != end);
-	return 0;
+	return err;
 }
 
 int zeromap_page_range(struct vm_area_struct *vma,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-12-09  4:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200612070355.kB73tGf4021820@fire-2.osdl.org>
2006-12-07  4:12 ` Andrew Morton
2006-12-07  5:15   ` Ramiro Voicu
2006-12-07  7:03     ` Andrew Morton
2006-12-07 14:54       ` Ramiro Voicu
2006-12-07 21:22         ` Hugh Dickins
2006-12-08 23:52           ` Andrew Morton
2006-12-09  4:34             ` Hugh Dickins [this message]
2006-12-09 17:24               ` Ramiro Voicu
2006-12-09 18:40                 ` Hugh Dickins
2006-12-11 18:56                   ` Ramiro Voicu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0612090427180.3684@blonde.wat.veritas.com \
    --to=hugh@veritas.com \
    --cc=Ramiro.Voicu@cern.ch \
    --cc=akpm@osdl.org \
    --cc=bugme-daemon@bugzilla.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox