linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ramiro Voicu <Ramiro.Voicu@cern.ch>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-mm@kvack.org, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bugme-new] [Bug 7645] New: Kernel BUG at mm/memory.c:1124
Date: Sat, 09 Dec 2006 18:24:38 +0100	[thread overview]
Message-ID: <457AF156.8070606@cern.ch> (raw)
In-Reply-To: <Pine.LNX.4.64.0612090427180.3684@blonde.wat.veritas.com>

It seems that this patch fixed the problem. I tested on my desktop and
the problem seems gone.

Based on what Hugh supposed, I was able to have a small java program to
test it ... and indeed it is very possible that there was a race in the
initial app

I will try to test it tomorrow on the other machine ( it is unable to
boot now after a hard reboot ), but I think the bug can be closed now.

Thank you very much for your support!

Cheers,
Ramiro

Hugh Dickins wrote:
> On Fri, 8 Dec 2006, Andrew Morton wrote:
>> On Thu, 7 Dec 2006 21:22:57 +0000 (GMT)
>> Hugh Dickins <hugh@veritas.com> wrote:
>>> Please try the simple patch below: I expect it to fix your problem.
>>> Whether it's the right patch, I'm not quite sure: we do commonly use
>>> zap_page_range and zeromap_page_range with mmap_sem held for write,
>>> but perhaps we'd want to avoid such serialization in this case?
>> Ramiro, have you had a chance to test this yet?
> 
> Here's a bigger but better patch: if you wouldn't mind,
> please try this one instead, Ramiro - thanks.
> 
> 
> Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range:
> kernel bugzilla 7645.  Right: read_zero_pagealigned uses down_read of
> mmap_sem, but another thread's racing read of /dev/zero, or a normal
> fault, can easily set that pte again, in between zap_page_range and
> zeromap_page_range getting there.  It's been wrong ever since 2.4.3.
> 
> The simple fix is to use down_write instead, but that would serialize
> reads of /dev/zero more than at present: perhaps some app would be
> badly affected.  So instead let zeromap_page_range return the error
> instead of BUG_ON, and read_zero_pagealigned break to the slower
> clear_user loop in that case - there's no need to optimize for it.
> 
> Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other
> user of zeromap_page_range), though it really isn't interesting there.
> And since mmap_zero wants -EAGAIN for out-of-memory, the zeromaps
> better return that than -ENOMEM.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> 
>  drivers/char/mem.c |   12 ++++++++----
>  mm/memory.c        |   32 +++++++++++++++++++++-----------
>  2 files changed, 29 insertions(+), 15 deletions(-)
> 
> --- 2.6.19/drivers/char/mem.c	2006-11-29 21:57:37.000000000 +0000
> +++ linux/drivers/char/mem.c	2006-12-08 14:09:51.000000000 +0000
> @@ -646,7 +646,8 @@ static inline size_t read_zero_pagealign
>  			count = size;
>  
>  		zap_page_range(vma, addr, count, NULL);
> -        	zeromap_page_range(vma, addr, count, PAGE_COPY);
> +        	if (zeromap_page_range(vma, addr, count, PAGE_COPY))
> +			break;
>  
>  		size -= count;
>  		buf += count;
> @@ -713,11 +714,14 @@ out:
>  
>  static int mmap_zero(struct file * file, struct vm_area_struct * vma)
>  {
> +	int err;
> +
>  	if (vma->vm_flags & VM_SHARED)
>  		return shmem_zero_setup(vma);
> -	if (zeromap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start, vma->vm_page_prot))
> -		return -EAGAIN;
> -	return 0;
> +	err = zeromap_page_range(vma, vma->vm_start,
> +			vma->vm_end - vma->vm_start, vma->vm_page_prot);
> +	BUG_ON(err == -EEXIST);
> +	return err;
>  }
>  #else /* CONFIG_MMU */
>  static ssize_t read_zero(struct file * file, char * buf, 
> --- 2.6.19/mm/memory.c	2006-11-29 21:57:37.000000000 +0000
> +++ linux/mm/memory.c	2006-12-08 14:09:51.000000000 +0000
> @@ -1110,23 +1110,29 @@ static int zeromap_pte_range(struct mm_s
>  {
>  	pte_t *pte;
>  	spinlock_t *ptl;
> +	int err = 0;
>  
>  	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
>  	if (!pte)
> -		return -ENOMEM;
> +		return -EAGAIN;
>  	arch_enter_lazy_mmu_mode();
>  	do {
>  		struct page *page = ZERO_PAGE(addr);
>  		pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
> +
> +		if (unlikely(!pte_none(*pte))) {
> +			err = -EEXIST;
> +			pte++;
> +			break;
> +		}
>  		page_cache_get(page);
>  		page_add_file_rmap(page);
>  		inc_mm_counter(mm, file_rss);
> -		BUG_ON(!pte_none(*pte));
>  		set_pte_at(mm, addr, pte, zero_pte);
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  	arch_leave_lazy_mmu_mode();
>  	pte_unmap_unlock(pte - 1, ptl);
> -	return 0;
> +	return err;
>  }
>  
>  static inline int zeromap_pmd_range(struct mm_struct *mm, pud_t *pud,
> @@ -1134,16 +1140,18 @@ static inline int zeromap_pmd_range(stru
>  {
>  	pmd_t *pmd;
>  	unsigned long next;
> +	int err;
>  
>  	pmd = pmd_alloc(mm, pud, addr);
>  	if (!pmd)
> -		return -ENOMEM;
> +		return -EAGAIN;
>  	do {
>  		next = pmd_addr_end(addr, end);
> -		if (zeromap_pte_range(mm, pmd, addr, next, prot))
> -			return -ENOMEM;
> +		err = zeromap_pte_range(mm, pmd, addr, next, prot);
> +		if (err)
> +			break;
>  	} while (pmd++, addr = next, addr != end);
> -	return 0;
> +	return err;
>  }
>  
>  static inline int zeromap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> @@ -1151,16 +1159,18 @@ static inline int zeromap_pud_range(stru
>  {
>  	pud_t *pud;
>  	unsigned long next;
> +	int err;
>  
>  	pud = pud_alloc(mm, pgd, addr);
>  	if (!pud)
> -		return -ENOMEM;
> +		return -EAGAIN;
>  	do {
>  		next = pud_addr_end(addr, end);
> -		if (zeromap_pmd_range(mm, pud, addr, next, prot))
> -			return -ENOMEM;
> +		err = zeromap_pmd_range(mm, pud, addr, next, prot);
> +		if (err)
> +			break;
>  	} while (pud++, addr = next, addr != end);
> -	return 0;
> +	return err;
>  }
>  
>  int zeromap_page_range(struct vm_area_struct *vma,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-12-09 17:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200612070355.kB73tGf4021820@fire-2.osdl.org>
2006-12-07  4:12 ` Andrew Morton
2006-12-07  5:15   ` Ramiro Voicu
2006-12-07  7:03     ` Andrew Morton
2006-12-07 14:54       ` Ramiro Voicu
2006-12-07 21:22         ` Hugh Dickins
2006-12-08 23:52           ` Andrew Morton
2006-12-09  4:34             ` Hugh Dickins
2006-12-09 17:24               ` Ramiro Voicu [this message]
2006-12-09 18:40                 ` Hugh Dickins
2006-12-11 18:56                   ` Ramiro Voicu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=457AF156.8070606@cern.ch \
    --to=ramiro.voicu@cern.ch \
    --cc=akpm@osdl.org \
    --cc=bugme-daemon@bugzilla.kernel.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox