* [PATCH] mremap NULL pointer dereference fix
@ 2004-02-17 4:41 Rajesh Venkatasubramanian
2004-02-17 5:31 ` Andrew Morton
2004-02-17 5:38 ` Linus Torvalds
0 siblings, 2 replies; 9+ messages in thread
From: Rajesh Venkatasubramanian @ 2004-02-17 4:41 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, Linux-MM
This patch fixes a NULL pointer dereference bug in mremap. In
move_one_page we need to re-check the src because an allocation
for the dst page table can drop page_table_lock, and somebody
else can invalidate the src.
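The race described above can be modeled in userspace. The following is only a minimal sketch under stated assumptions: struct table, table_lock, alloc_dst, invalidate_src and move_one are hypothetical stand-ins rather than kernel APIs, and the "lock" is a plain flag driven from a single thread. It shows why a pointer fetched under a lock has to be re-checked once a helper has dropped and re-acquired that lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct table {
	int locked;			/* toy stand-in for page_table_lock */
	int *src;			/* NULL once somebody invalidates it */
	int dst;
	void (*racer)(struct table *);	/* runs while the lock is dropped */
};

static void table_lock(struct table *t)
{
	assert(!t->locked);
	t->locked = 1;
}

static void table_unlock(struct table *t)
{
	assert(t->locked);
	t->locked = 0;
}

/* Models an allocation that may sleep: it drops and retakes the lock. */
static void alloc_dst(struct table *t)
{
	table_unlock(t);
	if (t->racer)
		t->racer(t);
	table_lock(t);
}

/* Example racer: invalidates the source entry under the lock. */
void invalidate_src(struct table *t)
{
	table_lock(t);
	t->src = NULL;
	table_unlock(t);
}

/* Returns 1 if the entry was moved, 0 if there was nothing (left) to move. */
int move_one(struct table *t)
{
	int moved = 0;

	table_lock(t);
	if (t->src) {
		alloc_dst(t);
		/* The lock was dropped above, so t->src must be re-read. */
		if (t->src) {
			t->dst = *t->src;
			t->src = NULL;
			moved = 1;
		}
	}
	table_unlock(t);
	return moved;
}
```

With racer set to invalidate_src, move_one returns 0 instead of dereferencing a NULL source; without the second check it would crash in the same way as the trace below.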
On my old quad Pentium II 200MHz machine with 256MB of RAM, running
2.6.3-rc3-mm1-preempt, I could hit the NULL pointer dereference bug
with the program at the following URL:
http://www-personal.engin.umich.edu/~vrajesh/linux/mremap-nullptr/
The full trace of the bug can be found at the above URL. A partial
call trace is below.
kernel: PREEMPT SMP
kernel: EIP is at copy_one_pte+0x12/0xa0
kernel: [<c01558a3>] move_one_page+0xa3/0x110
kernel: [<c0155947>] move_page_tables+0x37/0x80
kernel: [<c0155a1a>] move_vma+0x8a/0x5e0
kernel: [<c015620c>] do_mremap+0x29c/0x3d0
kernel: [<c015638d>] sys_mremap+0x4d/0x6d
kernel: [<c03d5ee7>] syscall_call+0x7/0xb
Please apply.
mm/mremap.c | 26 ++++++++++++++++++++------
1 files changed, 20 insertions(+), 6 deletions(-)
diff -puN mm/mremap.c~nullptr mm/mremap.c
--- mmlinux-2.6/mm/mremap.c~nullptr 2004-02-16 17:24:00.000000000 -0500
+++ mmlinux-2.6-jaya/mm/mremap.c 2004-02-16 17:24:00.000000000 -0500
@@ -135,17 +135,31 @@ move_one_page(struct vm_area_struct *vma
 		dst = alloc_one_pte_map(mm, new_addr);
 		if (src == NULL)
 			src = get_one_pte_map_nested(mm, old_addr);
+		/*
+		 * Since alloc_one_pte_map can drop and re-acquire
+		 * page_table_lock, we should re-check the src entry...
+		 */
+		if (src == NULL) {
+			pte_unmap(dst);
+			goto flush_out;
+		}
 		error = copy_one_pte(vma, old_addr, src, dst, &pte_chain);
 		pte_unmap_nested(src);
 		pte_unmap(dst);
-	} else
-		/*
-		 * Why do we need this flush ? If there is no pte for
-		 * old_addr, then there must not be a pte for it as well.
-		 */
-		flush_tlb_page(vma, old_addr);
+		goto unlock_out;
+	}
+
+flush_out:
+	/*
+	 * Why do we need this flush ? If there is no pte for
+	 * old_addr, then there must not be a pte for it as well.
+	 */
+	flush_tlb_page(vma, old_addr);
+
+unlock_out:
 	spin_unlock(&mm->page_table_lock);
 	pte_chain_free(pte_chain);
+
 out:
 	return error;
 }
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 4:41 [PATCH] mremap NULL pointer dereference fix Rajesh Venkatasubramanian
@ 2004-02-17 5:31 ` Andrew Morton
2004-02-17 5:38 ` Linus Torvalds
1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2004-02-17 5:31 UTC (permalink / raw)
To: Rajesh Venkatasubramanian; +Cc: linux-kernel, Linux-MM
Rajesh Venkatasubramanian <vrajesh@umich.edu> wrote:
>
> This patch fixes a NULL pointer dereference bug in mremap. In
> move_one_page we need to re-check the src because an allocation
> for the dst page table can drop page_table_lock, and somebody
> else can invalidate the src.
OK.
> In my old Quad Pentium II 200MHz 256MB, with 2.6.3-rc3-mm1-preempt,
> I could hit the NULL pointer dereference bug with the program in the
> following URL:
>
> http://www-personal.engin.umich.edu/~vrajesh/linux/mremap-nullptr/
I cannot make any oops happen with that test app on a 2-way machine
with CONFIG_PREEMPT=y.
I think we can simplify things in there a bit. How does this look?
mm/mremap.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)
diff -puN mm/mremap.c~mremap-oops-fix mm/mremap.c
--- 25/mm/mremap.c~mremap-oops-fix 2004-02-16 20:53:25.000000000 -0800
+++ 25-akpm/mm/mremap.c 2004-02-16 21:00:05.000000000 -0800
@@ -135,15 +135,17 @@ move_one_page(struct vm_area_struct *vma
 		dst = alloc_one_pte_map(mm, new_addr);
 		if (src == NULL)
 			src = get_one_pte_map_nested(mm, old_addr);
-		error = copy_one_pte(vma, old_addr, src, dst, &pte_chain);
-		pte_unmap_nested(src);
-		pte_unmap(dst);
-	} else
 		/*
-		 * Why do we need this flush ? If there is no pte for
-		 * old_addr, then there must not be a pte for it as well.
+		 * Since alloc_one_pte_map can drop and re-acquire
+		 * page_table_lock, we should re-check the src entry...
 		 */
-		flush_tlb_page(vma, old_addr);
+		if (src) {
+			error = copy_one_pte(vma, old_addr, src,
+						dst, &pte_chain);
+			pte_unmap_nested(src);
+		}
+		pte_unmap(dst);
+	}
 	spin_unlock(&mm->page_table_lock);
 	pte_chain_free(pte_chain);
 out:
_
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 4:41 [PATCH] mremap NULL pointer dereference fix Rajesh Venkatasubramanian
2004-02-17 5:31 ` Andrew Morton
@ 2004-02-17 5:38 ` Linus Torvalds
2004-02-17 5:49 ` Linus Torvalds
1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2004-02-17 5:38 UTC (permalink / raw)
To: Rajesh Venkatasubramanian; +Cc: akpm, linux-kernel, Linux-MM
On Mon, 16 Feb 2004, Rajesh Venkatasubramanian wrote:
>
> This patch fixes a NULL pointer dereference bug in mremap. In
> move_one_page we need to re-check the src because an allocation
> for the dst page table can drop page_table_lock, and somebody
> else can invalidate the src.
Ugly, but yes. The "!page_table_present(mm, new_addr)" code just before
the "alloc_one_pte_map()" should already have done this, but while the
page tables themselves are safe due to us holding the mm semaphore, the
pte entry itself at "src" is not.
I hate that code, and your patch makes it even uglier. This code could do
with a real clean-up, but for now I think your patch will do.
Thanks,
Linus
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 5:38 ` Linus Torvalds
@ 2004-02-17 5:49 ` Linus Torvalds
2004-02-17 6:00 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2004-02-17 5:49 UTC (permalink / raw)
To: Rajesh Venkatasubramanian; +Cc: akpm, linux-kernel, Linux-MM
On Mon, 16 Feb 2004, Linus Torvalds wrote:
>
> Ugly, but yes. The "!page_table_present(mm, new_addr)" code just before
> the "alloc_one_pte_map()" should already have done this, but while the
> page tables themselves are safe due to us holding the mm semaphore, the
> pte entry itself at "src" is not.
>
> I hate that code, and your patch makes it even uglier. This code could do
> with a real clean-up, but for now I think your patch will do.
Hmm.. Looking a bit more at it, does this alternate patch work? It's
_slightly_ less ugly, and it also removes the nonsensical TLB invalidate
instead of moving it around together with the comment that says that it
shouldn't exist.
The TLB is (properly) invalidated by "copy_one_pte()" if the mapping
actually changes.
Did I miss anything?
Linus
---
===== mm/mremap.c 1.38 vs edited =====
--- 1.38/mm/mremap.c Wed Feb 4 00:04:56 2004
+++ edited/mm/mremap.c Mon Feb 16 21:44:26 2004
@@ -133,17 +133,21 @@
 			src = NULL;
 		}
 		dst = alloc_one_pte_map(mm, new_addr);
-		if (src == NULL)
+		if (src == NULL) {
 			src = get_one_pte_map_nested(mm, old_addr);
+			/*
+			 * "src" could be NULL now, because somebody
+			 * might have dropped the (clean) pte entry
+			 * while we did the destination pmd allocation.
+			 */
+			if (!src)
+				goto out_unmap_dst;
+		}
 		error = copy_one_pte(vma, old_addr, src, dst, &pte_chain);
 		pte_unmap_nested(src);
+out_unmap_dst:
 		pte_unmap(dst);
-	} else
-		/*
-		 * Why do we need this flush ? If there is no pte for
-		 * old_addr, then there must not be a pte for it as well.
-		 */
-		flush_tlb_page(vma, old_addr);
+	}
 	spin_unlock(&mm->page_table_lock);
 	pte_chain_free(pte_chain);
 out:
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 5:49 ` Linus Torvalds
@ 2004-02-17 6:00 ` Andrew Morton
2004-02-17 6:06 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-02-17 6:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: vrajesh, linux-kernel, Linux-MM
Linus Torvalds <torvalds@osdl.org> wrote:
>
> Hmm.. Looking a bit more at it, does this alternate patch work? It's
> _slightly_ less ugly, and it also removes the nonsensical TLB invalidate
> instead of moving it around together with the comment that says that it
> shouldn't exist.
>
> The TLB is (properly) invalidated by "copy_one_pte()" if the mapping
> actually changes.
>
> Did I miss anything?
This saves a goto. It works, but I wasn't able to trigger
the oops without it either.
diff -puN mm/mremap.c~mremap-oops-fix mm/mremap.c
--- 25/mm/mremap.c~mremap-oops-fix 2004-02-16 20:53:25.000000000 -0800
+++ 25-akpm/mm/mremap.c 2004-02-16 21:00:05.000000000 -0800
@@ -135,15 +135,17 @@ move_one_page(struct vm_area_struct *vma
 		dst = alloc_one_pte_map(mm, new_addr);
 		if (src == NULL)
 			src = get_one_pte_map_nested(mm, old_addr);
-		error = copy_one_pte(vma, old_addr, src, dst, &pte_chain);
-		pte_unmap_nested(src);
-		pte_unmap(dst);
-	} else
 		/*
-		 * Why do we need this flush ? If there is no pte for
-		 * old_addr, then there must not be a pte for it as well.
+		 * Since alloc_one_pte_map can drop and re-acquire
+		 * page_table_lock, we should re-check the src entry...
 		 */
-		flush_tlb_page(vma, old_addr);
+		if (src) {
+			error = copy_one_pte(vma, old_addr, src,
+						dst, &pte_chain);
+			pte_unmap_nested(src);
+		}
+		pte_unmap(dst);
+	}
 	spin_unlock(&mm->page_table_lock);
 	pte_chain_free(pte_chain);
 out:
_
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 6:00 ` Andrew Morton
@ 2004-02-17 6:06 ` Linus Torvalds
2004-02-17 13:23 ` Rajesh Venkatasubramanian
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Linus Torvalds @ 2004-02-17 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: vrajesh, linux-kernel, Linux-MM
On Mon, 16 Feb 2004, Andrew Morton wrote:
>
> This saves a goto. It works, but I wasn't able to trigger
> the oops without it either.
To trigger the bug you have to have _just_ the right memory usage, I
suspect. You literally have to have the destination page directory
allocation unmap the _exact_ source page (which has to be clean) for the
bug to hit.
So I suspect the oops only triggers on the machine that the trigger
program was written for.
Your version of the patch saves a goto in the source, but results in an
extra goto in the generated assembly unless the compiler is clever enough
to notice the double test for NULL.
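The two control-flow shapes being compared can be put side by side. This is a sketch only: copy_entry, unmap_src, unmap_dst and the trace buffer are hypothetical stand-ins for copy_one_pte, pte_unmap_nested and pte_unmap, and the refetched argument models the second get_one_pte_map_nested call. Whether a given compiler merges the two NULL tests in the second form is not something the sketch can show; it only demonstrates that both forms do the same work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Records which helpers ran, in order: 'c' copy, 's' unmap src, 'd' unmap dst. */
char trace[32];

static void note(char c)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = c;
		trace[n + 1] = '\0';
	}
}

/* Hypothetical stand-ins for copy_one_pte / pte_unmap_nested / pte_unmap. */
static int copy_entry(const int *src, int *dst) { note('c'); *dst = *src; return 0; }
static void unmap_src(void) { note('s'); }
static void unmap_dst(void) { note('d'); }

/* Shape with a goto: NULL is tested only on the re-fetch path. */
int with_goto(int *src, int *refetched, int *dst)
{
	int error = 0;

	trace[0] = '\0';
	if (src == NULL) {
		src = refetched;
		if (!src)
			goto out_unmap_dst;
	}
	error = copy_entry(src, dst);
	unmap_src();
out_unmap_dst:
	unmap_dst();
	return error;
}

/* Shape without a goto: src is tested for NULL twice in the source. */
int with_nested_if(int *src, int *refetched, int *dst)
{
	int error = 0;

	trace[0] = '\0';
	if (src == NULL)
		src = refetched;
	if (src) {
		error = copy_entry(src, dst);
		unmap_src();
	}
	unmap_dst();
	return error;
}
```

Both functions run the same helpers in the same order for every combination of NULL and non-NULL inputs; the difference is purely in how the source reads and, possibly, in the branches the compiler emits.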
Never mind, that's a micro-optimization, and your version is cleaner.
Let's go with it if Rajesh can verify that it fixes the problem for him.
Rajesh?
Linus
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 6:06 ` Linus Torvalds
@ 2004-02-17 13:23 ` Rajesh Venkatasubramanian
2004-02-17 21:33 ` Rajesh Venkatasubramanian
2004-02-19 14:29 ` [PATCH] orphaned ptes -- mremap vs. truncate race Rajesh Venkatasubramanian
2 siblings, 0 replies; 9+ messages in thread
From: Rajesh Venkatasubramanian @ 2004-02-17 13:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Linux-MM
> To trigger the bug you have to have _just_ the right memory usage, I
> suspect. You literally have to have the destination page directory
> allocation unmap the _exact_ source page (which has to be clean) for the
> bug to hit.
>
To trigger the bug, I have to run my test program in a "while true;"
loop for an hour or so.
> So I suspect the oops only triggers on the machine that the trigger
> program was written for.
>
> Your version of the patch saves a goto in the source, but results in an
> extra goto in the generated assembly unless the compiler is clever enough
> to notice the double test for NULL.
>
> Never mind, that's a micro-optimization, and your version is cleaner.
> Let's go with it if Rajesh can verify that it fixes the problem for him.
I will test the patch and report.
Thanks,
Rajesh
* Re: [PATCH] mremap NULL pointer dereference fix
2004-02-17 6:06 ` Linus Torvalds
2004-02-17 13:23 ` Rajesh Venkatasubramanian
@ 2004-02-17 21:33 ` Rajesh Venkatasubramanian
2004-02-19 14:29 ` [PATCH] orphaned ptes -- mremap vs. truncate race Rajesh Venkatasubramanian
2 siblings, 0 replies; 9+ messages in thread
From: Rajesh Venkatasubramanian @ 2004-02-17 21:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Linux-MM
> >
> > This saves a goto. It works, but I wasn't able to trigger
> > the oops without it either.
>
> To trigger the bug you have to have _just_ the right memory usage, I
> suspect. You literally have to have the destination page directory
> allocation unmap the _exact_ source page (which has to be clean) for the
> bug to hit.
A minor point. It is not necessary for the src to be clean because a
parallel truncate can also invalidate the src. Actually, my test program
uses truncate to invalidate the src.
> Your version of the patch saves a goto in the source, but results in an
> extra goto in the generated assembly unless the compiler is clever enough
> to notice the double test for NULL.
>
> Never mind, that's a micro-optimization, and your version is cleaner.
Yeah. Andrew's patch is a lot cleaner than my _crap_ patch.
> Let's go with it if Rajesh can verify that it fixes the problem for him.
Yep. Andrew's patch fixes the problem. I did put in a printk along with
Andrew's patch to check whether the NULL src condition repeats. I could
trigger the condition again, and the machine didn't oops because of the
patch.
Thanks,
Rajesh
* [PATCH] orphaned ptes -- mremap vs. truncate race
2004-02-17 6:06 ` Linus Torvalds
2004-02-17 13:23 ` Rajesh Venkatasubramanian
2004-02-17 21:33 ` Rajesh Venkatasubramanian
@ 2004-02-19 14:29 ` Rajesh Venkatasubramanian
2 siblings, 0 replies; 9+ messages in thread
From: Rajesh Venkatasubramanian @ 2004-02-19 14:29 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, Linux-MM
Copying and moving page tables can race with invalidate_mmap_range
and leave some orphaned ptes in the target page table unless some
fix is already in place. For extra information about related races
follow the links [1] and [2].
Fork (dup_mmap) copies page tables from an old process to a new
process. In dup_mmap, orphaned ptes due to a race between
copy_page_range and invalidate_mmap_range can be avoided by ordering
the old process's vma and the new process's vma appropriately in
the corresponding i_mmap{_shared} list. This patch does that and
adds a fat comment. This ordering ensures that invalidate_mmap_range
zaps ptes from the old vma before the new vma. This helps to avoid
orphaned ptes.
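The effect of the ordering can be seen with a small userspace re-implementation of the circular list helpers. This is a sketch, not kernel code: the struct and functions below only mirror the insertion semantics of the kernel's list_add (insert right after the given node) and list_add_tail (insert right before it), and position() is a hypothetical helper standing in for a walker that traverses the list forward from its head.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the circular doubly linked list idea. */
struct list_head {
	struct list_head *next, *prev;
};

void init_list(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void insert_between(struct list_head *new,
			   struct list_head *prev, struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert "new" right after "pos". */
void list_add(struct list_head *new, struct list_head *pos)
{
	insert_between(new, pos, pos->next);
}

/* Insert "new" right before "pos". */
void list_add_tail(struct list_head *new, struct list_head *pos)
{
	insert_between(new, pos->prev, pos);
}

/* Index of "node" when walking forward from "head", or -1 if absent. */
int position(struct list_head *head, struct list_head *node)
{
	int i = 0;
	struct list_head *p;

	for (p = head->next; p != head; p = p->next, i++)
		if (p == node)
			return i;
	return -1;
}
```

If parent is already on the list, list_add(&child, &parent) places the child after the parent, so a forward walk reaches the parent first; list_add_tail(&child, &parent) places it before the parent, which is the ordering the patch moves away from.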
Currently, mremap does not add the new_vma to the corresponding
i_mmap{_shared} list before copying the page tables. This is racy
and leads to orphaned ptes. You can use the test program in the
link [4] to trigger the race. On my old quad Pentium II 200MHz machine
with 256MB, I can consistently trigger the race with the test program.
This patch adds the new_vma to the corresponding i_mmap{_shared} list
in appropriate order before moving the page tables. This does not
entirely solve the orphaned ptes problem because in the error path
move_page_tables moves the ptes from the new_vma back to the old vma
(i.e., in the opposite order to that of those vmas in the
i_mmap{_shared} list). Therefore, to fix orphaned ptes in the error
path, this patch uses the mapping's truncate_count. The fix is ugly
and inefficient. However, a truncate race in the error path is _very_
rare, so I think it is okay to take some performance penalty.
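The truncate_count check is a snapshot-and-compare pattern. Below is a minimal userspace sketch of that idea using C11 atomics; struct mapping, truncate_event, snapshot_count and needs_zap are hypothetical names chosen for illustration, and the real patch additionally takes the snapshot while holding i_shared_sem.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the mapping and its truncate_count. */
struct mapping {
	atomic_int truncate_count;
};

/* Models a truncate: every invalidation bumps the counter. */
void truncate_event(struct mapping *m)
{
	atomic_fetch_add(&m->truncate_count, 1);
}

/* Taken before the page tables are moved. */
int snapshot_count(struct mapping *m)
{
	return atomic_load(&m->truncate_count);
}

/*
 * Checked in the error path: a changed counter means a truncate may
 * have run in between, so the range has to be zapped to be safe.
 */
int needs_zap(struct mapping *m, int snapshot)
{
	return snapshot != atomic_load(&m->truncate_count);
}
```

The check is conservative: any truncate on the mapping between the snapshot and the comparison triggers the cleanup, even one that touched an unrelated range, which is the performance penalty the text accepts.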
In the error path, this patch ignores nonlinear mappings since
it's my understanding that we do not care about SIGBUS in nonlinear
maps. To find Andrew Morton's take on this follow the link [3].
This patch is for 2.6.3-mm1. It has been only minimally tested.
Let me know of my stupidities and mistakes, if any.
Links:
-----
[1] initial patch for do_no_page() vs. truncate race
http://marc.theaimsgroup.com/?t=105434202900003
[2] final patch for do_no_page() vs. truncate -- distributed FS
http://marc.theaimsgroup.com/?t=105544905100001
[3] nonlinear map - truncate - SIGBUS
http://marc.theaimsgroup.com/?m=106595961920958
[4] Test programs
http://www-personal.engin.umich.edu/~vrajesh/linux/mremap-truncate/
include/linux/mm.h | 2 +
kernel/fork.c | 9 ++++++-
mm/mmap.c | 30 +++++++++++++++++++++-----
mm/mremap.c | 60 +++++++++++++++++++++++++++++++++++++++--------------
4 files changed, 78 insertions(+), 23 deletions(-)
diff -puN kernel/fork.c~mremap_race kernel/fork.c
--- mmlinux-2.6/kernel/fork.c~mremap_race 2004-02-19 00:39:36.000000000 -0500
+++ mmlinux-2.6-jaya/kernel/fork.c 2004-02-19 00:39:36.000000000 -0500
@@ -316,9 +316,14 @@ static inline int dup_mmap(struct mm_str
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
- /* insert tmp into the share list, just after mpnt */
+ /*
+ * insert tmp into the share list, just after mpnt.
+ * Note that this order of insertion is important to
+ * avoid orphaned ptes due to a rare race between
+ * invalidate_mmap_range and copy_page_range.
+ */
down(&file->f_mapping->i_shared_sem);
- list_add_tail(&tmp->shared, &mpnt->shared);
+ list_add(&tmp->shared, &mpnt->shared);
up(&file->f_mapping->i_shared_sem);
}
diff -puN include/linux/mm.h~mremap_race include/linux/mm.h
--- mmlinux-2.6/include/linux/mm.h~mremap_race 2004-02-19 00:39:36.000000000 -0500
+++ mmlinux-2.6-jaya/include/linux/mm.h 2004-02-19 00:39:36.000000000 -0500
@@ -530,6 +530,8 @@ extern void si_meminfo_node(struct sysin
/* mmap.c */
extern void insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
+extern void add_vma_to_process(struct mm_struct *, struct vm_area_struct *);
+extern void unmap_vma(struct mm_struct *, struct vm_area_struct *);
extern void build_mmap_rb(struct mm_struct *);
extern void exit_mmap(struct mm_struct *);
diff -puN mm/mremap.c~mremap_race mm/mremap.c
--- mmlinux-2.6/mm/mremap.c~mremap_race 2004-02-19 00:39:36.000000000 -0500
+++ mmlinux-2.6-jaya/mm/mremap.c 2004-02-19 00:39:36.000000000 -0500
@@ -191,8 +191,9 @@ static unsigned long move_vma(struct vm_
unsigned long new_addr)
{
struct mm_struct *mm = vma->vm_mm;
+ struct address_space *mapping = NULL;
struct vm_area_struct *new_vma, *next, *prev;
- int allocated_vma;
+ int allocated_vma, sequence = 0;
int split = 0;
new_vma = NULL;
@@ -244,23 +245,40 @@ static unsigned long move_vma(struct vm_
if (!new_vma)
goto out;
allocated_vma = 1;
+ *new_vma = *vma;
+ INIT_LIST_HEAD(&new_vma->shared);
+ new_vma->vm_start = new_addr;
+ new_vma->vm_end = new_addr+new_len;
+ new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
+
+ if (new_vma->vm_file) {
+ struct inode *inode;
+ get_file(new_vma->vm_file);
+ inode = new_vma->vm_file->f_dentry->d_inode;
+ mapping = new_vma->vm_file->f_mapping;
+ if (new_vma->vm_flags & VM_DENYWRITE)
+ atomic_dec(&inode->i_writecount);
+ /*
+ * insert new_vma into the shared list, just after vma.
+ * Note that this ordering of insertion is important
+ * to avoid orphaned ptes due to a rare race between
+ * invalidate_mmap_range and move_page_tables.
+ */
+ down(&mapping->i_shared_sem);
+ sequence = atomic_read(&mapping->truncate_count);
+ list_add(&new_vma->shared, &vma->shared);
+ up(&mapping->i_shared_sem);
+ }
+
+ if (new_vma->vm_ops && new_vma->vm_ops->open)
+ new_vma->vm_ops->open(new_vma);
}
if (!move_page_tables(vma, new_addr, addr, old_len)) {
unsigned long vm_locked = vma->vm_flags & VM_LOCKED;
- if (allocated_vma) {
- *new_vma = *vma;
- INIT_LIST_HEAD(&new_vma->shared);
- new_vma->vm_start = new_addr;
- new_vma->vm_end = new_addr+new_len;
- new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
- if (new_vma->vm_file)
- get_file(new_vma->vm_file);
- if (new_vma->vm_ops && new_vma->vm_ops->open)
- new_vma->vm_ops->open(new_vma);
- insert_vm_struct(current->mm, new_vma);
- }
+ if (allocated_vma)
+ add_vma_to_process(current->mm, new_vma);
/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vma->vm_flags & VM_ACCOUNT) {
@@ -291,8 +309,20 @@ static unsigned long move_vma(struct vm_
}
return new_addr;
}
- if (allocated_vma)
- kmem_cache_free(vm_area_cachep, new_vma);
+
+ /*
+ * Ugly. Not efficient. Paranoid about leaving orphaned ptes due to
+ * mremap vs. truncate race. But then, it works, and we do not worry
+ * too much about burning few extra cycles under memory pressure.
+ */
+ if (mapping && !(vma->vm_flags & VM_NONLINEAR) &&
+ (unlikely(sequence != atomic_read(&mapping->truncate_count)))) {
+ flush_cache_range(vma, addr, addr + old_len);
+ zap_page_range(vma, addr, old_len);
+ }
+
+ if (allocated_vma)
+ unmap_vma(current->mm, new_vma);
out:
return -ENOMEM;
}
diff -puN mm/mmap.c~mremap_race mm/mmap.c
--- mmlinux-2.6/mm/mmap.c~mremap_race 2004-02-19 00:39:36.000000000 -0500
+++ mmlinux-2.6-jaya/mm/mmap.c 2004-02-19 00:39:36.000000000 -0500
@@ -1079,13 +1079,8 @@ no_mmaps:
* By the time this function is called, the area struct has been
* removed from the process mapping list.
*/
-static void unmap_vma(struct mm_struct *mm, struct vm_area_struct *area)
+void unmap_vma(struct mm_struct *mm, struct vm_area_struct *area)
{
- size_t len = area->vm_end - area->vm_start;
-
- area->vm_mm->total_vm -= len >> PAGE_SHIFT;
- if (area->vm_flags & VM_LOCKED)
- area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
/*
* Is this a new hole at the lowest possible address?
*/
@@ -1113,6 +1108,10 @@ static void unmap_vma_list(struct mm_str
{
do {
struct vm_area_struct *next = mpnt->vm_next;
+ size_t len = mpnt->vm_end - mpnt->vm_start;
+ mpnt->vm_mm->total_vm -= len >> PAGE_SHIFT;
+ if (mpnt->vm_flags & VM_LOCKED)
+ mpnt->vm_mm->locked_vm -= len >> PAGE_SHIFT;
unmap_vma(mm, mpnt);
mpnt = next;
} while (mpnt != NULL);
@@ -1485,3 +1484,22 @@ void insert_vm_struct(struct mm_struct *
vma_link(mm, vma, prev, rb_link, rb_parent);
validate_mm(mm);
}
+
+/* Insert vm structure into process list sorted by address */
+
+void add_vma_to_process(struct mm_struct * mm, struct vm_area_struct * vma)
+{
+ struct vm_area_struct * __vma, * prev;
+ struct rb_node ** rb_link, * rb_parent;
+
+ __vma = find_vma_prepare(mm,vma->vm_start,&prev,&rb_link,&rb_parent);
+ if (__vma && __vma->vm_start < vma->vm_end)
+ BUG();
+ spin_lock(&mm->page_table_lock);
+ __vma_link_list(mm, vma, prev, rb_parent);
+ __vma_link_rb(mm, vma, rb_link, rb_parent);
+ spin_unlock(&mm->page_table_lock);
+ mark_mm_hugetlb(mm, vma);
+ mm->map_count++;
+ validate_mm(mm);
+}
_