linux-mm.kvack.org archive mirror
* 2.5.52-mm2
@ 2002-12-19  5:53 Andrew Morton
  2002-12-19  8:54 ` 2.5.52-mm2 William Lee Irwin III
  2002-12-19  9:41 ` 2.5.52-mm2 Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2002-12-19  5:53 UTC (permalink / raw)
  To: lkml, linux-mm

url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.52/2.5.52-mm2/

. Big reorganisation of shared pagetable code.  It is a cleanup, and
  there should be no functional changes.  The diff is considerably
  easier to read now.

  In this patchset, shared pagetables are configurable again, and the
  default is "off".  This is because the intent is that pagetable sharing
  always be enabled (on ia32 at least).  But we want it to work when it
  is disabled too.  So in this -mm, pagetable sharing is disabled.
  Henceforth it will be enabled.  Make sense?

. Added Bill Irwin's patches, to get them some additional testing.

. The per-cpu kmalloc infrastructure.

. Another update of the patch management scripts is at

	http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.9/

  No great changes here.  Various fixes and tweaks.



Changes since 2.5.52-mm1:

+shpte-reorg.patch

 The shared pagetable patch reorganisation.

+shpte-reorg-fixes.patch

 Make it work with CONFIG_SHAREPTE=n

-lockless-current_kernel_time.patch

 Dropped for now, because it is ia32-only and it is time to get some
 non-ia32 testing done.

+block-allocator-doc.patch

 Some commentary.

+ext2-rename-vars.patch

 Make ext2_new_block() understandable

+remove-memshared.patch

 Remove /proc/meminfo:MemShared

+bin2bcd.patch

 Code consolidation/cleanup

+log_buf_size.patch

 Configurable printk buffer size.

+semtimedop-update.patch

 Wire up semtimedop() for 32-bit ia32 apps on ia64.

+nfs-kmap_atomic.patch

 Use kmap_atomic in NFS

+ext3-bh-dirty-race.patch

 Fix a rare BUG in ext3

+unalign-radix-tree-nodes.patch

 Space saving for radix_tree_nodes

+htlb-0.patch
+htlb-1.patch
+htlb-2.patch
+htlb-3.patch

 hugetlbpage updates

+kmalloc_percpu.patch

 per-cpu kmalloc infrastructure

+kmalloc_percpu-rtcache.patch
+kmalloc_percpu-mibs-1.patch
+kmalloc_percpu-mibs-2.patch
+kmalloc_percpu-mibs-3.patch

 Applications thereof

+wli-01_numaq_io.patch
+wli-02_do_sak.patch
+wli-03_proc_super.patch
+wli-04_cap_set_pg.patch
+wli-06_uml_get_task.patch
+wli-07_numaq_mem_map.patch
+wli-08_numaq_pgdat.patch
+wli-09_has_stopped_jobs.patch
+wli-10_inode_wait.patch
+wli-11_pgd_ctor.patch
+wli-12_pidhash_size.patch
+wli-13_rmap_nrpte.patch

 Bill's stuff.



All 78 patches:


linus.patch
  cset-1.883.3.60-to-1.900.txt.gz

kgdb.patch

sync_fs-deadlock-fix.patch
  sync_fs deadlock fix

shrink_list-dirty-page-race.patch
  fix a page dirtying race in vmscan.c

slab-poisoning.patch
  more informative slab poisoning

nommu-generic_file_readonly_mmap.patch
  Add generic_file_readonly_mmap() for nommu

dio-return-partial-result.patch

aio-direct-io-infrastructure.patch
  AIO support for raw/O_DIRECT

deferred-bio-dirtying.patch
  bio dirtying infrastructure

aio-direct-io.patch
  AIO support for raw/O_DIRECT

aio-dio-debug.patch

dio-reduce-context-switch-rate.patch
  Reduced wakeup rate in direct-io code

cputimes_stat.patch
  Restore per-cpu time accounting, with a config option

reduce-random-context-switch-rate.patch
  Reduce context switch rate due to the random driver

inlines-net.patch

rbtree-iosched.patch
  rbtree-based IO scheduler

deadsched-fix.patch
  deadline scheduler fix

quota-smp-locks.patch
  Subject: [PATCH] Quota SMP locks

shpte-ng.patch
  pagetable sharing for ia32

shpte-nonlinear.patch
  shpte: support nonlinear mappings and clean up clear_share_range()

shpte-reorg.patch

shpte-reorg-fixes.patch
  shared pagetable reorg fixes

shpte-always-on.patch
  Force CONFIG_SHAREPTE=y for ia32

ptrace-flush.patch
  Subject: [PATCH] ptrace on 2.5.44

buffer-debug.patch
  buffer.c debugging

misc.patch
  misc fixes

warn-null-wakeup.patch

pentium-II.patch
  Pentium-II support bits

rcu-stats.patch
  RCU statistics reporting

auto-unplug.patch
  self-unplugging request queues

less-unplugging.patch
  Remove most of the blk_run_queues() calls

ext3-fsync-speedup.patch
  Clean up ext3_sync_file()

remove-PF_NOWARN.patch
  Remove PF_NOWARN

scheduler-tunables.patch
  scheduler tunables

blocking-kswapd.patch
  Give kswapd writeback higher priority than pdflush

block-allocator-doc.patch
  ext2/3 commentary and cleanup

spread-find_group_other.patch
  ext2/3: better starting group for S_ISREG files

ext3-alloc-spread.patch
  ext3: smarter block allocation startup

ext2-alloc-spread.patch
  ext2: smarter block allocation startup

ext2-rename-vars.patch
  rename locals in ext2_new_block()

ext3-use-after-free.patch
  ext3 use-after-free bugfix

dio-always-kmalloc.patch
  direct-io: dynamically allocate struct dio

file-nr-doc-fix.patch
  Docs: fix explanation of file-nr

set_page_dirty_lock.patch
  fix set_page_dirty vs truncate&free races

remove-memshared.patch
  Remove /proc/meminfo:MemShared

bin2bcd.patch
  BIN_TO_BCD consolidation

log_buf_size.patch
  move LOG_BUF_SIZE to header/config

semtimedop-update.patch
  Enable semtimedop for ia64 32-bit emulation.

nfs-kmap_atomic.patch
  use kmap_atomic instead of kmap in NFS client

ext3-bh-dirty-race.patch
  ext3: fix buffer dirtying

unalign-radix-tree-nodes.patch
  don't cacheline-align radix_tree_nodes

htlb-0.patch
  hugetlb bugfixes

htlb-1.patch
  hugetlb: report shared memory attachment counts

htlb-2.patch
  hugetlb: fix MAP_FIXED handling

htlb-3.patch
  hugetlbfs: set inode->i_size

kmalloc_percpu.patch
  kmalloc_percpu -- stripped down version

kmalloc_percpu-rtcache.patch
  Make rt_cache_stat use kmalloc_percpu

kmalloc_percpu-mibs-1.patch
  Change Networking mibs to use kmalloc_percpu -- 1/3

kmalloc_percpu-mibs-2.patch
  Change Networking mibs to use kmalloc_percpu -- 2/3

kmalloc_percpu-mibs-3.patch
  Change Networking mibs to use kmalloc_percpu -- 3/3

wli-01_numaq_io.patch
  (undescribed patch)

wli-02_do_sak.patch
  (undescribed patch)

wli-03_proc_super.patch
  (undescribed patch)

wli-04_cap_set_pg.patch
  (undescribed patch)

wli-06_uml_get_task.patch
  (undescribed patch)

wli-07_numaq_mem_map.patch
  (undescribed patch)

wli-08_numaq_pgdat.patch
  (undescribed patch)

wli-09_has_stopped_jobs.patch
  (undescribed patch)

wli-10_inode_wait.patch
  (undescribed patch)

wli-11_pgd_ctor.patch
  (undescribed patch)

wli-12_pidhash_size.patch
  (undescribed patch)

wli-13_rmap_nrpte.patch
  (undescribed patch)

dcache_rcu-2.patch
  dcache_rcu-2-2.5.51.patch

dcache_rcu-3.patch
  dcache_rcu-3-2.5.51.patch

page-walk-api.patch

page-walk-scsi.patch

page-walk-api-update.patch
  pagewalk API update

gup-check-valid.patch
  valid page test in get_user_pages()
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: 2.5.52-mm2
  2002-12-19  5:53 2.5.52-mm2 Andrew Morton
@ 2002-12-19  8:54 ` William Lee Irwin III
  2002-12-19  9:28   ` 2.5.52-mm2 William Lee Irwin III
  2002-12-19  9:41 ` 2.5.52-mm2 Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-19  8:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Wed, Dec 18, 2002 at 09:53:18PM -0800, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.52/2.5.52-mm2/

Kernel compile on ramfs, shpte off, overcommit on (probably more like a
stress test for shpte):

c0116810 187174   0.528821    pfn_to_nid
c01168b8 192912   0.545032    x86_profile_hook
c0163c0c 267362   0.755375    d_lookup
c014e530 286920   0.810632    get_empty_filp
c01b0b18 287959   0.813567    __copy_from_user
c01320cc 307597   0.86905     find_get_page
c011a6b0 331219   0.935789    scheduler_tick
c0140890 342427   0.967455    vm_enough_memory
c013fc60 353705   0.999319    handle_mm_fault
c014eb49 358710   1.01346     .text.lock.file_table
c011a1f8 379521   1.07226     load_balance
c01b0ab0 840162   2.3737      __copy_to_user
c013f5a0 1040429  2.93951     do_anonymous_page
c01358c8 1056576  2.98513     __get_page_state
c014406c 1260931  3.56249     page_remove_rmap
c0143e68 1265355  3.57499     page_add_rmap
c0106f38 21236731 59.9999     poll_idle

shpte on will follow.


Bill



* Re: 2.5.52-mm2
  2002-12-19  8:54 ` 2.5.52-mm2 William Lee Irwin III
@ 2002-12-19  9:28   ` William Lee Irwin III
  2002-12-19 10:12     ` 2.5.52-mm2 William Lee Irwin III
  0 siblings, 1 reply; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-19  9:28 UTC (permalink / raw)
  To: Andrew Morton, lkml, linux-mm

On Wed, Dec 18, 2002 at 09:53:18PM -0800, Andrew Morton wrote:
>> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.52/2.5.52-mm2/

On Thu, Dec 19, 2002 at 12:54:26AM -0800, William Lee Irwin III wrote:
> Kernel compile on ramfs, shpte off, overcommit on (probably more like a
> stress test for shpte):

With shpte on:

c0135790 123025   0.33807     nr_free_pages
c015021c 134621   0.369936    __fput
c0119974 144602   0.397364    kmap_atomic
c01b23d0 146899   0.403676    __copy_user_intel
c014c050 159002   0.436935    pte_unshare
c0122c28 171802   0.472109    current_kernel_time
c01fbd9c 172897   0.475118    sync_buffer
c0116870 187066   0.514054    pfn_to_nid
c0115390 199497   0.548214    smp_apic_timer_interrupt
c013f820 236468   0.64981     do_no_page
c0116918 260314   0.715338    x86_profile_hook
c016566c 274151   0.753362    d_lookup
c01b2578 280625   0.771152    __copy_from_user
c013212c 323752   0.889665    find_get_page
c0140800 338288   0.929609    vm_enough_memory
c013fbd0 355384   0.976589    handle_mm_fault
c014ff90 358839   0.986083    get_empty_filp
c011a710 363642   0.999282    scheduler_tick
c011a258 507320   1.39411     load_balance
c01505a9 664873   1.82706     .text.lock.file_table
c01b2510 834467   2.2931      __copy_to_user
c0135928 1051119  2.88846     __get_page_state
c013f400 1062731  2.92037     do_anonymous_page
c0143fa0 1273498  3.49955     page_add_rmap
c014419c 1386659  3.81051     page_remove_rmap
c0106f38 21099307 57.9805     poll_idle


Bill


* Re: 2.5.52-mm2
  2002-12-19  5:53 2.5.52-mm2 Andrew Morton
  2002-12-19  8:54 ` 2.5.52-mm2 William Lee Irwin III
@ 2002-12-19  9:41 ` Andrew Morton
  2002-12-19  9:50   ` 2.5.52-mm2 Andrew Morton
  2002-12-19 17:02   ` mremap use-after-free [was Re: 2.5.52-mm2] Hugh Dickins
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2002-12-19  9:41 UTC (permalink / raw)
  To: lkml, linux-mm

Andrew Morton wrote:
> 
> ...
> slab-poisoning.patch
>   more informative slab poisoning

This patch has exposed a quite long-standing use-after-free bug in
mremap().  It makes the machine go BUG when starting the X server if
memory debugging is turned on.

The bug might be present in 2.4 as well..


--- 25/mm/mremap.c~move_vma-use-after-free	Thu Dec 19 00:51:49 2002
+++ 25-akpm/mm/mremap.c	Thu Dec 19 01:08:45 2002
@@ -183,14 +183,16 @@ static unsigned long move_vma(struct vm_
 	next = find_vma_prev(mm, new_addr, &prev);
 	if (next) {
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+				can_vma_merge(prev, vma->vm_flags) &&
+				!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
 			new_vma = prev;
 			if (next != prev->vm_next)
 				BUG();
-			if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) {
+			if (prev->vm_end == next->vm_start &&
+					can_vma_merge(next, prev->vm_flags)) {
 				spin_lock(&mm->page_table_lock);
 				prev->vm_end = next->vm_end;
 				__vma_unlink(mm, next, prev);
@@ -201,7 +203,8 @@ static unsigned long move_vma(struct vm_
 				kmem_cache_free(vm_area_cachep, next);
 			}
 		} else if (next->vm_start == new_addr + new_len &&
-			   can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+					can_vma_merge(next, vma->vm_flags) &&
+					!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			next->vm_start = new_addr;
 			spin_unlock(&mm->page_table_lock);
@@ -210,7 +213,8 @@ static unsigned long move_vma(struct vm_
 	} else {
 		prev = find_vma(mm, new_addr-1);
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+				can_vma_merge(prev, vma->vm_flags) &&
+				!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
@@ -227,12 +231,16 @@ static unsigned long move_vma(struct vm_
 	}
 
 	if (!move_page_tables(vma, new_addr, addr, old_len)) {
+		unsigned long must_fault_in;
+		unsigned long fault_in_start;
+		unsigned long fault_in_end;
+
 		if (allocated_vma) {
 			*new_vma = *vma;
 			INIT_LIST_HEAD(&new_vma->shared);
 			new_vma->vm_start = new_addr;
 			new_vma->vm_end = new_addr+new_len;
-			new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
+			new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
 			if (new_vma->vm_file)
 				get_file(new_vma->vm_file);
 			if (new_vma->vm_ops && new_vma->vm_ops->open)
@@ -251,19 +259,25 @@ static unsigned long move_vma(struct vm_
 		} else
 			vma = NULL;		/* nothing more to do */
 
-		do_munmap(current->mm, addr, old_len);
-
 		/* Restore VM_ACCOUNT if one or two pieces of vma left */
 		if (vma) {
 			vma->vm_flags |= VM_ACCOUNT;
 			if (split)
 				vma->vm_next->vm_flags |= VM_ACCOUNT;
 		}
+
+		must_fault_in = new_vma->vm_flags & VM_LOCKED;
+		fault_in_start = new_vma->vm_start;
+		fault_in_end = new_vma->vm_end;
+
+		do_munmap(current->mm, addr, old_len);
+
+		/* new_vma could have been invalidated by do_munmap */
+
 		current->mm->total_vm += new_len >> PAGE_SHIFT;
-		if (new_vma->vm_flags & VM_LOCKED) {
+		if (must_fault_in) {
 			current->mm->locked_vm += new_len >> PAGE_SHIFT;
-			make_pages_present(new_vma->vm_start,
-					   new_vma->vm_end);
+			make_pages_present(fault_in_start, fault_in_end);
 		}
 		return new_addr;
 	}

_


* Re: 2.5.52-mm2
  2002-12-19  9:41 ` 2.5.52-mm2 Andrew Morton
@ 2002-12-19  9:50   ` Andrew Morton
  2002-12-19 17:02   ` mremap use-after-free [was Re: 2.5.52-mm2] Hugh Dickins
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2002-12-19  9:50 UTC (permalink / raw)
  To: lkml, linux-mm

Andrew Morton wrote:
> 
> Andrew Morton wrote:
> >
> > ...
> > slab-poisoning.patch
> >   more informative slab poisoning
> 
> This patch has exposed a quite long-standing use-after-free bug in
> mremap().  It makes the machine go BUG when starting the X server if
> memory debugging is turned on.
> 
> The bug might be present in 2.4 as well..

Here's a 2.4 patch:

move_vma() calls do_munmap() and then uses the memory at *new_vma.

But when starting X11 it just happens that the memory which do_munmap
unmapped had the same start address and the range at *new_vma.  So
new_vma is freed by do_munmap().

This was never noticed before because (vm_flags & VM_LOCKED) evaluates
false when vm_flags is 0x5a5a5a5a.  But I just changed that to
0x6b6b6b6b and we call make_pages_present() with start == end ==
0x6b6b6b6b and it goes BUG.

The fix is for move_vma() to not inspect the values of any vma's after
it has called do_munmap().

The patch does that, for `vma' and `new_vma'.

It also removes three redundant tests of !vma->vm_file.  can_vma_merge()
already tested that.


 mm/mremap.c |   31 +++++++++++++++++++++++--------
 1 files changed, 23 insertions(+), 8 deletions(-)

--- 24/mm/mremap.c~move_vma-use-after-free	Thu Dec 19 01:29:52 2002
+++ 24-akpm/mm/mremap.c	Thu Dec 19 01:31:43 2002
@@ -134,14 +134,16 @@ static inline unsigned long move_vma(str
 	next = find_vma_prev(mm, new_addr, &prev);
 	if (next) {
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+				can_vma_merge(prev, vma->vm_flags) &&
+				!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
 			new_vma = prev;
 			if (next != prev->vm_next)
 				BUG();
-			if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) {
+			if (prev->vm_end == next->vm_start &&
+					can_vma_merge(next, prev->vm_flags)) {
 				spin_lock(&mm->page_table_lock);
 				prev->vm_end = next->vm_end;
 				__vma_unlink(mm, next, prev);
@@ -151,7 +153,8 @@ static inline unsigned long move_vma(str
 				kmem_cache_free(vm_area_cachep, next);
 			}
 		} else if (next->vm_start == new_addr + new_len &&
-			   can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+					can_vma_merge(next, vma->vm_flags) &&
+					!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			next->vm_start = new_addr;
 			spin_unlock(&mm->page_table_lock);
@@ -160,7 +163,8 @@ static inline unsigned long move_vma(str
 	} else {
 		prev = find_vma(mm, new_addr-1);
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+				can_vma_merge(prev, vma->vm_flags) &&
+				!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
@@ -177,11 +181,15 @@ static inline unsigned long move_vma(str
 	}
 
 	if (!move_page_tables(current->mm, new_addr, addr, old_len)) {
+		unsigned long must_fault_in;
+		unsigned long fault_in_start;
+		unsigned long fault_in_end;
+
 		if (allocated_vma) {
 			*new_vma = *vma;
 			new_vma->vm_start = new_addr;
 			new_vma->vm_end = new_addr+new_len;
-			new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
+			new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
 			new_vma->vm_raend = 0;
 			if (new_vma->vm_file)
 				get_file(new_vma->vm_file);
@@ -189,12 +197,19 @@ static inline unsigned long move_vma(str
 				new_vma->vm_ops->open(new_vma);
 			insert_vm_struct(current->mm, new_vma);
 		}
+
+		must_fault_in = new_vma->vm_flags & VM_LOCKED;
+		fault_in_start = new_vma->vm_start;
+		fault_in_end = new_vma->vm_end;
+
 		do_munmap(current->mm, addr, old_len);
+
+		/* new_vma could have been invalidated by do_munmap */
+
 		current->mm->total_vm += new_len >> PAGE_SHIFT;
-		if (new_vma->vm_flags & VM_LOCKED) {
+		if (must_fault_in) {
 			current->mm->locked_vm += new_len >> PAGE_SHIFT;
-			make_pages_present(new_vma->vm_start,
-					   new_vma->vm_end);
+			make_pages_present(fault_in_start, fault_in_end);
 		}
 		return new_addr;
 	}

_


* Re: 2.5.52-mm2
  2002-12-19  9:28   ` 2.5.52-mm2 William Lee Irwin III
@ 2002-12-19 10:12     ` William Lee Irwin III
  2002-12-19 10:31       ` 2.5.52-mm2 Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-19 10:12 UTC (permalink / raw)
  To: Andrew Morton, lkml, linux-mm

On Wed, Dec 18, 2002 at 09:53:18PM -0800, Andrew Morton wrote:
>>> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.52/2.5.52-mm2/

On Thu, Dec 19, 2002 at 12:54:26AM -0800, William Lee Irwin III wrote:
>> Kernel compile on ramfs, shpte off, overcommit on (probably more like a
>> stress test for shpte):

On Thu, Dec 19, 2002 at 01:28:53AM -0800, William Lee Irwin III wrote:
> With shpte on:

With the following patch:

c013d4d4 94944    0.310788    zap_pte_range
c01355d0 104773   0.342962    nr_free_pages
c014f65c 107566   0.352105    __fput
c01b1750 112055   0.366799    __copy_user_intel
c0115350 121040   0.39621     smp_apic_timer_interrupt
c0119814 126089   0.412738    kmap_atomic
c014b6cc 145095   0.474952    pte_unshare
c01fb11c 145992   0.477888    sync_buffer
c0122a78 148079   0.484719    current_kernel_time
c01168b8 193805   0.634398    x86_profile_hook
c013f140 205233   0.671806    do_no_page
c0164aac 235356   0.77041     d_lookup
c01b18f8 257358   0.842431    __copy_from_user
c0131f7c 275559   0.90201     find_get_page
c011a560 282341   0.92421     scheduler_tick
c0140090 300128   0.982434    vm_enough_memory
c013f4bc 310474   1.0163      handle_mm_fault
c014f3d0 312725   1.02367     get_empty_filp
c011a0a8 365066   1.195       load_balance
c014f9e9 502737   1.64565     .text.lock.file_table
c01b1890 719105   2.35391     __copy_to_user
c0135768 911894   2.98498     __get_page_state
c013ee50 952823   3.11895     do_anonymous_page
c01436d0 1079864  3.53481     page_add_rmap
c01438cc 1186938  3.8853      page_remove_rmap
c0106f38 17763755 58.1476     poll_idle


pfn_to_nid() got lots of icache misses. Try using a macro.

 arch/i386/kernel/i386_ksyms.c |    1 -
 arch/i386/kernel/numaq.c      |   15 ++-------------
 include/asm-i386/numaq.h      |    3 ++-
 3 files changed, 4 insertions(+), 15 deletions(-)


diff -urpN linux-2.5.52-mm1/arch/i386/kernel/i386_ksyms.c mm1-2.5.52-1/arch/i386/kernel/i386_ksyms.c
--- linux-2.5.52-mm1/arch/i386/kernel/i386_ksyms.c	2002-12-16 19:29:45.000000000 -0800
+++ mm1-2.5.52-1/arch/i386/kernel/i386_ksyms.c	2002-12-17 08:47:25.000000000 -0800
@@ -67,7 +67,6 @@ EXPORT_SYMBOL(EISA_bus);
 EXPORT_SYMBOL(MCA_bus);
 #ifdef CONFIG_DISCONTIGMEM
 EXPORT_SYMBOL(node_data);
-EXPORT_SYMBOL(pfn_to_nid);
 #endif
 #ifdef CONFIG_X86_NUMAQ
 EXPORT_SYMBOL(xquad_portio);
diff -urpN linux-2.5.52-mm1/arch/i386/kernel/numaq.c mm1-2.5.52-1/arch/i386/kernel/numaq.c
--- linux-2.5.52-mm1/arch/i386/kernel/numaq.c	2002-12-15 18:08:13.000000000 -0800
+++ mm1-2.5.52-1/arch/i386/kernel/numaq.c	2002-12-17 08:51:44.000000000 -0800
@@ -27,6 +27,7 @@
 #include <linux/mm.h>
 #include <linux/bootmem.h>
 #include <linux/mmzone.h>
+#include <linux/module.h>
 #include <asm/numaq.h>
 
 /* These are needed before the pgdat's are created */
@@ -82,19 +83,7 @@ static void __init smp_dump_qct(void)
  * physnode_map[8- ] = -1;
  */
 int physnode_map[MAX_ELEMENTS] = { [0 ... (MAX_ELEMENTS - 1)] = -1};
-
-#define PFN_TO_ELEMENT(pfn) (pfn / PAGES_PER_ELEMENT)
-#define PA_TO_ELEMENT(pa) (PFN_TO_ELEMENT(pa >> PAGE_SHIFT))
-
-int pfn_to_nid(unsigned long pfn)
-{
-	int nid = physnode_map[PFN_TO_ELEMENT(pfn)];
-
-	if (nid == -1)
-		BUG(); /* address is not present */
-
-	return nid;
-}
+EXPORT_SYMBOL(physnode_map);
 
 /*
  * for each node mark the regions
diff -urpN linux-2.5.52-mm1/include/asm-i386/numaq.h mm1-2.5.52-1/include/asm-i386/numaq.h
--- linux-2.5.52-mm1/include/asm-i386/numaq.h	2002-12-15 18:08:09.000000000 -0800
+++ mm1-2.5.52-1/include/asm-i386/numaq.h	2002-12-17 08:45:19.000000000 -0800
@@ -38,10 +38,11 @@
 #define MAX_ELEMENTS 256
 #define PAGES_PER_ELEMENT (16777216/256)
 
+extern int physnode_map[];
+#define pfn_to_nid(pfn)	({ physnode_map[(pfn) / PAGES_PER_ELEMENT]; })
 #define pfn_to_pgdat(pfn) NODE_DATA(pfn_to_nid(pfn))
 #define PHYSADDR_TO_NID(pa) pfn_to_nid(pa >> PAGE_SHIFT)
 #define MAX_NUMNODES		8
-extern int pfn_to_nid(unsigned long);
 extern void get_memcfg_numaq(void);
 #define get_memcfg_numa() get_memcfg_numaq()
 


* Re: 2.5.52-mm2
  2002-12-19 10:12     ` 2.5.52-mm2 William Lee Irwin III
@ 2002-12-19 10:31       ` Andrew Morton
  2002-12-19 10:51         ` 2.5.52-mm2 William Lee Irwin III
  2002-12-19 15:22         ` 2.5.52-mm2 Martin J. Bligh
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2002-12-19 10:31 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: lkml, linux-mm

William Lee Irwin III wrote:
> 
> On Wed, Dec 18, 2002 at 09:53:18PM -0800, Andrew Morton wrote:
> >>> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.52/2.5.52-mm2/
> 
> On Thu, Dec 19, 2002 at 12:54:26AM -0800, William Lee Irwin III wrote:
> >> Kernel compile on ramfs, shpte off, overcommit on (probably more like a
> >> stress test for shpte):
> 
> On Thu, Dec 19, 2002 at 01:28:53AM -0800, William Lee Irwin III wrote:
> > With shpte on:
> 
> With the following patch:
> 
> c013d4d4 94944    0.310788    zap_pte_range
> c01355d0 104773   0.342962    nr_free_pages
> c014f65c 107566   0.352105    __fput
> c01b1750 112055   0.366799    __copy_user_intel
> c0115350 121040   0.39621     smp_apic_timer_interrupt
> c0119814 126089   0.412738    kmap_atomic
> c014b6cc 145095   0.474952    pte_unshare
> c01fb11c 145992   0.477888    sync_buffer
> c0122a78 148079   0.484719    current_kernel_time
> c01168b8 193805   0.634398    x86_profile_hook
> c013f140 205233   0.671806    do_no_page
> c0164aac 235356   0.77041     d_lookup
> c01b18f8 257358   0.842431    __copy_from_user
> c0131f7c 275559   0.90201     find_get_page
> c011a560 282341   0.92421     scheduler_tick
> c0140090 300128   0.982434    vm_enough_memory
> c013f4bc 310474   1.0163      handle_mm_fault
> c014f3d0 312725   1.02367     get_empty_filp
> c011a0a8 365066   1.195       load_balance
> c014f9e9 502737   1.64565     .text.lock.file_table
> c01b1890 719105   2.35391     __copy_to_user
> c0135768 911894   2.98498     __get_page_state
> c013ee50 952823   3.11895     do_anonymous_page
> c01436d0 1079864  3.53481     page_add_rmap
> c01438cc 1186938  3.8853      page_remove_rmap
> c0106f38 17763755 58.1476     poll_idle

Is that improved?

> pfn_to_nid() got lots of icache misses. Try using a macro.

What's the callsite?

Actually, just looking at mmzone.h, I have to say "ick".  The
non-NUMA case seems unnecessarily overdone.  eg:

#define page_to_pfn(page) \
	((page - page_zone(page)->zone_mem_map) + page_zone(page)->zone_start_pfn)

Ouch.  Why can't we have the good old `page - mem_map' here?


* Re: 2.5.52-mm2
  2002-12-19 10:31       ` 2.5.52-mm2 Andrew Morton
@ 2002-12-19 10:51         ` William Lee Irwin III
  2002-12-19 15:22         ` 2.5.52-mm2 Martin J. Bligh
  1 sibling, 0 replies; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-19 10:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

William Lee Irwin III wrote:
>> c014f9e9 502737   1.64565     .text.lock.file_table
>> c01b1890 719105   2.35391     __copy_to_user
>> c0135768 911894   2.98498     __get_page_state
>> c013ee50 952823   3.11895     do_anonymous_page
>> c01436d0 1079864  3.53481     page_add_rmap
>> c01438cc 1186938  3.8853      page_remove_rmap
>> c0106f38 17763755 58.1476     poll_idle

On Thu, Dec 19, 2002 at 02:31:32AM -0800, Andrew Morton wrote:
> Is that improved?

Only in the qualitative sense. No statistically significant differences
in running times are observable.


William Lee Irwin III wrote:
>> pfn_to_nid() got lots of icache misses. Try using a macro.

On Thu, Dec 19, 2002 at 02:31:32AM -0800, Andrew Morton wrote:
> What's the callsite?

I was not able to collect this information and would be much obliged
to hear of how to do so for instruction cache profiling.


On Thu, Dec 19, 2002 at 02:31:32AM -0800, Andrew Morton wrote:
> Actually, just looking at mmzone.h, I have to say "ick".  The
> non-NUMA case seems unnecessarily overdone.  eg:
> #define page_to_pfn(page)
> 	((page - page_zone(page)->zone_mem_map) + page_zone(page)->zone_start_pfn)
> Ouch.  Why can't we have the good old `page - mem_map' here?

There is no reason why it could not be re-established for the Pee Cee
case. IMHO it would be excellent to have the proper architecture
discrimination defines in place so that Pee Cees could utilize the vastly
simpler calculation. So long as this is devolved to arch code there is no
danger of page->virtual being required by arch-independent code.


Thanks,
Bill


* Re: 2.5.52-mm2
  2002-12-19 10:31       ` 2.5.52-mm2 Andrew Morton
  2002-12-19 10:51         ` 2.5.52-mm2 William Lee Irwin III
@ 2002-12-19 15:22         ` Martin J. Bligh
  1 sibling, 0 replies; 11+ messages in thread
From: Martin J. Bligh @ 2002-12-19 15:22 UTC (permalink / raw)
  To: Andrew Morton, William Lee Irwin III; +Cc: lkml, linux-mm

> Actually, just looking at mmzone.h, I have to say "ick".  The
> non-NUMA case seems unnecessarily overdone.  eg:
>
># define page_to_pfn(page)
> 	((page - page_zone(page)->zone_mem_map) +
> page_zone(page)->zone_start_pfn)
>
> Ouch.  Why can't we have the good old `page - mem_map' here?

Ummm .... mmzone.h:

#ifdef CONFIG_DISCONTIGMEM
....
#define page_to_pfn(page)       ((page - page_zone(page)->zone_mem_map) + \
                                 page_zone(page)->zone_start_pfn)
....
#endif /* CONFIG_DISCONTIGMEM */

page.h:

#ifndef CONFIG_DISCONTIGMEM
#define page_to_pfn(page)       ((unsigned long)((page) - mem_map))
#endif /* !CONFIG_DISCONTIGMEM */


I'll admit the file obfuscation hides this from being easy to read, but
I'm not stupid enough to screw things up *that* badly.  Well, not most
of the time ;-) Want me to reshuffle things around so that the same defines
end up in the same file, and people have a hope in hell of reading it?
If I do that, it'll probably be based on the struct page breakout patch,
and making these things all static inlines, so people stop blowing their
own feet off.

M.

PS. cscope is cool ;-)



* mremap use-after-free [was Re: 2.5.52-mm2]
  2002-12-19  9:41 ` 2.5.52-mm2 Andrew Morton
  2002-12-19  9:50   ` 2.5.52-mm2 Andrew Morton
@ 2002-12-19 17:02   ` Hugh Dickins
  2002-12-19 18:18     ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2002-12-19 17:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Thu, 19 Dec 2002, Andrew Morton wrote:
> Andrew Morton wrote:
> > ...
> > slab-poisoning.patch
> >   more informative slab poisoning
> 
> This patch has exposed a quite long-standing use-after-free bug in
> mremap().  It makes the machine go BUG when starting the X server if
> memory debugging is turned on.

Good catch, shame about the patch!
Please don't apply this, or its 2.4 sister, as is.

> The bug might be present in 2.4 as well..

I doubt that (but may be wrong, I haven't time right now to think as
twistedly as mremap demands).  The code (patently!) expects new_vma
to be good at the end, it certainly wasn't intending to unmap it;
but 2.5 split_vma has been through a couple of convulsions, either
of which might have resulted in the potential for new_vma to be
freed where before it was guaranteed to remain.

Do you know the vmas before and after, and the mremap which did this?
I couldn't reproduce it starting X here, and an example would help to
uncloud my mind.  But you can reasonably answer that one example won't
prove anything, and where there's doubt, the code must be defensive.
(Besides, I'll be offline shortly.)

On to the patch...

> --- 25/mm/mremap.c~move_vma-use-after-free	Thu Dec 19 00:51:49 2002
> +++ 25-akpm/mm/mremap.c	Thu Dec 19 01:08:45 2002
> @@ -183,14 +183,16 @@ static unsigned long move_vma(struct vm_
>  	next = find_vma_prev(mm, new_addr, &prev);
>  	if (next) {
>  		if (prev && prev->vm_end == new_addr &&
> -		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
> +				can_vma_merge(prev, vma->vm_flags) &&
> +				!(vma->vm_flags & VM_SHARED)) {
>  			spin_lock(&mm->page_table_lock);
>  			prev->vm_end = new_addr + new_len;
>  			spin_unlock(&mm->page_table_lock);
>  			new_vma = prev;
>  			if (next != prev->vm_next)
>  				BUG();
> -			if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) {
> +			if (prev->vm_end == next->vm_start &&
> +					can_vma_merge(next, prev->vm_flags)) {
>  				spin_lock(&mm->page_table_lock);
>  				prev->vm_end = next->vm_end;
>  				__vma_unlink(mm, next, prev);
> @@ -201,7 +203,8 @@ static unsigned long move_vma(struct vm_
>  				kmem_cache_free(vm_area_cachep, next);
>  			}
>  		} else if (next->vm_start == new_addr + new_len &&
> -			   can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
> +					can_vma_merge(next, vma->vm_flags) &&
> +					!(vma->vm_flags & VM_SHARED)) {
>  			spin_lock(&mm->page_table_lock);
>  			next->vm_start = new_addr;
>  			spin_unlock(&mm->page_table_lock);
> @@ -210,7 +213,8 @@ static unsigned long move_vma(struct vm_
>  	} else {
>  		prev = find_vma(mm, new_addr-1);
>  		if (prev && prev->vm_end == new_addr &&
> -		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
> +				can_vma_merge(prev, vma->vm_flags) &&
> +				!(vma->vm_flags & VM_SHARED)) {
>  			spin_lock(&mm->page_table_lock);
>  			prev->vm_end = new_addr + new_len;
>  			spin_unlock(&mm->page_table_lock);

Hmmm.  Am I right to suppose that all the changes above are "cleanup"
which you couldn't resist making while you looked through this code,
but entirely irrelevant to the bug in question?  If those mods above
were right, they should be the subject of a separate patch.

There's certainly room for cleanup there, but my preference would be
to remove "can_vma_merge" completely, or at least its use in mremap.c,
using its explicit tests instead.  It looks like it was originally
quite appropriate for a use or two in mmap.c, but obscurely unhelpful
here - because in itself it is testing a bizarre asymmetric subset of
what's needed (that subset which remained to be tested in its original
use in mmap.c).

The problem with your changes above is, you've removed the !vma->vm_file
tests, presumably because you noticed that can_vma_merge already tests
!vma->vm_file.  But "vma" within can_vma_merge is "prev" or "next" here:
they are distinct tests, and you're now liable to merge an anonymous
mapping with a private file mapping - nice if it's from /dev/zero,
but otherwise not.  Please just cut those hunks out.

(Of course, I wouldn't have spotted this if I hadn't embarked on,
then retreated from, a similar cleanup myself a few months back.)

> @@ -227,12 +231,16 @@ static unsigned long move_vma(struct vm_
>  	}
>  
>  	if (!move_page_tables(vma, new_addr, addr, old_len)) {
> +		unsigned long must_fault_in;
> +		unsigned long fault_in_start;
> +		unsigned long fault_in_end;
> +
>  		if (allocated_vma) {
>  			*new_vma = *vma;
>  			INIT_LIST_HEAD(&new_vma->shared);
>  			new_vma->vm_start = new_addr;
>  			new_vma->vm_end = new_addr+new_len;
> -			new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
> +			new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;

Hrrmph.

>  			if (new_vma->vm_file)
>  				get_file(new_vma->vm_file);
>  			if (new_vma->vm_ops && new_vma->vm_ops->open)
> @@ -251,19 +259,25 @@ static unsigned long move_vma(struct vm_
>  		} else
>  			vma = NULL;		/* nothing more to do */
>  
> -		do_munmap(current->mm, addr, old_len);
> -

Anguished cry!  There was careful manipulation of VM_ACCOUNT before and
after do_munmap, now you've for no reason moved do_munmap down outside.

>  		/* Restore VM_ACCOUNT if one or two pieces of vma left */
>  		if (vma) {
>  			vma->vm_flags |= VM_ACCOUNT;
>  			if (split)
>  				vma->vm_next->vm_flags |= VM_ACCOUNT;
>  		}
> +
> +		must_fault_in = new_vma->vm_flags & VM_LOCKED;
> +		fault_in_start = new_vma->vm_start;
> +		fault_in_end = new_vma->vm_end;
> +
> +		do_munmap(current->mm, addr, old_len);
> +
> +		/* new_vma could have been invalidated by do_munmap */
> +
>  		current->mm->total_vm += new_len >> PAGE_SHIFT;
> -		if (new_vma->vm_flags & VM_LOCKED) {
> +		if (must_fault_in) {
>  			current->mm->locked_vm += new_len >> PAGE_SHIFT;
> -			make_pages_present(new_vma->vm_start,
> -					   new_vma->vm_end);
> +			make_pages_present(fault_in_start, fault_in_end);
>  		}
>  		return new_addr;
>  	}

But the bugfix part of it looks good.

Hugh



* Re: mremap use-after-free [was Re: 2.5.52-mm2]
  2002-12-19 17:02   ` mremap use-after-free [was Re: 2.5.52-mm2] Hugh Dickins
@ 2002-12-19 18:18     ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2002-12-19 18:18 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: lkml, linux-mm

Hugh Dickins wrote:
> 
> ...
> I doubt that (but may be wrong, I haven't time right now to think as
> twistedly as mremap demands).

Maybe, maybe not.  Think about it - because VM_LOCKED is cleared
by slab poisoning, the chances of anyone noticing it are very small.
 
> The code (patently!) expects new_vma
> to be good at the end, it certainly wasn't intending to unmap it;
> but 2.5 split_vma has been through a couple of convulsions, either
> of which might have resulted in the potential for new_vma to be
> freed where before it was guaranteed to remain.

I see no "guarantees" around do_munmap() at all.  The whole thing
is foul; no wonder it keeps blowing up.

It died partway into kde startup.  It was in fact the first mremap
call.

> Do you know the vmas before and after, and the mremap which did this?

Breakpoint 1, move_vma (vma=0xcf1ed734, addr=1077805056, old_len=36864, new_len=204800, new_addr=1078050816) at mm/mremap.c:176
176     {
(gdb) n
183             next = find_vma_prev(mm, new_addr, &prev);
(gdb) 
177             struct mm_struct * mm = vma->vm_mm;
(gdb) 
180             int split = 0;
(gdb) 
182             new_vma = NULL;
(gdb) 
183             next = find_vma_prev(mm, new_addr, &prev);
(gdb) 
184             if (next) {
(gdb) p/x next
$1 = 0xce3bee84
(gdb) p/x *next
$2 = {vm_mm = 0xcf7cc624, vm_start = 0xbffcd000, vm_end = 0xc0000000, vm_next = 0x0, vm_page_prot = {pgprot = 0x25}, 
  vm_flags = 0x100177, vm_rb = {rb_parent = 0xcf1ed74c, rb_color = 0x1, rb_right = 0x0, rb_left = 0x0}, shared = {next = 0xce3beeac, 
    prev = 0xce3beeac}, vm_ops = 0x0, vm_pgoff = 0xffffffce, vm_file = 0x0, vm_private_data = 0x0}
(gdb) p prev
$3 = (struct vm_area_struct *) 0xcf1ed734
(gdb) p/x *prev
$4 = {vm_mm = 0xcf7cc624, vm_start = 0x403e0000, vm_end = 0x4041c000, vm_next = 0xce3bee84, vm_page_prot = {pgprot = 0x25}, 
  vm_flags = 0x100073, vm_rb = {rb_parent = 0xce3be4c4, rb_color = 0x0, rb_right = 0xce3bee9c, rb_left = 0xce3be1ac}, shared = {
    next = 0xcf1ed75c, prev = 0xcf1ed75c}, vm_ops = 0x0, vm_pgoff = 0x0, vm_file = 0x0, vm_private_data = 0x0}


> ...
> 
> Hmmm.  Am I right to suppose that all the changes above are "cleanup"
> which you couldn't resist making while you looked through this code,
> but entirely irrelevant to the bug in question?

Yes - it was to make the code readable so I could work on the bug, and
it then "accidentally" got left in.

>  If those mods above
> were right, they should be the subject of a separate patch.

Well yes, it would be nice to clean that code up: documenting it
and working out what the locking rules are.

What does this do?
                        spin_lock(&mm->page_table_lock);
                        prev->vm_end = new_addr + new_len;
                        spin_unlock(&mm->page_table_lock);
and this?
                        spin_lock(&mm->page_table_lock);
                        next->vm_start = new_addr;
                        spin_unlock(&mm->page_table_lock);

clearly nothing.  OK, but is this effectively-unlocked code safe?

> There's certainly room for cleanup there, but my preference would be
> to remove "can_vma_merge" completely, or at least its use in mremap.c,
> using its explicit tests instead.  It looks like it was originally
> quite appropriate for a use or two in mmap.c, but obscurely unhelpful
> here - because in itself it is testing a bizarre asymmetric subset of
> what's needed (that subset which remained to be tested in its original
> use in mmap.c).

yes
 
> The problem with your changes above is, you've removed the !vma->vm_file
> tests, presumably because you noticed that can_vma_merge already tests
> !vma->vm_file.  But "vma" within can_vma_merge is "prev" or "next" here:
> they are distinct tests, and you're now liable to merge an anonymous
> mapping with a private file mapping - nice if it's from /dev/zero,
> but otherwise not.  Please just cut those hunks out.

ok
 
> ...
> > -             do_munmap(current->mm, addr, old_len);
> > -
> 
> Anguished cry!  There was careful manipulation of VM_ACCOUNT before and
> after do_munmap, now you've for no reason moved do_munmap down outside.

How do we know that *vma was not altered by the do_munmap() call?

With this change, nobody will touch any locally-cached vma*'s from
the do_munmap() right through to syscall exit.


Thread overview: 11+ messages
2002-12-19  5:53 2.5.52-mm2 Andrew Morton
2002-12-19  8:54 ` 2.5.52-mm2 William Lee Irwin III
2002-12-19  9:28   ` 2.5.52-mm2 William Lee Irwin III
2002-12-19 10:12     ` 2.5.52-mm2 William Lee Irwin III
2002-12-19 10:31       ` 2.5.52-mm2 Andrew Morton
2002-12-19 10:51         ` 2.5.52-mm2 William Lee Irwin III
2002-12-19 15:22         ` 2.5.52-mm2 Martin J. Bligh
2002-12-19  9:41 ` 2.5.52-mm2 Andrew Morton
2002-12-19  9:50   ` 2.5.52-mm2 Andrew Morton
2002-12-19 17:02   ` mremap use-after-free [was Re: 2.5.52-mm2] Hugh Dickins
2002-12-19 18:18     ` Andrew Morton
