* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: davej @ 2000-10-13 0:20 UTC (permalink / raw)
To: tytso, Linus Torvalds; +Cc: Linux Kernel Mailing List, linux-mm
> 9. To Do
> * mm->rss is modified in some places without holding the
> page_table_lock (sct)
Any of the mm gurus give the patch below a quick once over ?
Is this adequate, or is there more to this than the description implies?
regards,
Dave.
--
| Dave Jones <davej@suse.de> http://www.suse.de/~davej
| SuSE Labs
diff -urN --exclude-from=/home/davej/.exclude linux/mm/memory.c linux.dj/mm/memory.c
--- linux/mm/memory.c Sat Sep 16 00:51:21 2000
+++ linux.dj/mm/memory.c Wed Oct 11 23:41:10 2000
@@ -370,7 +370,6 @@
address = (address + PGDIR_SIZE) & PGDIR_MASK;
dir++;
} while (address && (address < end));
- spin_unlock(&mm->page_table_lock);
/*
* Update rss for the mm_struct (not necessarily current->mm)
* Notice that rss is an unsigned long.
@@ -379,6 +378,7 @@
mm->rss -= freed;
else
mm->rss = 0;
+ spin_unlock(&mm->page_table_lock);
}
@@ -1074,7 +1074,9 @@
flush_icache_page(vma, page);
}
+ spin_lock(&mm->page_table_lock);
mm->rss++;
+ spin_unlock(&mm->page_table_lock);
pte = mk_pte(page, vma->vm_page_prot);
@@ -1113,7 +1115,9 @@
return -1;
clear_user_highpage(page, addr);
entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+ spin_lock(&mm->page_table_lock);
mm->rss++;
+ spin_unlock(&mm->page_table_lock);
flush_page_to_ram(page);
}
set_pte(page_table, entry);
@@ -1152,7 +1156,9 @@
return 0;
if (new_page == NOPAGE_OOM)
return -1;
+ spin_lock(&mm->page_table_lock);
++mm->rss;
+ spin_unlock(&mm->page_table_lock);
/*
* This silly early PAGE_DIRTY setting removes a race
* due to the bad i386 page protection. But it's valid
diff -urN --exclude-from=/home/davej/.exclude linux/mm/mmap.c linux.dj/mm/mmap.c
--- linux/mm/mmap.c Tue Aug 29 20:41:12 2000
+++ linux.dj/mm/mmap.c Wed Oct 11 23:48:30 2000
@@ -844,7 +844,9 @@
vmlist_modify_lock(mm);
mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
vmlist_modify_unlock(mm);
+ spin_lock (&mm->page_table_lock);
mm->rss = 0;
+ spin_unlock (&mm->page_table_lock);
mm->total_vm = 0;
mm->locked_vm = 0;
while (mpnt) {
diff -urN --exclude-from=/home/davej/.exclude linux/mm/vmscan.c linux.dj/mm/vmscan.c
--- linux/mm/vmscan.c Mon Oct 2 20:02:20 2000
+++ linux.dj/mm/vmscan.c Wed Oct 11 23:46:01 2000
@@ -102,7 +102,9 @@
set_pte(page_table, swp_entry_to_pte(entry));
drop_pte:
UnlockPage(page);
+ spin_lock (&mm->page_table_lock);
mm->rss--;
+ spin_unlock (&mm->page_table_lock);
flush_tlb_page(vma, address);
deactivate_page(page);
page_cache_release(page);
@@ -170,7 +172,9 @@
struct file *file = vma->vm_file;
if (file) get_file(file);
pte_clear(page_table);
+ spin_lock (&mm->page_table_lock);
mm->rss--;
+ spin_unlock (&mm->page_table_lock);
flush_tlb_page(vma, address);
vmlist_access_unlock(mm);
error = swapout(page, file);
@@ -202,7 +206,9 @@
add_to_swap_cache(page, entry);
/* Put the swap entry into the pte after the page is in swapcache */
+ spin_lock (&mm->page_table_lock);
mm->rss--;
+ spin_unlock (&mm->page_table_lock);
set_pte(page_table, swp_entry_to_pte(entry));
flush_tlb_page(vma, address);
vmlist_access_unlock(mm);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: David S. Miller @ 2000-10-13 0:29 UTC (permalink / raw)
To: davej; +Cc: tytso, torvalds, linux-kernel, linux-mm

   Any of the mm gurus give the patch below a quick once over ? Is
   this adequate, or is there more to this than the description
   implies?

It might make more sense to just make rss an atomic_t.

Later,
David S. Miller
davem@redhat.com
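The suggestion above — turning mm->rss into an atomic counter so the increments and decrements no longer need page_table_lock — can be sketched as follows. This is a hypothetical illustration using C11 stdatomic rather than the kernel's atomic_t API of the time (atomic_inc/atomic_dec/atomic_read); the struct and function names are invented for the example.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for mm_struct: rss as an atomic counter
 * instead of a plain unsigned long guarded by page_table_lock. */
struct mm_counter {
	atomic_uint rss;	/* resident pages, updated locklessly */
};

static inline void rss_inc(struct mm_counter *mm)
{
	/* replaces "spin_lock(...); mm->rss++; spin_unlock(...);" */
	atomic_fetch_add_explicit(&mm->rss, 1, memory_order_relaxed);
}

static inline void rss_dec(struct mm_counter *mm)
{
	atomic_fetch_sub_explicit(&mm->rss, 1, memory_order_relaxed);
}

static inline unsigned int rss_read(const struct mm_counter *mm)
{
	return atomic_load_explicit((atomic_uint *)&mm->rss,
				    memory_order_relaxed);
}
```

The trade-off, as the rest of the thread explores, is the width of the counter: an atomic type that is only 32 bits wide caps the number of pages that can be accounted.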
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Linus Torvalds @ 2000-10-13 5:02 UTC (permalink / raw)
To: David S. Miller; +Cc: davej, tytso, linux-kernel, linux-mm

On Thu, 12 Oct 2000, David S. Miller wrote:
>
>    Any of the mm gurus give the patch below a quick once over ? Is
>    this adequate, or is there more to this than the description
>    implies?
>
> It might make more sense to just make rss an atomic_t.

Agreed.

		Linus
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Alan Cox @ 2000-10-13 11:45 UTC (permalink / raw)
To: David S. Miller; +Cc: davej, tytso, torvalds, linux-kernel, linux-mm

>    Any of the mm gurus give the patch below a quick once over ? Is
>    this adequate, or is there more to this than the description
>    implies?
>
> It might make more sense to just make rss an atomic_t.

Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
on the ultrasparc/alpha ?
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Richard Henderson @ 2000-10-13 21:17 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, davej, tytso, torvalds, linux-kernel, linux-mm

On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
> on the ultrasparc/alpha ?

It is not.

r~
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Jakub Jelinek @ 2000-10-13 21:19 UTC (permalink / raw)
To: Richard Henderson; Cc: Alan Cox, David S. Miller, davej, tytso, torvalds, linux-kernel, linux-mm

On Fri, Oct 13, 2000 at 02:17:23PM -0700, Richard Henderson wrote:
> On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> > Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
> > on the ultrasparc/alpha ?
>
> It is not.

It is not even 32-bit on sparc32 (24-bit only).

	Jakub
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Linus Torvalds @ 2000-10-13 21:25 UTC (permalink / raw)
To: Jakub Jelinek; Cc: Richard Henderson, Alan Cox, David S. Miller, davej, tytso, linux-kernel, linux-mm

On Fri, 13 Oct 2000, Jakub Jelinek wrote:
> On Fri, Oct 13, 2000 at 02:17:23PM -0700, Richard Henderson wrote:
> > On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> > > Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
> > > on the ultrasparc/alpha ?
> >
> > It is not.
>
> It is not even 32-bit on sparc32 (24-bit only).

But remember that "rss" counts in pages, so it's plenty for sparc32:
only 32 bits of virtual address can count towards the rss.

And even on alpha, a 32-bit atomic_t means we cover 45 bits of virtual
address space, which, btw, is more than you can cram into the current
three-level page tables, I think.

		Linus
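The arithmetic behind that claim is worth making explicit: rss counts pages, so an N-bit counter covers N + PAGE_SHIFT bits of address space. With Alpha's 8 KB pages (PAGE_SHIFT == 13), a 32-bit counter spans 2^45 bytes; with sparc32's 4 KB pages, even a 24-bit counter spans 2^36 bytes, more than the 32-bit virtual address space. A small sketch (the helper name is invented for illustration):

```c
#include <stdint.h>

/* Bytes of virtual address space a page counter of the given width
 * can account for: 2^(counter_bits + page_shift). */
static inline uint64_t va_bytes_covered(unsigned int counter_bits,
					unsigned int page_shift)
{
	return (uint64_t)1 << (counter_bits + page_shift);
}
```

For example, va_bytes_covered(32, 13) matches Linus's 45-bit figure for Alpha, and va_bytes_covered(24, 12) already exceeds a full 32-bit sparc32 address space.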
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Richard Henderson @ 2000-10-13 22:56 UTC (permalink / raw)
To: Linus Torvalds; Cc: Jakub Jelinek, Alan Cox, David S. Miller, davej, tytso, linux-kernel, linux-mm

On Fri, Oct 13, 2000 at 02:25:45PM -0700, Linus Torvalds wrote:
> And even on alpha, a 32-bit atomic_t means we cover 45 bits of virtual
> address space, which, btw, is more than you can cram into the current
> three-level page tables, I think.

While that's true of Alpha, it's not true of Ultra III, in which all
64 bits are in theory available to the user.  Dave hasn't implemented
that yet, AFAIK.

r~
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Alan Cox @ 2000-10-13 22:47 UTC (permalink / raw)
To: Richard Henderson; Cc: Alan Cox, David S. Miller, davej, tytso, torvalds, linux-kernel, linux-mm

> On Fri, Oct 13, 2000 at 12:45:47PM +0100, Alan Cox wrote:
> > Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
> > on the ultrasparc/alpha ?
>
> It is not.

Then we need to use locking to protect the rss, since on a big 64-bit box
we can exceed 2^32 pages in theory and probably soon in practice.
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Richard Henderson @ 2000-10-13 22:57 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, davej, tytso, torvalds, linux-kernel, linux-mm

On Fri, Oct 13, 2000 at 11:47:55PM +0100, Alan Cox wrote:
> Then we need to use locking to protect the rss, since on a big 64-bit box
> we can exceed 2^32 pages in theory and probably soon in practice.

Either that or adjust how we do atomic operations.  I can do
64-bit atomic widgetry, but not with the code as written.

r~
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: David S. Miller @ 2000-10-14 0:20 UTC (permalink / raw)
To: rth; +Cc: alan, davej, tytso, torvalds, linux-kernel, linux-mm

   Either that or adjust how we do atomic operations.  I can do
   64-bit atomic widgetry, but not with the code as written.

Ultra can do it as well, and as far as I understand it, ia64 64-bit
atomic_t's shouldn't be a problem either.

I would suggest we make an atomic64_t or similar distinct type. The
space savings from using the 32-bit normal atomic_t in all other
situations is of real value.

Later,
David S. Miller
davem@redhat.com
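The atomic64_t idea above — keep the compact 32-bit atomic_t for the common case and add a separate 64-bit type only where a counter can genuinely overflow 32 bits — might look something like this. The type and function names here are illustrative sketches in C11 atomics, not the kernel API that was eventually adopted:

```c
#include <stdatomic.h>
#include <stdint.h>

/* 32-bit atomic for the common case: half the payload size. */
typedef struct { atomic_uint counter; } my_atomic_t;

/* Separate 64-bit atomic, used only for counters like rss that can
 * exceed 2^32 on large 64-bit machines. */
typedef struct { atomic_uint_least64_t counter; } my_atomic64_t;

static inline void my_atomic64_inc(my_atomic64_t *v)
{
	atomic_fetch_add_explicit(&v->counter, 1, memory_order_relaxed);
}

static inline uint64_t my_atomic64_read(const my_atomic64_t *v)
{
	return atomic_load_explicit(
		(atomic_uint_least64_t *)&v->counter, memory_order_relaxed);
}
```

The space argument is visible directly: on a typical LP64 machine sizeof(my_atomic_t) is 4 and sizeof(my_atomic64_t) is 8, so using the wide type everywhere would double the footprint of every atomic counter in the kernel.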
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Roman Zippel @ 2000-10-14 12:36 UTC (permalink / raw)
To: Richard Henderson; Cc: Alan Cox, David S. Miller, davej, tytso, torvalds, linux-kernel, linux-mm

Hi,

On Fri, 13 Oct 2000, Richard Henderson wrote:

> Either that or adjust how we do atomic operations.  I can do
> 64-bit atomic widgetry, but not with the code as written.

It's probably more something for 2.5, but what about adding a lock
argument to the atomic operations? Then sparc could use that explicit
lock and everyone else simply optimizes it away. That would allow us to
use the full 32/64 bits. What we could get is a nice generic atomic
exchange command like:

	atomic_exchange(lock, ptr, old, new);

where new can be a (simple) expression which can include old. Especially
for RISC systems, every atomic operation in atomic.h can be replaced
with this. Or if you need more flexibility, the same can be written as:

	atomic_enter(lock);
	__atomic_init(ptr, old);
	do {
		__atomic_reserve(ptr, old);
	} while (!__atomic_update(ptr, old, new));
	atomic_leave(lock);

atomic_enter/atomic_leave are either normal spinlocks or (in most cases)
dummies. The other macros use either RMW instructions or special
load/store instructions. Using a lock makes it a bit more difficult to
use, and especially the last construction must never be required in
normal drivers. On the other hand it gets far more flexible, as we are
not limited to a single atomic_t anymore.

If anyone is interested in how it could look, I've put an example at
http://zeus.fh-brandenburg.de/~zippel/linux/bin/atomic.tar.gz
(It also includes a bit more documentation and some (a bit outdated)
examples.) Somewhere I also have a patch where I use this to write a
spinlock-free printk implementation, which is still interrupt and SMP
safe.

There are still some issues open (like ordering), but I'd like to know
if there is a general interest in this.

bye, Roman
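The retry loop sketched above — compute "new" from the last observed value, then attempt the update until no other CPU has raced in between — is the classic compare-and-swap pattern. A minimal modern rendering in C11 atomics (function name invented for the example; on an architecture with a native CAS, Roman's lock argument would compile away exactly as he describes):

```c
#include <stdatomic.h>

/* Atomically add "delta" to *ptr and return the value observed just
 * before the update, using a compare-and-swap retry loop. */
static unsigned long atomic_add_return_old(_Atomic unsigned long *ptr,
					   unsigned long delta)
{
	unsigned long old = atomic_load(ptr);

	/* On failure, compare_exchange refreshes "old" with the current
	 * value, so "old + delta" is recomputed on the next attempt --
	 * this is the "__atomic_reserve / __atomic_update" loop. */
	while (!atomic_compare_exchange_weak(ptr, &old, old + delta))
		;
	return old;
}
```

Because the new value is recomputed from the freshly observed old value on every retry, the loop supports arbitrary expressions of "old", which is the extra flexibility Roman is after compared with a fixed menu of atomic_inc/atomic_dec operations.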
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: David S. Miller @ 2000-10-13 21:29 UTC (permalink / raw)
To: alan; +Cc: davej, tytso, torvalds, linux-kernel, linux-mm

   > It might make more sense to just make rss an atomic_t.

   Can we always be sure the rss will fit in an atomic_t - is it > 32 bits
   on the ultrasparc/alpha ?

Yes, this issue occurred to me last night as well.  It is 32-bit on
Alpha/UltraSparc.

However, given the fact that this number measures "pages", the
PAGE_SIZE on Ultra/Alpha, and the size of the 64-bit user address
space on Ultra and Alpha, it would actually end up working.

This doesn't make it a good idea though.

Later,
David S. Miller
davem@redhat.com
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Andrey Savochkin @ 2000-10-13 4:34 UTC (permalink / raw)
To: davej, David S. Miller; Cc: Linux Kernel Mailing List, linux-mm, tytso, Linus Torvalds

Hello,

On Fri, Oct 13, 2000 at 01:20:23AM +0100, davej@suse.de wrote:
> > 9. To Do
> >         * mm->rss is modified in some places without holding the
> >           page_table_lock (sct)
>
> Any of the mm gurus give the patch below a quick once over ?
> Is this adequate, or is there more to this than the description implies?

The patch is basically ok, except for one point.

[snip]

> diff -urN --exclude-from=/home/davej/.exclude linux/mm/vmscan.c linux.dj/mm/vmscan.c
> --- linux/mm/vmscan.c	Mon Oct  2 20:02:20 2000
> +++ linux.dj/mm/vmscan.c	Wed Oct 11 23:46:01 2000
> @@ -102,7 +102,9 @@
>  		set_pte(page_table, swp_entry_to_pte(entry));
>  drop_pte:
>  		UnlockPage(page);
> +		spin_lock (&mm->page_table_lock);
>  		mm->rss--;
> +		spin_unlock (&mm->page_table_lock);
>  		flush_tlb_page(vma, address);
>  		deactivate_page(page);
>  		page_cache_release(page);
> @@ -170,7 +172,9 @@
>  		struct file *file = vma->vm_file;
>  		if (file) get_file(file);
>  		pte_clear(page_table);
> +		spin_lock (&mm->page_table_lock);
>  		mm->rss--;
> +		spin_unlock (&mm->page_table_lock);
>  		flush_tlb_page(vma, address);
>  		vmlist_access_unlock(mm);
>  		error = swapout(page, file);
> @@ -202,7 +206,9 @@
>  	add_to_swap_cache(page, entry);
>
>  	/* Put the swap entry into the pte after the page is in swapcache */
> +	spin_lock (&mm->page_table_lock);
>  	mm->rss--;
> +	spin_unlock (&mm->page_table_lock);
>  	set_pte(page_table, swp_entry_to_pte(entry));
>  	flush_tlb_page(vma, address);
>  	vmlist_access_unlock(mm);

page_table_lock is supposed to protect normal page table activity (like
what's done in the page fault handler) from swapping out.
However, grabbing this lock in the swap-out code is completely missing!
In vmscan.c the question is not only about rss protection, but about
real protection for page table entries.

It may be something like

--- mm/vmscan.c.ptl	Fri Oct 13 12:09:51 2000
+++ mm/vmscan.c	Fri Oct 13 12:19:10 2000
@@ -150,6 +150,7 @@
 		if (file) get_file(file);
 		pte_clear(page_table);
 		vma->vm_mm->rss--;
+		spin_unlock(&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		vmlist_access_unlock(vma->vm_mm);
 		error = swapout(page, file);
@@ -182,6 +183,7 @@
 	/* Put the swap entry into the pte after the page is in swapcache */
 	vma->vm_mm->rss--;
 	set_pte(page_table, swp_entry_to_pte(entry));
+	spin_unlock(&mm->page_table_lock);
 	flush_tlb_page(vma, address);
 	vmlist_access_unlock(vma->vm_mm);
@@ -324,6 +326,7 @@
 		if (address < vma->vm_start)
 			address = vma->vm_start;

+		spin_lock(&mm->page_table_lock);
 		for (;;) {
 			int result = swap_out_vma(mm, vma, address, gfp_mask);
 			if (result)
@@ -333,6 +336,7 @@
 				break;
 			address = vma->vm_start;
 		}
+		spin_unlock(&mm->page_table_lock);
 	}
 	vmlist_access_unlock(mm);

On Thu, Oct 12, 2000 at 05:29:39PM -0700, David S. Miller wrote:
>    From: davej@suse.de
>    Date: Fri, 13 Oct 2000 01:20:23 +0100 (BST)
>
>    Any of the mm gurus give the patch below a quick once over ? Is
>    this adequate, or is there more to this than the description
>    implies?
>
> It might make more sense to just make rss an atomic_t.

In most cases where rss is updated, page_table_lock is already held.

Best regards
		Andrey
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: David S. Miller @ 2000-10-13 4:25 UTC (permalink / raw)
To: saw; +Cc: davej, linux-kernel, linux-mm, tytso, torvalds

   page_table_lock is supposed to protect normal page table activity (like
   what's done in the page fault handler) from swapping out.
   However, grabbing this lock in the swap-out code is completely missing!

Andrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.

You've totally missed this, and thus your suggested patch/analysis
needs to be reevaluated :-)

Later,
David S. Miller
davem@redhat.com
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Andrey Savochkin @ 2000-10-13 4:50 UTC (permalink / raw)
To: David S. Miller; +Cc: davej, linux-kernel, linux-mm, tytso, torvalds

On Thu, Oct 12, 2000 at 09:25:47PM -0700, David S. Miller wrote:
>    Date: Fri, 13 Oct 2000 12:34:30 +0800
>    From: Andrey Savochkin <saw@saw.sw.com.sg>
>
>    page_table_lock is supposed to protect normal page table activity (like
>    what's done in the page fault handler) from swapping out.
>    However, grabbing this lock in the swap-out code is completely missing!
>
> Andrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.
>
> You've totally missed this, and thus your suggested patch/analysis
> needs to be reevaluated :-)

Oops.

You're of course right.
It looks as if somebody tried to separate these two locks and stopped
in the middle...

	Andrey
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Linus Torvalds @ 2000-10-13 5:05 UTC (permalink / raw)
To: David S. Miller; +Cc: saw, davej, linux-kernel, linux-mm, tytso

On Thu, 12 Oct 2000, David S. Miller wrote:
>
>    page_table_lock is supposed to protect normal page table activity (like
>    what's done in the page fault handler) from swapping out.
>    However, grabbing this lock in the swap-out code is completely missing!
>
> Andrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.

Yeah, it's an easy mistake to make. I've made it myself - grepping
for page_table_lock and coming up empty in places where I expected it
to be.

In fact, if somebody sends me patches to remove the "vmlist_access_lock()"
stuff completely, and replace them with explicit page_table_lock things,
I'll apply it pretty much immediately. I don't like information hiding,
and right now that's the only thing that the vmlist_access_lock() stuff is
doing.

		Linus
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
From: Rasmus Andersen @ 2000-10-13 18:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, linux-mm, riel

> In fact, if somebody sends me patches to remove the "vmlist_access_lock()"
> stuff completely, and replace them with explicit page_table_lock things,
> I'll apply it pretty much immediately. I don't like information hiding,
> and right now that's the only thing that the vmlist_access_lock() stuff is
> doing.

(Pruned the cc-list and added Rik van Riel since this touches his code
domain.)

Something like this?

diff -aur linux-240-test10-pre1/fs/exec.c linux/fs/exec.c
--- linux-240-test10-pre1/fs/exec.c	Mon Oct  2 22:32:47 2000
+++ linux/fs/exec.c	Fri Oct 13 16:01:29 2000
@@ -314,9 +314,9 @@
 		mpnt->vm_pgoff = 0;
 		mpnt->vm_file = NULL;
 		mpnt->vm_private_data = (void *) 0;
-		vmlist_modify_lock(current->mm);
+		spin_lock(&current->mm->page_table_lock);
 		insert_vm_struct(current->mm, mpnt);
-		vmlist_modify_unlock(current->mm);
+		spin_unlock(&current->mm->page_table_lock);
 		current->mm->total_vm = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
 	}
diff -aur linux-240-test10-pre1/include/linux/mm.h linux/include/linux/mm.h
--- linux-240-test10-pre1/include/linux/mm.h	Tue Oct  3 22:12:52 2000
+++ linux/include/linux/mm.h	Fri Oct 13 18:43:42 2000
@@ -527,11 +527,6 @@
 #define pgcache_under_min()	(atomic_read(&page_cache_size) * 100 < \
 				page_cache.min_percent * num_physpages)

-#define vmlist_access_lock(mm)		spin_lock(&mm->page_table_lock)
-#define vmlist_access_unlock(mm)	spin_unlock(&mm->page_table_lock)
-#define vmlist_modify_lock(mm)		vmlist_access_lock(mm)
-#define vmlist_modify_unlock(mm)	vmlist_access_unlock(mm)
-
 #endif /* __KERNEL__ */

 #endif
diff -aur linux-240-test10-pre1/mm/filemap.c linux/mm/filemap.c
--- linux-240-test10-pre1/mm/filemap.c	Tue Oct  3 22:12:52 2000
+++ linux/mm/filemap.c	Fri Oct 13 16:01:29 2000
@@ -1766,11 +1766,11 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -1790,10 +1790,10 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -1823,7 +1823,7 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
 	vma->vm_end = end;
@@ -1831,7 +1831,7 @@
 	vma->vm_raend = 0;
 	insert_vm_struct(current->mm, left);
 	insert_vm_struct(current->mm, right);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
diff -aur linux-240-test10-pre1/mm/mlock.c linux/mm/mlock.c
--- linux-240-test10-pre1/mm/mlock.c	Wed Mar 15 02:45:20 2000
+++ linux/mm/mlock.c	Fri Oct 13 16:01:29 2000
@@ -14,9 +14,9 @@
 static inline int mlock_fixup_all(struct vm_area_struct * vma, int newflags)
 {
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_flags = newflags;
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -36,11 +36,11 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -61,10 +61,10 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -96,7 +96,7 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
 	vma->vm_end = end;
@@ -104,7 +104,7 @@
 	vma->vm_raend = 0;
 	insert_vm_struct(current->mm, left);
 	insert_vm_struct(current->mm, right);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -183,9 +183,9 @@
 			break;
 		}
 	}
-	vmlist_modify_lock(current->mm);
+	spin_lock(&current->mm->page_table_lock);
 	merge_segments(current->mm, start, end);
-	vmlist_modify_unlock(current->mm);
+	spin_unlock(&current->mm->page_table_lock);
 	return error;
 }
@@ -257,9 +257,9 @@
 		if (error)
 			break;
 	}
-	vmlist_modify_lock(current->mm);
+	spin_lock(&current->mm->page_table_lock);
 	merge_segments(current->mm, 0, TASK_SIZE);
-	vmlist_modify_unlock(current->mm);
+	spin_unlock(&current->mm->page_table_lock);
 	return error;
 }
diff -aur linux-240-test10-pre1/mm/mmap.c linux/mm/mmap.c
--- linux-240-test10-pre1/mm/mmap.c	Mon Oct  2 22:32:50 2000
+++ linux/mm/mmap.c	Fri Oct 13 18:48:06 2000
@@ -317,12 +317,12 @@
 	 */
 	flags = vma->vm_flags;
 	addr = vma->vm_start; /* can addr have changed?? */
-	vmlist_modify_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	insert_vm_struct(mm, vma);
 	if (correct_wcount)
 		atomic_inc(&file->f_dentry->d_inode->i_writecount);
 	merge_segments(mm, vma->vm_start, vma->vm_end);
-	vmlist_modify_unlock(mm);
+	spin_unlock(&mm->page_table_lock);

 	mm->total_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED) {
@@ -534,11 +534,11 @@
 	/* Work out to one of the ends. */
 	if (end == area->vm_end) {
 		area->vm_end = addr;
-		vmlist_modify_lock(mm);
+		spin_lock(&mm->page_table_lock);
 	} else if (addr == area->vm_start) {
 		area->vm_pgoff += (end - area->vm_start) >> PAGE_SHIFT;
 		area->vm_start = end;
-		vmlist_modify_lock(mm);
+		spin_lock(&mm->page_table_lock);
 	} else {
 	/* Unmapping a hole: area->vm_start < addr <= end < area->vm_end */
 		/* Add end mapping -- leave beginning for below */
@@ -560,12 +560,12 @@
 		if (mpnt->vm_ops && mpnt->vm_ops->open)
 			mpnt->vm_ops->open(mpnt);
 		area->vm_end = addr;	/* Truncate area */
-		vmlist_modify_lock(mm);
+		spin_lock(&mm->page_table_lock);
 		insert_vm_struct(mm, mpnt);
 	}

 	insert_vm_struct(mm, area);
-	vmlist_modify_unlock(mm);
+	spin_unlock(&mm->page_table_lock);
 	return extra;
 }
@@ -670,7 +670,7 @@
 	npp = (prev ? &prev->vm_next : &mm->mmap);
 	free = NULL;
-	vmlist_modify_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
 		*npp = mpnt->vm_next;
 		mpnt->vm_next = free;
@@ -679,7 +679,7 @@
 			avl_remove(mpnt, &mm->mmap_avl);
 	}
 	mm->mmap_cache = NULL;	/* Kill the cache. */
-	vmlist_modify_unlock(mm);
+	spin_unlock(&mm->page_table_lock);

 	/* Ok - we have the memory areas we should free on the 'free' list,
 	 * so release them, and unmap the page range..
@@ -811,10 +811,10 @@
 	flags = vma->vm_flags;
 	addr = vma->vm_start;
-	vmlist_modify_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	insert_vm_struct(mm, vma);
 	merge_segments(mm, vma->vm_start, vma->vm_end);
-	vmlist_modify_unlock(mm);
+	spin_unlock(&mm->page_table_lock);

 	mm->total_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED) {
@@ -841,9 +841,9 @@
 	release_segments(mm);
-	vmlist_modify_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	mpnt = mm->mmap;
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
-	vmlist_modify_unlock(mm);
+	spin_unlock(&mm->page_table_lock);
 	mm->rss = 0;
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
@@ -985,9 +985,9 @@
 		if (mpnt->vm_ops && mpnt->vm_ops->close) {
 			mpnt->vm_pgoff += (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
 			mpnt->vm_start = mpnt->vm_end;
-			vmlist_modify_unlock(mm);
+			spin_unlock(&mm->page_table_lock);
 			mpnt->vm_ops->close(mpnt);
-			vmlist_modify_lock(mm);
+			spin_lock(&mm->page_table_lock);
 		}
 		mm->map_count--;
 		remove_shared_vm_struct(mpnt);
diff -aur linux-240-test10-pre1/mm/mprotect.c linux/mm/mprotect.c
--- linux-240-test10-pre1/mm/mprotect.c	Wed Mar 15 02:45:21 2000
+++ linux/mm/mprotect.c	Fri Oct 13 16:01:29 2000
@@ -86,10 +86,10 @@
 static inline int mprotect_fixup_all(struct vm_area_struct * vma,
 	int newflags, pgprot_t prot)
 {
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_flags = newflags;
 	vma->vm_page_prot = prot;
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -111,11 +111,11 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -138,10 +138,10 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
 	insert_vm_struct(current->mm, n);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -172,7 +172,7 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
-	vmlist_modify_lock(vma->vm_mm);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
 	vma->vm_end = end;
@@ -181,7 +181,7 @@
 	vma->vm_page_prot = prot;
 	insert_vm_struct(current->mm, left);
 	insert_vm_struct(current->mm, right);
-	vmlist_modify_unlock(vma->vm_mm);
+	spin_unlock(&vma->vm_mm->page_table_lock);
 	return 0;
 }
@@ -263,9 +263,9 @@
 			break;
 		}
 	}
-	vmlist_modify_lock(current->mm);
+	spin_lock(&current->mm->page_table_lock);
 	merge_segments(current->mm, start, end);
-	vmlist_modify_unlock(current->mm);
+	spin_unlock(&current->mm->page_table_lock);
 out:
 	up(&current->mm->mmap_sem);
 	return error;
diff -aur linux-240-test10-pre1/mm/mremap.c linux/mm/mremap.c
--- linux-240-test10-pre1/mm/mremap.c	Tue Oct  3 22:12:52 2000
+++ linux/mm/mremap.c	Fri Oct 13 16:01:29 2000
@@ -141,10 +141,10 @@
 		get_file(new_vma->vm_file);
 	if (new_vma->vm_ops && new_vma->vm_ops->open)
 		new_vma->vm_ops->open(new_vma);
-	vmlist_modify_lock(current->mm);
+	spin_lock(&current->mm->page_table_lock);
 	insert_vm_struct(current->mm, new_vma);
 	merge_segments(current->mm, new_vma->vm_start, new_vma->vm_end);
-	vmlist_modify_unlock(current->mm);
+	spin_unlock(&current->mm->page_table_lock);
 	do_munmap(current->mm, addr, old_len);
 	current->mm->total_vm += new_len >> PAGE_SHIFT;
 	if (new_vma->vm_flags & VM_LOCKED) {
@@ -258,9 +258,9 @@
 	/* can we just expand the current mapping? */
 	if (max_addr - addr >= new_len) {
 		int pages = (new_len - old_len) >> PAGE_SHIFT;
-		vmlist_modify_lock(vma->vm_mm);
+		spin_lock(&vma->vm_mm->page_table_lock);
 		vma->vm_end = addr + new_len;
-		vmlist_modify_unlock(vma->vm_mm);
+		spin_unlock(&vma->vm_mm->page_table_lock);
 		current->mm->total_vm += pages;
 		if (vma->vm_flags & VM_LOCKED) {
 			current->mm->locked_vm += pages;
diff -aur linux-240-test10-pre1/mm/swapfile.c linux/mm/swapfile.c
--- linux-240-test10-pre1/mm/swapfile.c	Mon Oct  2 22:31:57 2000
+++ linux/mm/swapfile.c	Fri Oct 13 19:06:08 2000
@@ -315,12 +315,12 @@
 	 */
 	if (!mm)
 		return;
-	vmlist_access_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		pgd_t * pgd = pgd_offset(mm, vma->vm_start);
 		unuse_vma(vma, pgd, entry, page);
 	}
-	vmlist_access_unlock(mm);
+	spin_unlock(&mm->page_table_lock);
 	return;
 }
--- linux-240-test10-pre1/mm/vmscan.c	Tue Oct 10 19:12:23 2000
+++ linux/mm/vmscan.c	Fri Oct 13 19:49:40 2000
@@ -172,7 +172,7 @@
 		pte_clear(page_table);
 		mm->rss--;
 		flush_tlb_page(vma, address);
-		vmlist_access_unlock(mm);
+		spin_unlock(&mm->page_table_lock);
 		error = swapout(page, file);
 		UnlockPage(page);
 		if (file) fput(file);
@@ -205,7 +205,7 @@
 	mm->rss--;
 	set_pte(page_table, swp_entry_to_pte(entry));
 	flush_tlb_page(vma, address);
-	vmlist_access_unlock(mm);
+	spin_unlock(&mm->page_table_lock);

 	/* OK, do a physical asynchronous write to swap.  */
 	rw_swap_page(WRITE, page, 0);
@@ -341,7 +341,7 @@
 	 * Find the proper vm-area after freezing the vma chain
 	 * and ptes.
 	 */
-	vmlist_access_lock(mm);
+	spin_lock(&mm->page_table_lock);
 	vma = find_vma(mm, address);
 	if (vma) {
 		if (address < vma->vm_start)
@@ -364,7 +364,7 @@
 	mm->swap_cnt = 0;

 out_unlock:
-	vmlist_access_unlock(mm);
+	spin_unlock(&mm->page_table_lock);

 	/* We didn't find anything for the process */
 	return 0;

--
Regards,
 Rasmus (rasmus@jaquet.dk)
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
  2000-10-13  5:05       ` Linus Torvalds
  2000-10-13 18:11         ` Rasmus Andersen
@ 2000-10-13 18:19         ` Kanoj Sarcar
  1 sibling, 0 replies; 24+ messages in thread
From: Kanoj Sarcar @ 2000-10-13 18:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David S. Miller, saw, davej, linux-kernel, linux-mm, tytso

> On Thu, 12 Oct 2000, David S. Miller wrote:
> >
> > page_table_lock is supposed to protect normal page table activity (like
> > what's done in page fault handler) from swapping out.
> > However, grabbing this lock in swap-out code is completely missing!
> >
> > Audrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.
>
> Yeah, it's an easy mistake to make.
>
> I've made it myself - grepping for page_table_lock and coming up empty in
> places where I expected it to be.
>
> In fact, if somebody sends me patches to remove the "vmlist_access_lock()"
> stuff completely, and replace them with explicit page_table_lock things,
> I'll apply it pretty much immediately. I don't like information hiding,
> and right now that's the only thing that the vmlist_access_lock() stuff is
> doing.

Linus,

I came up with the vmlist_access_lock/vmlist_modify_lock names early in
2.3. The reasoning behind that was that in most places the "vmlist lock"
was being taken to protect the vmlist chain, vma_t fields or mm_struct
fields. The fact that implementation-wise this lock could be the same as
page_table_lock was a good idea that you suggested. Nevertheless, the
name was chosen to indicate what type of things it was guarding. For
example, in the future, you might actually have a different (possibly
sleeping) lock to guard the vma chain etc, but still have a spin lock
for the page_table_lock (No, I don't want to be drawn into a discussion
of why this might be needed right now). Some of this is mentioned in
Documentation/vm/locking.

Just thought I would mention, in case you don't recollect some of this
history. Of course, I understand the "information hiding" part.

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* BUG FIX?: mm->rss is modified in some places without holding the page_table_lock
  2000-10-13  0:20 Updated Linux 2.4 Status/TODO List (from the ALS show) davej
  2000-10-13  0:29 ` David S. Miller
  2000-10-13  4:34 ` Andrey Savochkin
@ 2000-11-03 11:39 ` tytso
  2000-11-03 11:33   ` David S. Miller
  2 siblings, 1 reply; 24+ messages in thread
From: tytso @ 2000-11-03 11:39 UTC (permalink / raw)
To: davej; +Cc: torvalds, linux-kernel, linux-mm

I'm in the process of updating the 2.4 bug list for test10, and as near
as I can tell, there was a lot of discussion about two different ways of
fixing this bug:

> 9. To Do
>    * mm->rss is modified in some places without holding the
>      page_table_lock (sct)

To wit, either making mm->rss an atomic_t (which doesn't quite work for
some architectures which need it to be 64 bits), or putting in some
locks as Dave Jones suggested.

As far as I can tell, neither patch has gone in, and discussion seems to
have dropped off in the past two weeks or so. An examination of the
kernel seems to indicate that neither fix has been taken.

What's the status of this? Given that we don't have a 64-bit atomic_t
type, what do people think of Davej's patch? (attached, below)

						- Ted

diff -urN --exclude-from=/home/davej/.exclude linux/mm/memory.c linux.dj/mm/memory.c
--- linux/mm/memory.c	Sat Sep 16 00:51:21 2000
+++ linux.dj/mm/memory.c	Wed Oct 11 23:41:10 2000
@@ -370,7 +370,6 @@
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
-	spin_unlock(&mm->page_table_lock);
 	/*
 	 * Update rss for the mm_struct (not necessarily current->mm)
 	 * Notice that rss is an unsigned long.
@@ -379,6 +378,7 @@
 		mm->rss -= freed;
 	else
 		mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 }
@@ -1074,7 +1074,9 @@
 		flush_icache_page(vma, page);
 	}

+	spin_lock(&mm->page_table_lock);
 	mm->rss++;
+	spin_unlock(&mm->page_table_lock);
 	pte = mk_pte(page, vma->vm_page_prot);

@@ -1113,7 +1115,9 @@
 			return -1;
 		clear_user_highpage(page, addr);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+		spin_lock(&mm->page_table_lock);
 		mm->rss++;
+		spin_unlock(&mm->page_table_lock);
 		flush_page_to_ram(page);
 	}
 	set_pte(page_table, entry);
@@ -1152,7 +1156,9 @@
 		return 0;
 	if (new_page == NOPAGE_OOM)
 		return -1;
+	spin_lock(&mm->page_table_lock);
 	++mm->rss;
+	spin_unlock(&mm->page_table_lock);
 	/*
 	 * This silly early PAGE_DIRTY setting removes a race
 	 * due to the bad i386 page protection. But it's valid
diff -urN --exclude-from=/home/davej/.exclude linux/mm/mmap.c linux.dj/mm/mmap.c
--- linux/mm/mmap.c	Tue Aug 29 20:41:12 2000
+++ linux.dj/mm/mmap.c	Wed Oct 11 23:48:30 2000
@@ -844,7 +844,9 @@
 	vmlist_modify_lock(mm);
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
 	vmlist_modify_unlock(mm);
+	spin_lock (&mm->page_table_lock);
 	mm->rss = 0;
+	spin_unlock (&mm->page_table_lock);
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
diff -urN --exclude-from=/home/davej/.exclude linux/mm/vmscan.c linux.dj/mm/vmscan.c
--- linux/mm/vmscan.c	Mon Oct  2 20:02:20 2000
+++ linux.dj/mm/vmscan.c	Wed Oct 11 23:46:01 2000
@@ -102,7 +102,9 @@
 		set_pte(page_table, swp_entry_to_pte(entry));
drop_pte:
 		UnlockPage(page);
+		spin_lock (&mm->page_table_lock);
 		mm->rss--;
+		spin_unlock (&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		deactivate_page(page);
 		page_cache_release(page);
@@ -170,7 +172,9 @@
 	struct file *file = vma->vm_file;
 	if (file) get_file(file);
 	pte_clear(page_table);
+	spin_lock (&mm->page_table_lock);
 	mm->rss--;
+	spin_unlock (&mm->page_table_lock);
 	flush_tlb_page(vma, address);
 	vmlist_access_unlock(mm);
 	error = swapout(page, file);
@@ -202,7 +206,9 @@
 	add_to_swap_cache(page, entry);

 	/* Put the swap entry into the pte after the page is in swapcache */
+	spin_lock (&mm->page_table_lock);
 	mm->rss--;
+	spin_unlock (&mm->page_table_lock);
 	set_pte(page_table, swp_entry_to_pte(entry));
 	flush_tlb_page(vma, address);
 	vmlist_access_unlock(mm);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: BUG FIX?: mm->rss is modified in some places without holding the page_table_lock
  2000-11-03 11:39 ` BUG FIX?: mm->rss is modified in some places without holding the page_table_lock tytso
@ 2000-11-03 11:33   ` David S. Miller
  2000-11-03 14:56     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2000-11-03 11:33 UTC (permalink / raw)
To: tytso; +Cc: davej, torvalds, linux-kernel, linux-mm

   Given that we don't have a 64-bit atomic_t type, what do people
   think of Davej's patch? (attached, below)

Broken, in 9 out of 10 places where he adds page_table_lock
acquisitions, this lock is already held --> instant deadlock.

This report is complicated by the fact that people were forgetting that
vmlist_*_{lock,unlock}(mm) was actually just spin_{lock,unlock} on
mm->page_table_lock. I fixed that already by removing the dumb vmlist
locking macros which were causing all of this confusion.

Later,
David S. Miller
davem@redhat.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: BUG FIX?: mm->rss is modified in some places without holding the page_table_lock
  2000-11-03 11:33 ` David S. Miller
@ 2000-11-03 14:56   ` Theodore Y. Ts'o
  2000-11-03 14:51     ` David S. Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Theodore Y. Ts'o @ 2000-11-03 14:56 UTC (permalink / raw)
To: David S. Miller; +Cc: davej, torvalds, linux-kernel, linux-mm

   Given that we don't have a 64-bit atomic_t type, what do people
   think of Davej's patch? (attached, below)

   Broken, in 9 out of 10 places where he adds page_table_lock
   acquisitions, this lock is already held --> instant deadlock.

   This report is complicated by the fact that people were forgetting
   that vmlist_*_{lock,unlock}(mm) was actually just spin_{lock,unlock}
   on mm->page_table_lock. I fixed that already by removing the dumb
   vmlist locking macros which were causing all of this confusion.

Are you saying that the original bug report may not actually be a
problem? Is mm->rss actually protected in _all_ of the right places,
but people got confused because of the syntactic sugar?

						- Ted
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: BUG FIX?: mm->rss is modified in some places without holding the page_table_lock
  2000-11-03 14:56 ` Theodore Y. Ts'o
@ 2000-11-03 14:51   ` David S. Miller
  2000-11-04 23:37     ` Rasmus Andersen
  0 siblings, 1 reply; 24+ messages in thread
From: David S. Miller @ 2000-11-03 14:51 UTC (permalink / raw)
To: tytso; +Cc: davej, torvalds, linux-kernel, linux-mm

   Are you saying that the original bug report may not actually be a
   problem? Is mm->rss actually protected in _all_ of the right
   places, but people got confused because of the syntactic sugar?

I don't know if all of them are ok, most are. Someone would need to do
the analysis. David's patch could be used as a good starting point.

A quick glance at mm/memory.c shows these spots need to be fixed:

1) End of zap_page_range()
2) Middle of do_swap_page
3) do_anonymous_page
4) do_no_page

Later,
David S. Miller
davem@redhat.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: BUG FIX?: mm->rss is modified in some places without holding the page_table_lock
  2000-11-03 14:51 ` David S. Miller
@ 2000-11-04 23:37   ` Rasmus Andersen
  0 siblings, 0 replies; 24+ messages in thread
From: Rasmus Andersen @ 2000-11-04 23:37 UTC (permalink / raw)
To: David S. Miller; +Cc: tytso, davej, torvalds, linux-kernel, linux-mm

On Fri, Nov 03, 2000 at 06:51:05AM -0800, David S. Miller wrote:
>    Are you saying that the original bug report may not actually be a
>    problem? Is mm->rss actually protected in _all_ of the right
>    places, but people got confused because of the syntactic sugar?
>
> I don't know if all of them are ok, most are.

Would this do? This is a subset of Davej's patch. I also noted that
fs/{exec.c,binfmt_aout.c,binfmt_elf.c} modifies rss without holding the
lock. I think exec.c needs it, but am at a loss whether the binfmt_*
does too. The second patch below adds the lock to fs/exec.c.

Comments?

diff -ura linux-240-t10-clean/mm/memory.c linux/mm/memory.c
--- linux-240-t10-clean/mm/memory.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/memory.c	Sun Nov  5 00:13:59 2000
@@ -369,7 +369,6 @@
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
-	spin_unlock(&mm->page_table_lock);
 	/*
 	 * Update rss for the mm_struct (not necessarily current->mm)
 	 * Notice that rss is an unsigned long.
@@ -378,6 +377,7 @@
 		mm->rss -= freed;
 	else
 		mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 }
@@ -1074,7 +1074,9 @@
 		flush_icache_page(vma, page);
 	}

+	spin_lock(&mm->page_table_lock);
 	mm->rss++;
+	spin_unlock(&mm->page_table_lock);
 	pte = mk_pte(page, vma->vm_page_prot);

@@ -1113,7 +1115,9 @@
 			return -1;
 		clear_user_highpage(page, addr);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+		spin_lock(&mm->page_table_lock);
 		mm->rss++;
+		spin_unlock(&mm->page_table_lock);
 		flush_page_to_ram(page);
 	}
 	set_pte(page_table, entry);
@@ -1152,7 +1156,9 @@
 		return 0;
 	if (new_page == NOPAGE_OOM)
 		return -1;
+	spin_lock(&mm->page_table_lock);
 	++mm->rss;
+	spin_unlock(&mm->page_table_lock);
 	/*
 	 * This silly early PAGE_DIRTY setting removes a race
 	 * due to the bad i386 page protection. But it's valid
diff -ura linux-240-t10-clean/mm/mmap.c linux/mm/mmap.c
--- linux-240-t10-clean/mm/mmap.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/mmap.c	Sat Nov  4 23:53:49 2000
@@ -843,8 +843,8 @@
 	spin_lock(&mm->page_table_lock);
 	mpnt = mm->mmap;
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
-	spin_unlock(&mm->page_table_lock);
 	mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
diff -ura linux-240-t10-clean/mm/swapfile.c linux/mm/swapfile.c
--- linux-240-t10-clean/mm/swapfile.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/swapfile.c	Sun Nov  5 00:19:15 2000
@@ -231,7 +231,9 @@
 	set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 	swap_free(entry);
 	get_page(page);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	++vma->vm_mm->rss;
+	spin_unlock(&vma->vm_mm->page_table_lock);
 }

 static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
diff -ura linux-240-t10-clean/mm/vmscan.c linux/mm/vmscan.c
--- linux-240-t10-clean/mm/vmscan.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/vmscan.c	Sun Nov  5 00:19:48 2000
@@ -95,7 +95,9 @@
 		set_pte(page_table, swp_entry_to_pte(entry));
drop_pte:
 		UnlockPage(page);
+		spin_lock(&mm->page_table_lock);
 		mm->rss--;
+		spin_unlock(&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		deactivate_page(page);
 		page_cache_release(page);

Second patch:

--- linux-240-t10-clean/fs/exec.c	Sat Nov  4 23:27:14 2000
+++ linux/fs/exec.c	Sat Nov  4 23:55:37 2000
@@ -324,7 +324,9 @@
 		struct page *page = bprm->page[i];
 		if (page) {
 			bprm->page[i] = NULL;
+			spin_lock(mm->page_table_lock);
 			current->mm->rss++;
+			spin_unlock(mm->page_table_lock);
 			put_dirty_page(current,page,stack_base);
 		}
 		stack_base += PAGE_SIZE;

-- 
Regards,
	Rasmus(rasmus@jaquet.dk)

Duct tape is like the force; it has a light side and a dark side, and
it holds the universe together. -- Anonymous
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 24+ messages in thread
end of thread, other threads:[~2000-11-04 23:37 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-10-13  0:20 Updated Linux 2.4 Status/TODO List (from the ALS show) davej
2000-10-13  0:29 ` David S. Miller
2000-10-13  5:02   ` Linus Torvalds
2000-10-13 11:45     ` Alan Cox
2000-10-13 21:17       ` Richard Henderson
2000-10-13 21:19         ` Jakub Jelinek
2000-10-13 21:25         ` Linus Torvalds
2000-10-13 22:56           ` Richard Henderson
2000-10-13 22:47         ` Alan Cox
2000-10-13 22:57           ` Richard Henderson
2000-10-14  0:20         ` David S. Miller
2000-10-14 12:36           ` Roman Zippel
2000-10-13 21:29       ` David S. Miller
2000-10-13  4:34 ` Andrey Savochkin
2000-10-13  4:25   ` David S. Miller
2000-10-13  4:50     ` Andrey Savochkin
2000-10-13  5:05   ` Linus Torvalds
2000-10-13 18:11     ` Rasmus Andersen
2000-10-13 18:19     ` Kanoj Sarcar
2000-11-03 11:39 ` BUG FIX?: mm->rss is modified in some places without holding the page_table_lock tytso
2000-11-03 11:33   ` David S. Miller
2000-11-03 14:56     ` Theodore Y. Ts'o
2000-11-03 14:51       ` David S. Miller
2000-11-04 23:37         ` Rasmus Andersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox