* [PATCH] swap_state.c thinko
@ 2001-04-05 15:56 Ben LaHaise
2001-04-05 16:05 ` Rik van Riel
2001-04-05 17:21 ` Hugh Dickins
0 siblings, 2 replies; 33+ messages in thread
From: Ben LaHaise @ 2001-04-05 15:56 UTC (permalink / raw)
To: arjanv, alan, torvalds; +Cc: linux-mm
Hey folks,
Here's another one liner that closes an smp race that could corrupt
things.
-ben
diff -urN v2.4.3/mm/swap_state.c work-2.4.3/mm/swap_state.c
--- v2.4.3/mm/swap_state.c Fri Dec 29 18:04:27 2000
+++ work-2.4.3/mm/swap_state.c Thu Apr 5 11:55:00 2001
@@ -140,7 +140,7 @@
/*
* If we are the only user, then try to free up the swap cache.
*/
- if (PageSwapCache(page) && !TryLockPage(page)) {
+ if (!TryLockPage(page) && PageSwapCache(page)) {
if (!is_page_shared(page)) {
delete_from_swap_cache_nolock(page);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH] swap_state.c thinko 2001-04-05 15:56 [PATCH] swap_state.c thinko Ben LaHaise @ 2001-04-05 16:05 ` Rik van Riel 2001-04-05 17:11 ` Ben LaHaise 2001-04-05 17:21 ` Hugh Dickins 1 sibling, 1 reply; 33+ messages in thread From: Rik van Riel @ 2001-04-05 16:05 UTC (permalink / raw) To: Ben LaHaise; +Cc: arjanv, alan, torvalds, linux-mm On Thu, 5 Apr 2001, Ben LaHaise wrote: > Here's another one liner that closes an smp race that could corrupt > things. > - if (PageSwapCache(page) && !TryLockPage(page)) { > + if (!TryLockPage(page) && PageSwapCache(page)) { > if (!is_page_shared(page)) { > delete_from_swap_cache_nolock(page); > } I sure hope the page is unlocked afterwards, regardless of whether it's (still) in the swap cache or not ... regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-05 16:05 ` Rik van Riel @ 2001-04-05 17:11 ` Ben LaHaise 2001-04-05 23:40 ` Andrea Arcangeli 2001-04-06 0:32 ` Linus Torvalds 0 siblings, 2 replies; 33+ messages in thread From: Ben LaHaise @ 2001-04-05 17:11 UTC (permalink / raw) To: Rik van Riel; +Cc: arjanv, alan, torvalds, linux-mm On Thu, 5 Apr 2001, Rik van Riel wrote: > I sure hope the page is unlocked afterwards, regardless of > whether it's (still) in the swap cache or not ... You're right. Here's the hopefully correct version. -ben diff -ur v2.4.3/mm/swap_state.c work-2.4.3/mm/swap_state.c --- v2.4.3/mm/swap_state.c Fri Dec 29 18:04:27 2000 +++ work-2.4.3/mm/swap_state.c Thu Apr 5 13:10:27 2001 @@ -140,10 +140,9 @@ /* * If we are the only user, then try to free up the swap cache. */ - if (PageSwapCache(page) && !TryLockPage(page)) { - if (!is_page_shared(page)) { + if (!TryLockPage(page)) { + if (PageSwapCache(page) && !is_page_shared(page)) delete_from_swap_cache_nolock(page); - } UnlockPage(page); } page_cache_release(page); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-05 17:11 ` Ben LaHaise @ 2001-04-05 23:40 ` Andrea Arcangeli 2001-04-06 0:32 ` Linus Torvalds 1 sibling, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-05 23:40 UTC (permalink / raw) To: Ben LaHaise; +Cc: Rik van Riel, arjanv, alan, torvalds, linux-mm On Thu, Apr 05, 2001 at 01:11:30PM -0400, Ben LaHaise wrote: > diff -ur v2.4.3/mm/swap_state.c work-2.4.3/mm/swap_state.c > --- v2.4.3/mm/swap_state.c Fri Dec 29 18:04:27 2000 > +++ work-2.4.3/mm/swap_state.c Thu Apr 5 13:10:27 2001 > @@ -140,10 +140,9 @@ > /* > * If we are the only user, then try to free up the swap cache. > */ > - if (PageSwapCache(page) && !TryLockPage(page)) { > - if (!is_page_shared(page)) { > + if (!TryLockPage(page)) { > + if (PageSwapCache(page) && !is_page_shared(page)) > delete_from_swap_cache_nolock(page); > - } > UnlockPage(page); > } > page_cache_release(page); swap cache pages should not be freeable by the memory balancing code because if you're running at that point the reference count of the swap cache has to be > 1. swapoff will grab the pagetable spinlock before dropping the swap cache so it shouldn't run under such code either (and swapoff was used to have other window for races anyways). could you elaborate what can eat the swap cache from under you if you don't first lock down the page before checking the swapcache bit? I thought the reason for grabbing the lock there is just to do the trylock instead of lock_page(): we can't use the delete_from_swap_cache that could otherwise sleep if the page was for example locked down by the memory balancing code while we were running there (if we fail we simply left some more spurious swap cache). Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-05 17:11 ` Ben LaHaise 2001-04-05 23:40 ` Andrea Arcangeli @ 2001-04-06 0:32 ` Linus Torvalds 2001-04-06 16:31 ` Hugh Dickins 1 sibling, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2001-04-06 0:32 UTC (permalink / raw) To: Ben LaHaise; +Cc: Rik van Riel, arjanv, alan, linux-mm On Thu, 5 Apr 2001, Ben LaHaise wrote: > > You're right. Here's the hopefully correct version. I'd prefer something more along these lines: it gets rid of free_page_and_swap_cache() altogether, along with "is_page_shared()", realizing that "is_page_shared()" was only validly used on swap-cache pages anyway and thus getting rid of the generic tests it had for other kinds of pages. It also fixes the swap sharing criteria to properly accept the case of a page that has buffers but no other usage (which it pretty much always will have, if it was truly read in from disk). As far as I can tell, the lack of buffer-testing meant that we almost _never_ just re-used the cache page directly on a read-swapin followed by a write to the page. Can that really be true? That should be the common case, and would have made the swap cache much less effective than it should be. Anybody see any thinko's here? Linus ------ diff -u --recursive --new-file v2.4.3/linux/arch/sparc/mm/generic.c linux/arch/sparc/mm/generic.c --- v2.4.3/linux/arch/sparc/mm/generic.c Wed Aug 9 13:49:55 2000 +++ linux/arch/sparc/mm/generic.c Thu Apr 5 14:38:29 2001 @@ -21,11 +21,7 @@ struct page *ptpage = pte_page(page); if ((!VALID_PAGE(ptpage)) || PageReserved(ptpage)) return; - /* - * free_page() used to be able to clear swap cache - * entries. We may now have to do it manually. - */ - free_page_and_swap_cache(ptpage); + page_cache_release(page); return; } swap_free(pte_to_swp_entry(page)); diff -u --recursive --new-file v2.4.3/linux/arch/sparc64/mm/generic.c linux/arch/sparc64/mm/generic.c --- v2.4.3/linux/arch/sparc64/mm/generic.c Mon Mar 26 15:42:57 2001 +++ linux/arch/sparc64/mm/generic.c Thu Apr 5 14:38:45 2001 @@ -21,11 +21,7 @@ struct page *ptpage = pte_page(page); if ((!VALID_PAGE(ptpage)) || PageReserved(ptpage)) return; - /* - * free_page() used to be able to clear swap cache - * entries. We may now have to do it manually. - */ - free_page_and_swap_cache(ptpage); + page_cache_release(ptpage); return; } swap_free(pte_to_swp_entry(page)); diff -u --recursive --new-file v2.4.3/linux/include/linux/swap.h linux/include/linux/swap.h --- v2.4.3/linux/include/linux/swap.h Mon Mar 26 15:48:11 2001 +++ linux/include/linux/swap.h Thu Apr 5 15:44:52 2001 @@ -134,7 +134,6 @@ extern void __delete_from_swap_cache(struct page *page); extern void delete_from_swap_cache(struct page *page); extern void delete_from_swap_cache_nolock(struct page *page); -extern void free_page_and_swap_cache(struct page *page); /* linux/mm/swapfile.c */ extern unsigned int nr_swapfiles; @@ -166,23 +165,6 @@ extern unsigned long swap_cache_find_total; extern unsigned long swap_cache_find_success; #endif - -/* - * Work out if there are any other processes sharing this page, ignoring - * any page reference coming from the swap cache, or from outstanding - * swap IO on this page. (The page cache _does_ count as another valid - * reference to the page, however.) - */ -static inline int is_page_shared(struct page *page) -{ - unsigned int count; - if (PageReserved(page)) - return 1; - count = page_count(page); - if (PageSwapCache(page)) - count += swap_count(page) - 2 - !!page->buffers; - return count > 1; -} extern spinlock_t pagemap_lru_lock; diff -u --recursive --new-file v2.4.3/linux/mm/memory.c linux/mm/memory.c --- v2.4.3/linux/mm/memory.c Mon Mar 26 11:02:24 2001 +++ linux/mm/memory.c Thu Apr 5 16:41:17 2001 @@ -274,7 +274,7 @@ */ if (pte_dirty(pte) && page->mapping) set_page_dirty(page); - free_page_and_swap_cache(page); + page_cache_release(page); return 1; } swap_free(pte_to_swp_entry(pte)); @@ -815,6 +815,24 @@ } /* + * Work out if there are any other processes sharing this + * swap cache page. Never mind the buffers. + */ +static inline int exclusive_swap_page(struct page *page) +{ + unsigned int count; + + if (!PageLocked(page)) + BUG(); + if (!PageSwapCache(page)) + return 0; + count = page_count(page) - !!page->buffers; /* 2: us + swap cache */ + count += swap_count(page); /* +1: just swap cache */ + return count == 3; /* =3: total */ +} + + +/* * This routine handles present pages, when users try to write * to a shared page. It is done by copying the page to a new address * and decrementing the shared-page counter for the old page. @@ -853,19 +871,21 @@ * marked dirty). */ switch (page_count(old_page)) { + int can_reuse; + case 3: + if (!old_page->buffers) + break; + /* FallThrough */ case 2: - /* - * Lock the page so that no one can look it up from - * the swap cache, grab a reference and start using it. - * Can not do lock_page, holding page_table_lock. - */ - if (!PageSwapCache(old_page) || TryLockPage(old_page)) + if (!PageSwapCache(old_page)) break; - if (is_page_shared(old_page)) { - UnlockPage(old_page); + if (TryLockPage(old_page)) break; - } + /* Recheck swapcachedness once the page is locked */ + can_reuse = exclusive_swap_page(old_page); UnlockPage(old_page); + if (!can_reuse) + break; /* FallThrough */ case 1: if (PageReserved(old_page)) @@ -903,8 +923,7 @@ return -1; } -static void vmtruncate_list(struct vm_area_struct *mpnt, - unsigned long pgoff, unsigned long partial) +static void vmtruncate_list(struct vm_area_struct *mpnt, unsigned long pgoff) { do { struct mm_struct *mm = mpnt->vm_mm; @@ -947,7 +966,7 @@ */ void vmtruncate(struct inode * inode, loff_t offset) { - unsigned long partial, pgoff; + unsigned long pgoff; struct address_space *mapping = inode->i_mapping; unsigned long limit; @@ -959,19 +978,15 @@ goto out_unlock; pgoff = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; - partial = (unsigned long)offset & (PAGE_CACHE_SIZE - 1); - if (mapping->i_mmap != NULL) - vmtruncate_list(mapping->i_mmap, pgoff, partial); + vmtruncate_list(mapping->i_mmap, pgoff); if (mapping->i_mmap_shared != NULL) - vmtruncate_list(mapping->i_mmap_shared, pgoff, partial); + vmtruncate_list(mapping->i_mmap_shared, pgoff); out_unlock: spin_unlock(&mapping->i_shared_lock); truncate_inode_pages(mapping, offset); - if (inode->i_op && inode->i_op->truncate) - inode->i_op->truncate(inode); - return; + goto out_truncate; do_expand: limit = current->rlim[RLIMIT_FSIZE].rlim_cur; @@ -986,8 +1001,13 @@ } } inode->i_size = offset; - if (inode->i_op && inode->i_op->truncate) + +out_truncate: + if (inode->i_op && inode->i_op->truncate) { + lock_kernel(); inode->i_op->truncate(inode); + unlock_kernel(); + } out: return; } @@ -1077,7 +1097,7 @@ pte = mk_pte(page, vma->vm_page_prot); swap_free(entry); - if (write_access && !is_page_shared(page)) + if (write_access && exclusive_swap_page(page)) pte = pte_mkwrite(pte_mkdirty(pte)); UnlockPage(page); diff -u --recursive --new-file v2.4.3/linux/mm/swap_state.c linux/mm/swap_state.c --- v2.4.3/linux/mm/swap_state.c Fri Dec 29 15:04:27 2000 +++ linux/mm/swap_state.c Thu Apr 5 14:35:08 2001 @@ -17,8 +17,22 @@ #include <asm/pgtable.h> +/* + * We may have stale swap cache pages in memory: notice + * them here and get rid of the unnecessary final write. + */ static int swap_writepage(struct page *page) { + /* One for the page cache, one for this user, one for page->buffers */ + if (page_count(page) > 2 + !!page->buffers) + goto in_use; + if (swap_count(page) > 1) + goto in_use; + + /* We could remove it here, but page_launder will do it anyway */ + return 0; + +in_use: rw_swap_page(WRITE, page, 0); return 0; } @@ -129,26 +143,6 @@ delete_from_swap_cache_nolock(page); UnlockPage(page); } - -/* - * Perform a free_page(), also freeing any swap cache associated with - * this page if it is the last user of the page. Can not do a lock_page, - * as we are holding the page_table_lock spinlock. - */ -void free_page_and_swap_cache(struct page *page) -{ - /* - * If we are the only user, then try to free up the swap cache. - */ - if (PageSwapCache(page) && !TryLockPage(page)) { - if (!is_page_shared(page)) { - delete_from_swap_cache_nolock(page); - } - UnlockPage(page); - } - page_cache_release(page); -} - /* * Lookup a swap entry in the swap cache. A found page will be returned -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 0:32 ` Linus Torvalds @ 2001-04-06 16:31 ` Hugh Dickins 2001-04-06 17:21 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 16:31 UTC (permalink / raw) To: Linus Torvalds Cc: Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Thu, 5 Apr 2001, Linus Torvalds wrote: > > I'd prefer something more along these lines: it gets rid of > free_page_and_swap_cache() altogether, along with "is_page_shared()", > realizing that "is_page_shared()" was only validly used on swap-cache > pages anyway and thus getting rid of the generic tests it had for other > kinds of pages. I like this direction, but (if I understand the issues better today than I did yesterday) the patch you posted looks seriously incomplete to me. While it deals with one of the issues raised by Rich Jerrell (writing dead swap pages), doesn't it exacerbate his other issue? That after a process exits (unmaps), if a page was in the swap cache, its swap slot will remain allocated until vmscan.c gets to deal with it; which would be okay, except that vm_enough_memory() may give false negatives meanwhile, because there's no count of such pages (and Rich gave the nice example that immediately starting up the same program again was liable to fail because of this). Exacerbates, because that problem was just on the !pte_present entries, now with your patch it would be on the pte_present entries too. But I don't agree with the way Rich adds his nr_swap_cache_pages to "free" in his vm_enough_memory(), because cached pages are all already counted into "free" from page_cache_size - so I believe he's double- accounting all the swap cache pages as free, when it should just be those which (could) have been freed on exit/unmap. And to count those, I think you'd have to reinstate code like free_page_and_swap_cache(). Instead, perhaps vm_enough_memory() should force a scan to free before failing? And would need to register its own memory pressure, so the scan tries hard enough to provide what will be needed? Sorry, we've moved well away from Ben's "swap_state.c thinko". Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 16:31 ` Hugh Dickins @ 2001-04-06 17:21 ` Linus Torvalds 2001-04-06 18:23 ` Hugh Dickins 2001-04-06 18:47 ` Andrea Arcangeli 0 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2001-04-06 17:21 UTC (permalink / raw) To: Hugh Dickins Cc: Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Hugh Dickins wrote: > > I like this direction, but (if I understand the issues better today > than I did yesterday) the patch you posted looks seriously incomplete > to me. While it deals with one of the issues raised by Rich Jerrell > (writing dead swap pages), doesn't it exacerbate his other issue? Yes. However, I'm assuming most of that is just "statistics get buggered", in that swap pages tend to stay around for longer than they really are used. It was true before, but it would be _consistently_ true now. I don't agree with your vm_enough_memory() worry - it should be correct already, because it shows up as page cache pages (and that, in turn, is already taken care of). In fact, the swap cache pages shouldn't even create any new special cases: they are exactly equivalent to already- existing page cache pages. So if this patch generates any problems, then those problems were pre-existing, and the patch just triggers them more easily. For example, it may shift the load to look a bit more like there are lots of dirty shared pages, and maybe we haven't handled that load well before. In that sense this is one of those "can trigger cases we haven't tested well enough" patches, but I don't think it should introduce _new_ problems. > Instead, perhaps vm_enough_memory() should force a scan to free > before failing? And would need to register its own memory pressure, > so the scan tries hard enough to provide what will be needed? I don't think so. It _already_ considers every single page cache page to be completely droppable. So I don't think vm_enough_memory() can fail any more due to my change - the only thing that happened was that pages that were counted as "free" got moved to "page cache", but the math ends up giving the exact same answer anyway: ... free += atomic_read(&page_cache_size); free += nr_free_pages(); ... Does anybody actually see failures with this path? Maybe they are due to "page_launder()" not getting rid of the page swap pages aggressively enough.. (I considered moving the swap-cache page earlier in page_launder(), but I'd just be happier if we could have this all in swap_writepage() and not pollute any of the rest of the VM at all. Pipe-dream, maybe). Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 17:21 ` Linus Torvalds @ 2001-04-06 18:23 ` Hugh Dickins 2001-04-06 18:57 ` Linus Torvalds 2001-04-06 18:47 ` Andrea Arcangeli 1 sibling, 1 reply; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 18:23 UTC (permalink / raw) To: Linus Torvalds Cc: Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Linus Torvalds wrote: > > On Fri, 6 Apr 2001, Hugh Dickins wrote: > > > > I like this direction, but (if I understand the issues better today > > than I did yesterday) the patch you posted looks seriously incomplete > > to me. While it deals with one of the issues raised by Rich Jerrell > > (writing dead swap pages), doesn't it exacerbate his other issue? > > Yes. However, I'm assuming most of that is just "statistics get buggered", > in that swap pages tend to stay around for longer than they really are > used. It was true before, but it would be _consistently_ true now. Yes, I like it that the pte_present route becomes consistent with the !pte_present route, and I share your belief that any problems won't be _new_ ones. But I supposed that Rich was describing a practical problem, not just temporarily buggered statistics. > I don't agree with your vm_enough_memory() worry - it should be correct > already, because it shows up as page cache pages (and that, in turn, is > already taken care of). In fact, the swap cache pages shouldn't even > create any new special cases: they are exactly equivalent to already- > existing page cache pages. It is, of course, remotely conceivable that I'm confused, but... I realize that the page cache pages (including those of swap) are already added into "free" by vm_enough_memory(). But it's also adding in nr_swap_pages, and that number is significantly less than what it should be, because freeable swap slots have not been freed. Therefore I think we need to add in that (sadly unknown) number of should-have-been-freed slots - not because the memory hasn't been properly counted, but because the swap hasn't been properly counted. If this is not the case, then I (again) don't understand Rich's difficulty in running the program just after it exited. > (I considered moving the swap-cache page earlier in page_launder(), but > I'd just be happier if we could have this all in swap_writepage() and not > pollute any of the rest of the VM at all. Pipe-dream, maybe). Aside from the vm_enough_memory() issue, if you leave page_launder() to clean up, then some reordering there might well be good: isn't it liable to clean and free some aged but potentially useful pages (e.g. cached pages of live data on swap) before the entirely useless cached pages of dead process data? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:23 ` Hugh Dickins @ 2001-04-06 18:57 ` Linus Torvalds 2001-04-06 19:06 ` Rik van Riel 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2001-04-06 18:57 UTC (permalink / raw) To: Hugh Dickins Cc: Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Hugh Dickins wrote: > > It is, of course, remotely conceivable that I'm confused, but... > I realize that the page cache pages (including those of swap) > are already added into "free" by vm_enough_memory(). But it's also > adding in nr_swap_pages .. Ahh. Right you are. We did not just move it from the "free page" to the "swap cache", we also didn't release the space in the actual swap space bitmaps, and you're right, that certainly changes the accounting. Mea culpa. Ideas? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:57 ` Linus Torvalds @ 2001-04-06 19:06 ` Rik van Riel 0 siblings, 0 replies; 33+ messages in thread From: Rik van Riel @ 2001-04-06 19:06 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Ben LaHaise, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Linus Torvalds wrote: > On Fri, 6 Apr 2001, Hugh Dickins wrote: > > > > It is, of course, remotely conceivable that I'm confused, but... > > I realize that the page cache pages (including those of swap) > > are already added into "free" by vm_enough_memory(). But it's also > > adding in nr_swap_pages > > .. Ahh. Right you are. We did not just move it from the "free > page" to the "swap cache", we also didn't release the space in > the actual swap space bitmaps, and you're right, that certainly > changes the accounting. >From all swap space, the following is available (in principle): - free swap - swapcached space (except that we currently cannot reclaim it) The only problem with this calculation is that it is also too optimistic, since we've already counted all swapcached space as free memory as well ... regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 17:21 ` Linus Torvalds 2001-04-06 18:23 ` Hugh Dickins @ 2001-04-06 18:47 ` Andrea Arcangeli 2001-04-06 18:37 ` Hugh Dickins 1 sibling, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-06 18:47 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 10:21:38AM -0700, Linus Torvalds wrote: > I don't agree with your vm_enough_memory() worry - it should be correct > already, because it shows up as page cache pages (and that, in turn, is > already taken care of). In fact, the swap cache pages shouldn't even > create any new special cases: they are exactly equivalent to already- > existing page cache pages. swap cache also decrease the amount free-swap-space, that will be reclaimed as soon as we collect the swap cache. so we must add the swap cache size to the amount of virtual memory available (in addition to the in-core pagecachesize) to take care of the swap side. I suggested that as the fix for the failed malloc issue to the missioncritical guys when they asked me about that. However I think I seen some overkill patch floating around, the fix is just a one liner: --- 2.4.3aa/mm/mmap.c.~1~ Fri Apr 6 05:10:16 2001 +++ 2.4.3aa/mm/mmap.c Fri Apr 6 20:44:18 2001 @@ -64,6 +64,7 @@ free += atomic_read(&page_cache_size); free += nr_free_pages(); free += nr_swap_pages; + free += swapper_space.nrpages; /* * The code below doesn't account for free space in the inode * and dentry slab cache, slab cache fragmentation, inodes and Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:47 ` Andrea Arcangeli @ 2001-04-06 18:37 ` Hugh Dickins 2001-04-06 19:09 ` Andrea Arcangeli ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 18:37 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > > swap cache also decrease the amount free-swap-space, that will be reclaimed as > soon as we collect the swap cache. so we must add the swap cache size to the > amount of virtual memory available (in addition to the in-core pagecachesize) > to take care of the swap side. I suggested that as the fix for the failed > malloc issue to the missioncritical guys when they asked me about that. swapper_space.nrpages, that's neat, but I insist it's not right. You're then double counting into "free" all the swap cache pages (already included in page_cache_size) which correspond to pages of swap for running processes - erring in the opposite direction to the present code. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:37 ` Hugh Dickins @ 2001-04-06 19:09 ` Andrea Arcangeli 2001-04-06 18:53 ` Hugh Dickins 2001-04-06 19:14 ` Andrea Arcangeli 2001-04-06 19:12 ` Richard Jerrell 2001-04-06 19:52 ` Linus Torvalds 2 siblings, 2 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-06 19:09 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 07:37:55PM +0100, Hugh Dickins wrote: > swapper_space.nrpages, that's neat, but I insist it's not right. > You're then double counting into "free" all the swap cache pages > (already included in page_cache_size) which correspond to pages > of swap for running processes - erring in the opposite direction > to the present code. The whole point is that errirng in the opposite direction is perfectly fine, that's expected. Understimating is a bug instead. Period. We always overstimate anyways, we have to because we don't have information about the really freeable memory (think at the buffer cache pinned in the superblock metadata of ext2, do you expect to be able to reclaim it somehow?). Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:09 ` Andrea Arcangeli @ 2001-04-06 18:53 ` Hugh Dickins 2001-04-06 19:14 ` Andrea Arcangeli 1 sibling, 0 replies; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 18:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > On Fri, Apr 06, 2001 at 07:37:55PM +0100, Hugh Dickins wrote: > > swapper_space.nrpages, that's neat, but I insist it's not right. > > You're then double counting into "free" all the swap cache pages > > (already included in page_cache_size) which correspond to pages > > of swap for running processes - erring in the opposite direction > > to the present code. > > The whole point is that errirng in the opposite direction is perfectly fine, > that's expected. Understimating is a bug instead. Period. > > We always overstimate anyways, we have to because we don't have information > about the really freeable memory (think at the buffer cache pinned in the > superblock metadata of ext2, do you expect to be able to reclaim it somehow?). Well, that's an interesting point of view... but if it's so okay to overestimate, couldn't we simplify vm_enough_pages() somewhat? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:09 ` Andrea Arcangeli 2001-04-06 18:53 ` Hugh Dickins @ 2001-04-06 19:14 ` Andrea Arcangeli 2001-04-06 19:03 ` Hugh Dickins 1 sibling, 1 reply; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-06 19:14 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 09:09:08PM +0200, Andrea Arcangeli wrote: > We always overstimate anyways, we have to because we don't have information > about the really freeable memory (think at the buffer cache pinned in the ah, and btw, even if we would have information about the really freeable memory in the cache and swap cache that would still useless in real life because we don't reserve memory for the previous malloc calls (endless overcommit discussion), so allocation could still fail during page fault and process will have to be killed the linux way. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:14 ` Andrea Arcangeli @ 2001-04-06 19:03 ` Hugh Dickins 2001-04-06 20:03 ` Andrea Arcangeli 0 siblings, 1 reply; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 19:03 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > On Fri, Apr 06, 2001 at 09:09:08PM +0200, Andrea Arcangeli wrote: > > We always overstimate anyways, we have to because we don't have information > > about the really freeable memory (think at the buffer cache pinned in the > > ah, and btw, even if we would have information about the really freeable memory > in the cache and swap cache that would still useless in real life because we > don't reserve memory for the previous malloc calls (endless overcommit > discussion), so allocation could still fail during page fault and process will > have to be killed the linux way. How indelicate you are! I've been careful to avoid all mention of that... Seriously, there's good debate to have there, but it's another issue (and I don't think the drawbacks of the overcommit strategy excuse sloppy accounting on its own terms). Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:03 ` Hugh Dickins @ 2001-04-06 20:03 ` Andrea Arcangeli 0 siblings, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-06 20:03 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 08:03:08PM +0100, Hugh Dickins wrote: > On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > > On Fri, Apr 06, 2001 at 09:09:08PM +0200, Andrea Arcangeli wrote: > > > We always overstimate anyways, we have to because we don't have information > > > about the really freeable memory (think at the buffer cache pinned in the > > > > ah, and btw, even if we would have information about the really freeable memory > > in the cache and swap cache that would still useless in real life because we > > don't reserve memory for the previous malloc calls (endless overcommit > > discussion), so allocation could still fail during page fault and process will > > have to be killed the linux way. > > How indelicate you are! I've been careful to avoid all mention of that... > Seriously, there's good debate to have there, but it's another issue This is not another issue. It is the same issue. If you don't do reservation for the userspace memory you can only overstimate, if you understimate you are wrong because you are not even able to use all the resources in your machine and that is the current bug (while overstimating and overcommit may allow you to better optimize the memory usage with the downside that you can have to kill a task during a page fault and that is why it should be optional behaviour). So I repeat: if you don't reserve the memory to make sure you will always be able to allocate the malloced memory you just overstimate, and so the vm_enough_memory automatically become a very imprecise guess, it's an heuristic that is only meant to avoid ridicolous big allocations and that must _never_ understimate. This is my last email about this issue. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:37 ` Hugh Dickins 2001-04-06 19:09 ` Andrea Arcangeli @ 2001-04-06 19:12 ` Richard Jerrell 2001-04-06 19:52 ` Linus Torvalds 2 siblings, 0 replies; 33+ messages in thread From: Richard Jerrell @ 2001-04-06 19:12 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Linus Torvalds, Ben LaHaise, Rik van Riel, Stephen Tweedie, arjanv, alan, linux-mm > swapper_space.nrpages, that's neat, but I insist it's not right. > You're then double counting into "free" all the swap cache pages > (already included in page_cache_size) which correspond to pages > of swap for running processes - erring in the opposite direction > to the present code. Right. We still have the same problem. If you count the swap cache pages once, you are underestimating the free memory. If you count them twice, you are overcommitting memory. What we would need is to check in swap_free for swap_map[i] count of 1 and a page->count of 1. The cell references the page, and the page references the cell. These pages are freeable _twice_ because they are sitting on twice as much memory as they should be. I'd say that overestimating your memory is better than denying allocation when it's possible. Rich -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 18:37 ` Hugh Dickins 2001-04-06 19:09 ` Andrea Arcangeli 2001-04-06 19:12 ` Richard Jerrell @ 2001-04-06 19:52 ` Linus Torvalds 2001-04-06 20:22 ` Andrea Arcangeli 2001-04-06 20:48 ` Hugh Dickins 2 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2001-04-06 19:52 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Hugh Dickins wrote: > > swapper_space.nrpages, that's neat, but I insist it's not right. It's not "right", but I suspect it's actually good enough. Also, note that when if get _really_ low on memory, the swap cache effect should be going away: if we still have the swap cache pages in memory, we've obviously not paged everything out yet. So the double accounting should have a limit error of zero as we approach being truly low on memory. And that, I suspect, is the most important thing - making sure that we allow programs to run when they can, but at least having _some_ concept of "enough is enough". Note that we should probably also have a small "negative" count: it might not be a bad idea to say "we always want to have X MB free in _some_ form, be it swap or RAM. So I don't think it would necessarily be wrong to say something like free -= num_physpages >> 6; to approximate the notion of "keep 1 percent slop" (remember, the 1% may well be on the swap device, not actually kept as free memory). vm_enough_memory() is a heuristic, nothing more. We want it to reflect _some_ view of reality, but the Linux VM is _fundamentally_ based on the notion of over-commit, and that won't change. vm_enough_memory() is only meant to give a first-order appearance of not overcommitting wildly. It has never been anything more than that. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:52 ` Linus Torvalds @ 2001-04-06 20:22 ` Andrea Arcangeli 2001-04-06 21:04 ` Rik van Riel 2001-04-09 18:16 ` Alan Cox 2001-04-06 20:48 ` Hugh Dickins 1 sibling, 2 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-06 20:22 UTC (permalink / raw) To: Linus Torvalds Cc: Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 12:52:26PM -0700, Linus Torvalds wrote: > vm_enough_memory() is a heuristic, nothing more. We want it to reflect > _some_ view of reality, but the Linux VM is _fundamentally_ based on the > notion of over-commit, and that won't change. vm_enough_memory() is only > meant to give a first-order appearance of not overcommitting wildly. It > has never been anything more than that. 200% agreed. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 20:22 ` Andrea Arcangeli @ 2001-04-06 21:04 ` Rik van Riel 2001-04-07 1:27 ` Andrea Arcangeli 2001-04-09 18:16 ` Alan Cox 1 sibling, 1 reply; 33+ messages in thread From: Rik van Riel @ 2001-04-06 21:04 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Hugh Dickins, Ben LaHaise, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > On Fri, Apr 06, 2001 at 12:52:26PM -0700, Linus Torvalds wrote: > > vm_enough_memory() is a heuristic, nothing more. We want it to reflect > > _some_ view of reality, but the Linux VM is _fundamentally_ based on the > > notion of over-commit, and that won't change. vm_enough_memory() is only > > meant to give a first-order appearance of not overcommitting wildly. It > > has never been anything more than that. > > 200% agreed. I don't think we should approximate THAT roughly ;)) Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 21:04 ` Rik van Riel @ 2001-04-07 1:27 ` Andrea Arcangeli 0 siblings, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-07 1:27 UTC (permalink / raw) To: Rik van Riel Cc: Linus Torvalds, Hugh Dickins, Ben LaHaise, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, Apr 06, 2001 at 06:04:58PM -0300, Rik van Riel wrote: > On Fri, 6 Apr 2001, Andrea Arcangeli wrote: > > On Fri, Apr 06, 2001 at 12:52:26PM -0700, Linus Torvalds wrote: > > > vm_enough_memory() is a heuristic, nothing more. We want it to reflect > > > _some_ view of reality, but the Linux VM is _fundamentally_ based on the > > > notion of over-commit, and that won't change. vm_enough_memory() is only > > > meant to give a first-order appearance of not overcommitting wildly. It > > > has never been anything more than that. > > > > 200% agreed. > > I don't think we should approximate THAT roughly ;)) ;);) Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 20:22 ` Andrea Arcangeli 2001-04-06 21:04 ` Rik van Riel @ 2001-04-09 18:16 ` Alan Cox 2001-04-09 18:45 ` Andrea Arcangeli 2001-04-09 20:32 ` Linus Torvalds 1 sibling, 2 replies; 33+ messages in thread From: Alan Cox @ 2001-04-09 18:16 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm > On Fri, Apr 06, 2001 at 12:52:26PM -0700, Linus Torvalds wrote: > > vm_enough_memory() is a heuristic, nothing more. We want it to reflect > > _some_ view of reality, but the Linux VM is _fundamentally_ based on the > > notion of over-commit, and that won't change. vm_enough_memory() is only > > meant to give a first-order appearance of not overcommitting wildly. It > > has never been anything more than that. > > 200% agreed. Given that strict address space management is not that hard would you accept patches to allow optional non-overcommit in 2.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-09 18:16 ` Alan Cox @ 2001-04-09 18:45 ` Andrea Arcangeli 2001-04-09 20:32 ` Linus Torvalds 1 sibling, 0 replies; 33+ messages in thread From: Andrea Arcangeli @ 2001-04-09 18:45 UTC (permalink / raw) To: Alan Cox Cc: Linus Torvalds, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, linux-mm On Mon, Apr 09, 2001 at 02:16:59PM -0400, Alan Cox wrote: > > On Fri, Apr 06, 2001 at 12:52:26PM -0700, Linus Torvalds wrote: > > > vm_enough_memory() is a heuristic, nothing more. We want it to reflect > > > _some_ view of reality, but the Linux VM is _fundamentally_ based on the > > > notion of over-commit, and that won't change. vm_enough_memory() is only > > > meant to give a first-order appearance of not overcommitting wildly. It > > > has never been anything more than that. > > > > 200% agreed. > > Given that strict address space management is not that hard would you > accept patches to allow optional non-overcommit in 2.5 Since we are not able to estimate how much cache is really freeable a simple early implementation will have to shrink the cache at mmap time, instead of page fault time and that should be acceptable to the people who needs it. I suggest three modes: 1) non overcommit (optional) 2) 2.4 default (default) 3) vm_enough_memory always returns 1, equivalent to 2.4 with overcommit set to 1 (optional) Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-09 18:16 ` Alan Cox 2001-04-09 18:45 ` Andrea Arcangeli @ 2001-04-09 20:32 ` Linus Torvalds 2001-04-09 20:54 ` David L. Parsley 2001-04-10 21:07 ` James Antill 1 sibling, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2001-04-09 20:32 UTC (permalink / raw) To: Alan Cox Cc: Andrea Arcangeli, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, linux-mm On Mon, 9 Apr 2001, Alan Cox wrote: > > Given that strict address space management is not that hard would you > accept patches to allow optional non-overcommit in 2.5 I really doubt anybody wants to use a truly non-overcommit system. It would basically imply counting every single vma that is privately writable, and assuming it becomes totally non-shared. Try this on your system as root: cat /proc/*/maps | grep ' .w.p ' and see how much memory that is. On my machine, running X, that's about 53M with just a few windows open if I did my script right. It grew to 159M when starting StarOffice. (I'm oldfashioned, and not a perl person, so: cat /proc/*/maps | grep 'w.p ' | cut -d' ' -f1 | tr '-' ' ' | while read i j; do export k=$(($k + 0x$j-0x$i)) ; echo $k; done I haven't verified that it gets it right. And that's not counting the really hardwired pages at all, only th epages that might be pageable). It would disallow a lot of stuff that actually _does_ work in practice. But maybe some people do want this. I agree that it shouldn't be fundamentally hard to do accounting at the vma level. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-09 20:32 ` Linus Torvalds @ 2001-04-09 20:54 ` David L. Parsley 2001-04-10 21:07 ` James Antill 1 sibling, 0 replies; 33+ messages in thread From: David L. Parsley @ 2001-04-09 20:54 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Cox, Andrea Arcangeli, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, linux-mm Linus Torvalds wrote: > > On Mon, 9 Apr 2001, Alan Cox wrote: > > > > Given that strict address space management is not that hard would you > > accept patches to allow optional non-overcommit in 2.5 > > I really doubt anybody wants to use a truly non-overcommit system. Eh, how about embedded developers. Like, say, Transmeta. ;-) Things get real ugly when my X terminal runs out of RAM - I gotta think it would be better for mallocs to just fail in userspace. regards, David -- David L. Parsley Network Administrator Roanoke College -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-09 20:32 ` Linus Torvalds 2001-04-09 20:54 ` David L. Parsley @ 2001-04-10 21:07 ` James Antill 2001-04-10 22:20 ` Jeff Garzik 1 sibling, 1 reply; 33+ messages in thread From: James Antill @ 2001-04-10 21:07 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Cox, Andrea Arcangeli, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, linux-mm > On Mon, 9 Apr 2001, Alan Cox wrote: > > > > Given that strict address space management is not that hard would you > > accept patches to allow optional non-overcommit in 2.5 > > I really doubt anybody wants to use a truly non-overcommit system. > > It would basically imply counting every single vma that is privately > writable, and assuming it becomes totally non-shared. > > Try this on your system as root: > > cat /proc/*/maps | grep ' .w.p ' > > and see how much memory that is. > > On my machine, running X, that's about 53M with just a few windows open if > I did my script right. It grew to 159M when starting StarOffice. Disk space is cheap(tm), in comparison to a couple of years ago I have more disk space than $DIETY. # cat /proc/swaps Filename Type Size Used Priority /dev/hda3 partition 979956 0 1 /dev/sda2 partition 976888 21524 4 /dev/sdb1 partition 976872 21452 4 If I could have a sysctl for non-overcommit[1], I'd be pretty happy. I'd imagine that a _lot_ of people in the server space would prefer non-overcommit. [1] Assuming that it doesn't kill performance by allocating non shared mappings, or chunks of swap etc. Ie. it just knows that it can allocate swap when it needs it later on. -- # James Antill -- james@and.org :0: * ^From: .*james@and\.org /dev/null -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-10 21:07 ` James Antill @ 2001-04-10 22:20 ` Jeff Garzik 0 siblings, 0 replies; 33+ messages in thread From: Jeff Garzik @ 2001-04-10 22:20 UTC (permalink / raw) To: James Antill Cc: Linus Torvalds, Alan Cox, Andrea Arcangeli, Hugh Dickins, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, linux-mm James Antill wrote: > [1] Assuming that it doesn't kill performance by allocating non shared > mappings, or chunks of swap etc. Ie. it just knows that it can > allocate swap when it needs it later on. Just FWIW... from my VM-ignorant standpoint, it seems like for the no-overcommit case "reserving" swap space is a much cheaper operation than unconditionally allocating swap space.... -- Jeff Garzik | Sam: "Mind if I drive?" Building 1024 | Max: "Not if you don't mind me clawing at the dash MandrakeSoft | and shrieking like a cheerleader." -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-06 19:52 ` Linus Torvalds 2001-04-06 20:22 ` Andrea Arcangeli @ 2001-04-06 20:48 ` Hugh Dickins 1 sibling, 0 replies; 33+ messages in thread From: Hugh Dickins @ 2001-04-06 20:48 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, Ben LaHaise, Rik van Riel, Richard Jerrrell, Stephen Tweedie, arjanv, alan, linux-mm On Fri, 6 Apr 2001, Linus Torvalds wrote: > On Fri, 6 Apr 2001, Hugh Dickins wrote: > > > > swapper_space.nrpages, that's neat, but I insist it's not right. > > It's not "right", but I suspect it's actually good enough. Yes, even before Andrea's final email, I found myself warming to his point of view. As you both point out, vm_enough_pages() is merely a heuristic for rejecting the impossible, and overcommit laughs in the face of strict calculation here. And it does a better job than the comparison with infinity I was implying. > Also, note that when if get _really_ low on memory, the swap cache effect > should be going away: if we still have the swap cache pages in memory, > we've obviously not paged everything out yet. So the double accounting > should have a limit error of zero as we approach being truly low on > memory. And that, I suspect, is the most important thing - making sure > that we allow programs to run when they can, but at least having _some_ > concept of "enough is enough". I've been rehearsing this same "limit error of zero" argument to myself here, since your call for "Ideas?" which shut us up. It's plausible, I bet it's not strictly true, but it feels good enough: we all know there are cases where it will go wrong, nothing new there. But maybe a comment in vm_enough_pages() to make the false accounting explicit? And pace those who hate multiple returns, why add all those pages every time, including call to nr_free_pages(), when most often it can succeed right away? I haven't got your "num_physpages >> 6" in there: sounds very reasonable - but there's a 23/11/98 NJC comment (omitted from mine below, since no such code recently) to suggest it was tried once before, anyone remember why that was abandoned? Hugh int vm_enough_memory(long pages) { /* Stupid algorithm to decide if we have enough memory: while * simple, it hopefully works in most obvious cases.. Easy to * fool it, but this should catch most mistakes. */ long free; /* Sometimes we want to use more memory than we have. */ if (sysctl_overcommit_memory) return 1; free = nr_swap_pages; if (free > pages) return 1; free += atomic_read(&page_cache_size); if (free > pages) return 1; free += atomic_read(&buffermem_pages); if (free > pages) return 1; /* * swapper_space.nrpages is the number of swap pages cached: * they have already been included in page_cache_size, but * this compensates for recently unmapped and freeable pages * of swap not yet included in nr_swap_pages: when in doubt, * let vm_enough_memory() err towards success. */ free += swapper_space.nrpages; if (free > pages) return 1; free += nr_free_pages(); if (free > pages) return 1; /* * The code below doesn't account for free space in the inode * and dentry slab cache, slab cache fragmentation, inodes and * dentries which will become freeable under VM load, etc. * Lets just hope all these (complex) factors balance out... */ free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT; free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT; return free > pages; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-05 15:56 [PATCH] swap_state.c thinko Ben LaHaise 2001-04-05 16:05 ` Rik van Riel @ 2001-04-05 17:21 ` Hugh Dickins 2001-04-05 21:39 ` Richard Jerrell 1 sibling, 1 reply; 33+ messages in thread From: Hugh Dickins @ 2001-04-05 17:21 UTC (permalink / raw) To: Ben LaHaise; +Cc: arjanv, alan, torvalds, sct, jerrell, linux-mm On Thu, 5 Apr 2001, Ben LaHaise wrote: > > Here's another one liner that closes an smp race that could corrupt > things. > > diff -urN v2.4.3/mm/swap_state.c work-2.4.3/mm/swap_state.c > --- v2.4.3/mm/swap_state.c Fri Dec 29 18:04:27 2000 > +++ work-2.4.3/mm/swap_state.c Thu Apr 5 11:55:00 2001 > @@ -140,7 +140,7 @@ > /* > * If we are the only user, then try to free up the swap cache. > */ > - if (PageSwapCache(page) && !TryLockPage(page)) { > + if (!TryLockPage(page) && PageSwapCache(page)) { > if (!is_page_shared(page)) { > delete_from_swap_cache_nolock(page); > } I agree that PageSwapCache(page) needs to be retested when(if) the page lock is acquired, but isn't it best to check PageSwapCache(page) first as at present - won't it very often fail? won't the overhead of TryLocking and Unlocking every page slow down a hot path? And isn't this free_page_and_swap_cache(), precisely the area that's currently subject to debate and patches, because swap pages are not getting freed soon enough? I haven't been following that discussion with full understanding, and haven't seen a full explanation of the problem to be solved; but I'd rather _imagined_ it was that the page would here be on an LRU list, raising its count and causing the is_page_shared(page) test to succeed despite not really shared. So I'd been expecting a patch to remove this code completely. Forgive me if way off base... Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko 2001-04-05 17:21 ` Hugh Dickins @ 2001-04-05 21:39 ` Richard Jerrell 0 siblings, 0 replies; 33+ messages in thread From: Richard Jerrell @ 2001-04-05 21:39 UTC (permalink / raw) To: Hugh Dickins; +Cc: linux-mm > I agree that PageSwapCache(page) needs to be retested when(if) the > page lock is acquired, but isn't it best to check PageSwapCache(page) > first as at present - won't it very often fail? won't the overhead of > TryLocking and Unlocking every page slow down a hot path? Yes and no. It's only a couple of bit operations, so it's probably a pretty negligable slow-down. But, yes it will quite often fail especially in low memory usage situations where there isn't much swapping going on. Adding a second test after the lock would slow down the infrequent case and make it much more like the way lookup_swap_cache works. > And isn't this free_page_and_swap_cache(), precisely the area that's > currently subject to debate and patches, because swap pages are not > getting freed soon enough? What I think you are talking about are three seperate problems, somewhat related. 1) swap cache pages aren't counted in vm_enough_memory as free, meaning that you can fail when trying to allocate memory merely because a lot of pages have already been swapped out but not yet reclaimed or possibly even laundered. Because these pages are already in the swap cache, we know that they can be freed if the normal path of kswapd is followed. 2) we waste time by laundering swap cache pages that are no longer being referenced by either ptes or indirectly through the swap cell references. Problem 1 is what is causing quite a few people to fail prematurely when trying to allocate memory. Problem 2 is just wasting our time. Combined, however, the two problems have dead swap cache pages eating up swap cells, memory, and time. > problem to be solved; but I'd rather _imagined_ it was that the page > would here be on an LRU list, raising its count and causing the > is_page_shared(page) test to succeed despite not really shared. is_page_shared is expecting that someone has one of the references to the page and is trying to determine whether or not other people are interested in it. Being on the LRU isn't necessarily going to have the page's count bumped by one. As a matter of practice, though, the pages on the LRU are all in the swap cache which by way of having the page referenced by the swap cell will bump the count by one. All this is not really part of the problem. The problem is just that 1) swap cache pages are freeable and 2) we didn't check to make sure anyone wanted that page before writing it to disk. Rich -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] swap_state.c thinko
@ 2001-04-06 20:20 Bulent Abali
2001-04-06 20:33 ` Jeff Garzik
0 siblings, 1 reply; 33+ messages in thread
From: Bulent Abali @ 2001-04-06 20:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, linux-mm
>So I don't think it would necessarily be wrong to say
>something like
>
> free -= num_physpages >> 6;
>
>to approximate the notion of "keep 1 percent slop" (remember, the 1% may
>well be on the swap device, not actually kept as free memory).
Hi,
I suggested the same thing to Rik but he rightfully said that it would
not work well for diskless (or swap-less) machines. You may want to
consider the following instead.
free -= (nr_swap_pages)? num_physpages >> 6 : 0;
By the way, disk space is cheap why not give more than 1 percent slop?
This is really accounted in the swap space and not the memory.
It will also help system out of oom_killer's radar.
Bulent Abali (abali@us.ibm.com)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH] swap_state.c thinko 2001-04-06 20:20 Bulent Abali @ 2001-04-06 20:33 ` Jeff Garzik 0 siblings, 0 replies; 33+ messages in thread From: Jeff Garzik @ 2001-04-06 20:33 UTC (permalink / raw) To: Bulent Abali; +Cc: Linus Torvalds, Rik van Riel, linux-mm Bulent Abali wrote: > By the way, disk space is cheap why not give more than 1 percent slop? > This is really accounted in the swap space and not the memory. > It will also help system out of oom_killer's radar. Dumb question... is the OOM killer accounting for icache and dcache memory usage? -- Jeff Garzik | Sam: "Mind if I drive?" Building 1024 | Max: "Not if you don't mind me clawing at the dash MandrakeSoft | and shrieking like a cheerleader." -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2001-04-10 22:20 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2001-04-05 15:56 [PATCH] swap_state.c thinko Ben LaHaise 2001-04-05 16:05 ` Rik van Riel 2001-04-05 17:11 ` Ben LaHaise 2001-04-05 23:40 ` Andrea Arcangeli 2001-04-06 0:32 ` Linus Torvalds 2001-04-06 16:31 ` Hugh Dickins 2001-04-06 17:21 ` Linus Torvalds 2001-04-06 18:23 ` Hugh Dickins 2001-04-06 18:57 ` Linus Torvalds 2001-04-06 19:06 ` Rik van Riel 2001-04-06 18:47 ` Andrea Arcangeli 2001-04-06 18:37 ` Hugh Dickins 2001-04-06 19:09 ` Andrea Arcangeli 2001-04-06 18:53 ` Hugh Dickins 2001-04-06 19:14 ` Andrea Arcangeli 2001-04-06 19:03 ` Hugh Dickins 2001-04-06 20:03 ` Andrea Arcangeli 2001-04-06 19:12 ` Richard Jerrell 2001-04-06 19:52 ` Linus Torvalds 2001-04-06 20:22 ` Andrea Arcangeli 2001-04-06 21:04 ` Rik van Riel 2001-04-07 1:27 ` Andrea Arcangeli 2001-04-09 18:16 ` Alan Cox 2001-04-09 18:45 ` Andrea Arcangeli 2001-04-09 20:32 ` Linus Torvalds 2001-04-09 20:54 ` David L. Parsley 2001-04-10 21:07 ` James Antill 2001-04-10 22:20 ` Jeff Garzik 2001-04-06 20:48 ` Hugh Dickins 2001-04-05 17:21 ` Hugh Dickins 2001-04-05 21:39 ` Richard Jerrell 2001-04-06 20:20 Bulent Abali 2001-04-06 20:33 ` Jeff Garzik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox