From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from luxury.wat.veritas.com([10.10.185.122]) (29365 bytes)
	by megami.veritas.com via sendmail with P:esmtp/R:smart_host/T:smtp
	(sender: ) id for ; Thu, 16 Aug 2001 15:28:45 -0700 (PDT)
	(Smail-3.2.0.101 1997-Dec-17 #4 built 1999-Aug-24)
Date: Thu, 16 Aug 2001 23:30:11 +0100 (BST)
From: Hugh Dickins
Subject: [PATCH] more swapoff fixes
In-Reply-To: <020901c1251a$e9386850$bef7020a@mammon>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Jeremy Linton
Cc: linux-mm@kvack.org
List-ID:

Jeremy and list,

Below is my patch of further bugfixes and speedups to swapoff, based on
2.4.9.  For now it's an all-in-one patch: one of these days when Linus
gets back I'll cut it into little pieces, with justification for each of
those pieces.  Like most of VM, I think it's very much a "Linus" thing
rather than an "Alan" thing, so I won't be pressing for inclusion in -ac.

I'm mostly happy with it, but there's one issue outstanding: I need to
make sure we scan the child _after_ the parent if caught in dup_mmap().
I'll come back to that when I'm fresher: I've kept your "repeat until
empty", so I believe it's safe except in the SWAP_MAP_MAX case.

Please would someone explain to me how the SWAP_MAP_MAX case can arise?
I confess I don't get it.  I see how a shared file page only needs 256
processes mapping it in 256 places to overflow an unsigned short; but I
don't see how we get there with swap pages.

The SWAP_MAP_MAX case does govern the approach quite strictly, and I
found the current algorithm much better suited to its constraints than
I'd originally supposed: no radical departure, sorry, even the BKL is
still needed (probably quite easy to eliminate, but more a 2.5 thing).

The most serious bugfix, I believe, is to the "dirty" handling.
try_to_unuse() used to remove the page from the pagecache after the loop
filling in ptes, but at some point it was moved before the loop.  I
prefer it before the loop, because that keeps try_to_swap_out() from
duping the entry while we're unduping it; but try_to_swap_out() was
discarding a non-swapcache page if its pte was not dirty.  So if a
swapcache page had already been faulted in read-only, and swap_out()
reached it after it was freed from swapcache, before unuse_pte() got
there, your data would be wiped - or have I misunderstood?  (I should
say that the races I found were all long-standing: no problems from
Jeremy's 2.4.8 version.)
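To make that ordering concrete, here is a toy userspace model of the
race (the struct, the helper names and the single-threaded interleaving
below are only an illustration of the description above, not the kernel
code): once try_to_unuse() has deleted the page from the swap cache, a
clean pte no longer implies that the swap slot still holds the data, so
the page must not be discarded unless something still marks it dirty.
That is what the patch does, with SetPageDirty() right after
delete_from_swap_cache_nolock(), and with try_to_swap_out() keeping a
page which is PageDirty even when its pte is clean.

#include <stdbool.h>
#include <stdio.h>

struct toy_page {
	bool in_swap_cache;	/* a copy still lives in the swap slot */
	bool page_dirty;	/* PageDirty(): memory copy must be kept */
	bool pte_dirty;		/* pte written since the page was faulted in */
	bool data_lost;
};

/* try_to_unuse() step: take the page out of the swap cache */
static void unuse_delete_from_swap_cache(struct toy_page *p, bool patched)
{
	p->in_swap_cache = false;
	if (patched)
		p->page_dirty = true;	/* the patch: SetPageDirty(page) */
}

/* try_to_swap_out() step: may it discard the page without writing it? */
static void swap_out(struct toy_page *p)
{
	if (!p->pte_dirty && !p->page_dirty && !p->in_swap_cache)
		p->data_lost = true;	/* dropped with no backing copy left */
}

static void run(bool patched)
{
	/* swapcache page that was faulted in read-only: pte is clean */
	struct toy_page p = { .in_swap_cache = true };

	unuse_delete_from_swap_cache(&p, patched);
	swap_out(&p);	/* races in before unuse_pte() fills the ptes */
	printf("%s: data %s\n", patched ? "patched" : "old",
	       p.data_lost ? "LOST" : "preserved");
}

int main(void)
{
	run(false);	/* old rules: clean pte, page discarded */
	run(true);	/* patched rules: PageDirty keeps it */
	return 0;
}

Built with any C99 compiler, the first run reports the data lost under
the old rules and the second preserved under the patched rules; that is
all the sketch is meant to show.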
Faster?  Please decide for yourself.  Review and testing very welcome!

Thanks,
Hugh

--- linux-2.4.9/include/linux/pagemap.h	Wed Aug 15 22:21:21 2001
+++ linux-swapoff/include/linux/pagemap.h	Thu Aug 16 22:06:26 2001
@@ -81,11 +81,6 @@
 #define find_lock_page(mapping, index) \
 	__find_lock_page(mapping, index, page_hash(mapping, index))
 
-extern struct page * __find_get_swapcache_page (struct address_space * mapping,
-				unsigned long index, struct page **hash);
-#define find_get_swapcache_page(mapping, index) \
-	__find_get_swapcache_page(mapping, index, page_hash(mapping, index))
-
 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
 
 extern void add_to_page_cache(struct page * page, struct address_space *mapping, unsigned long index);
--- linux-2.4.9/kernel/fork.c	Wed Jul 18 02:23:28 2001
+++ linux-swapoff/kernel/fork.c	Thu Aug 16 22:06:26 2001
@@ -8,7 +8,7 @@
  * 'fork.c' contains the help-routines for the 'fork' system call
  * (see also entry.S and others).
  * Fork is rather simple, once you get the hang of it, but the memory
- * management can be a bitch. See 'mm/memory.c': 'copy_page_tables()'
+ * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
  */
 
 #include
 
@@ -134,9 +134,22 @@
 	mm->mmap_avl = NULL;
 	mm->mmap_cache = NULL;
 	mm->map_count = 0;
+	mm->rss = 0;
 	mm->cpu_vm_mask = 0;
 	mm->swap_address = 0;
 	pprev = &mm->mmap;
+
+	/*
+	 * Add it to the mmlist after the parent.
+	 * Doing it this way means that we can order the list,
+	 * and fork() won't mess up the ordering significantly.
+	 * Add it first so that swapoff can see any swap entries.
+	 */
+	spin_lock(&mmlist_lock);
+	list_add(&mm->mmlist, &current->mm->mmlist);
+	mmlist_nr++;
+	spin_unlock(&mmlist_lock);
+
 	for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
 		struct file *file;
 
@@ -149,7 +162,6 @@
 		*tmp = *mpnt;
 		tmp->vm_flags &= ~VM_LOCKED;
 		tmp->vm_mm = mm;
-		mm->map_count++;
 		tmp->vm_next = NULL;
 		file = tmp->vm_file;
 		if (file) {
@@ -168,17 +180,19 @@
 			spin_unlock(&inode->i_mapping->i_shared_lock);
 		}
 
-		/* Copy the pages, but defer checking for errors */
-		retval = copy_page_range(mm, current->mm, tmp);
-		if (!retval && tmp->vm_ops && tmp->vm_ops->open)
-			tmp->vm_ops->open(tmp);
-
 		/*
-		 * Link in the new vma even if an error occurred,
-		 * so that exit_mmap() can clean up the mess.
+		 * Link in the new vma and copy the page table entries:
+		 * link in first so that swapoff can see swap entries.
 		 */
+		spin_lock(&mm->page_table_lock);
 		*pprev = tmp;
 		pprev = &tmp->vm_next;
+		mm->map_count++;
+		retval = copy_page_range(mm, current->mm, tmp);
+		spin_unlock(&mm->page_table_lock);
+
+		if (tmp->vm_ops && tmp->vm_ops->open)
+			tmp->vm_ops->open(tmp);
 
 		if (retval)
 			goto fail_nomem;
@@ -319,18 +333,6 @@
 	down_write(&oldmm->mmap_sem);
 	retval = dup_mmap(mm);
 	up_write(&oldmm->mmap_sem);
-
-	/*
-	 * Add it to the mmlist after the parent.
-	 *
-	 * Doing it this way means that we can order
-	 * the list, and fork() won't mess up the
-	 * ordering significantly.
-	 */
-	spin_lock(&mmlist_lock);
-	list_add(&mm->mmlist, &oldmm->mmlist);
-	mmlist_nr++;
-	spin_unlock(&mmlist_lock);
 
 	if (retval)
 		goto free_pt;
--- linux-2.4.9/mm/filemap.c	Thu Aug 16 19:12:07 2001
+++ linux-swapoff/mm/filemap.c	Thu Aug 16 22:06:26 2001
@@ -682,34 +682,6 @@
 }
 
 /*
- * Find a swapcache page (and get a reference) or return NULL.
- * The SwapCache check is protected by the pagecache lock.
- */
-struct page * __find_get_swapcache_page(struct address_space *mapping,
-			      unsigned long offset, struct page **hash)
-{
-	struct page *page;
-
-	/*
-	 * We need the LRU lock to protect against page_launder().
-	 */
-
-	spin_lock(&pagecache_lock);
-	page = __find_page_nolock(mapping, offset, *hash);
-	if (page) {
-		spin_lock(&pagemap_lru_lock);
-		if (PageSwapCache(page))
-			page_cache_get(page);
-		else
-			page = NULL;
-		spin_unlock(&pagemap_lru_lock);
-	}
-	spin_unlock(&pagecache_lock);
-
-	return page;
-}
-
-/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
--- linux-2.4.9/mm/memory.c	Tue Aug 14 00:16:41 2001
+++ linux-swapoff/mm/memory.c	Thu Aug 16 22:06:26 2001
@@ -148,6 +148,9 @@
  *
  * 08Jan98 Merged into one routine from several inline routines to reduce
  *         variable count and make things faster. -jj
+ *
+ * dst->page_table_lock is held on entry and exit,
+ * but may be dropped within pmd_alloc() and pte_alloc().
 */
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
			struct vm_area_struct *vma)
@@ -159,8 +162,7 @@
 
 	src_pgd = pgd_offset(src, address)-1;
 	dst_pgd = pgd_offset(dst, address)-1;
-
-	spin_lock(&dst->page_table_lock);
+
 	for (;;) {
 		pmd_t * src_pmd, * dst_pmd;
 
@@ -234,6 +236,7 @@
 				pte = pte_mkclean(pte);
 				pte = pte_mkold(pte);
 				get_page(ptepage);
+				dst->rss++;
 
cont_copy_pte_range:		set_pte(dst_pte, pte);
cont_copy_pte_range_noset:	address += PAGE_SIZE;
@@ -251,11 +254,8 @@
out_unlock:
 	spin_unlock(&src->page_table_lock);
out:
-	spin_unlock(&dst->page_table_lock);
 	return 0;
-
nomem:
-	spin_unlock(&dst->page_table_lock);
 	return -ENOMEM;
 }
 
@@ -1080,15 +1080,13 @@
 		/* Don't block on I/O for read-ahead */
 		if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster * (1 << page_cluster)) {
-			while (i++ < num)
-				swap_free(SWP_ENTRY(SWP_TYPE(entry), offset++));
 			break;
 		}
 		/* Ok, do the async read-ahead now */
 		new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
-		if (new_page != NULL)
-			page_cache_release(new_page);
-		swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
+		if (new_page == NULL)
+			break;
+		page_cache_release(new_page);
 	}
 	return;
 }
--- linux-2.4.9/mm/swap_state.c	Wed Jul 18 23:18:15 2001
+++ linux-swapoff/mm/swap_state.c	Thu Aug 16 22:06:26 2001
@@ -29,7 +29,7 @@
 	if (swap_count(page) > 1)
 		goto in_use;
 
-	/* We could remove it here, but page_launder will do it anyway */
+	delete_from_swap_cache_nolock(page);
 	UnlockPage(page);
 	return 0;
 
@@ -79,40 +79,35 @@
 		BUG();
 	if (page->mapping)
 		BUG();
-	flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
+
+	/* clear PG_dirty so a subsequent set_page_dirty takes effect */
+	flags = page->flags & ~((1 << PG_error) | (1 << PG_dirty) | (1 << PG_arch_1));
 	page->flags = flags | (1 << PG_uptodate);
 	page->age = PAGE_AGE_START;
 	add_to_page_cache_locked(page, &swapper_space, entry.val);
 }
 
-static inline void remove_from_swap_cache(struct page *page)
-{
-	struct address_space *mapping = page->mapping;
-
-	if (mapping != &swapper_space)
-		BUG();
-	if (!PageSwapCache(page) || !PageLocked(page))
-		PAGE_BUG(page);
-
-	PageClearSwapCache(page);
-	ClearPageDirty(page);
-	__remove_inode_page(page);
-}
-
 /*
  * This must be called only on pages that have
  * been verified to be in the swap cache.
 */
 void __delete_from_swap_cache(struct page *page)
 {
+	struct address_space *mapping = page->mapping;
 	swp_entry_t entry;
 
-	entry.val = page->index;
-
#ifdef SWAP_CACHE_INFO
 	swap_cache_del_total++;
#endif
-	remove_from_swap_cache(page);
+	if (mapping != &swapper_space)
+		BUG();
+	if (!PageSwapCache(page) || !PageLocked(page))
+		BUG();
+
+	entry.val = page->index;
+	PageClearSwapCache(page);
+	ClearPageDirty(page);
+	__remove_inode_page(page);
 	swap_free(entry);
 }
 
@@ -129,7 +124,6 @@
 	lru_cache_del(page);
 
 	spin_lock(&pagecache_lock);
-	ClearPageDirty(page);
 	__delete_from_swap_cache(page);
 	spin_unlock(&pagecache_lock);
 	page_cache_release(page);
@@ -169,14 +163,12 @@
 	page_cache_release(page);
 }
 
-
 /*
  * Lookup a swap entry in the swap cache. A found page will be returned
  * unlocked and with its refcount incremented - we rely on the kernel
  * lock getting page table operations atomic even if we drop the page
  * lock before returning.
 */
-
 struct page * lookup_swap_cache(swp_entry_t entry)
 {
 	struct page *found;
 
@@ -184,22 +176,20 @@
#ifdef SWAP_CACHE_INFO
 	swap_cache_find_total++;
#endif
-	while (1) {
-		/*
-		 * Right now the pagecache is 32-bit only. But it's a 32 bit index. =)
-		 */
-		found = find_get_swapcache_page(&swapper_space, entry.val);
-		if (!found)
-			return 0;
-		if (!PageSwapCache(found))
-			BUG();
-		if (found->mapping != &swapper_space)
-			BUG();
+	found = find_get_page(&swapper_space, entry.val);
+	if (!found)
+		return 0;
+	/*
+	 * If SMP, it is unsafe to assert PageSwapCache and mapping
+	 * here: nothing prevents swapoff from deleting this page from
+	 * the swap cache at this moment. find_lock_page would prevent
+	 * that, but no need to change: we _have_ got the right page
+	 * (but would not recognize it as an exclusive swap page later).
+	 */
#ifdef SWAP_CACHE_INFO
-		swap_cache_find_success++;
+	swap_cache_find_success++;
#endif
-		return found;
-	}
+	return found;
 }
 
 /*
@@ -210,33 +200,41 @@
  * A failure return means that either the page allocation failed or that
  * the swap entry is no longer in use.
 */
-
 struct page * read_swap_cache_async(swp_entry_t entry)
 {
-	struct page *found_page = 0, *new_page;
+	struct page *found_page, *new_page;
+	struct page **hash;
 
 	/*
-	 * Make sure the swap entry is still in use.
-	 */
-	if (!swap_duplicate(entry))	/* Account for the swap cache */
-		goto out;
-	/*
-	 * Look for the page in the swap cache.
+	 * Look for the page in the swap cache. Since we normally call
+	 * this only after lookup_swap_cache() failed, re-calling that
+	 * would confuse the statistics: use __find_get_page() directly.
 	 */
-	found_page = lookup_swap_cache(entry);
+	hash = page_hash(&swapper_space, entry.val);
+	found_page = __find_get_page(&swapper_space, entry.val, hash);
 	if (found_page)
-		goto out_free_swap;
+		goto out;
 
 	new_page = alloc_page(GFP_HIGHUSER);
 	if (!new_page)
-		goto out_free_swap;	/* Out of memory */
+		goto out;		/* Out of memory */
 
 	/*
 	 * Check the swap cache again, in case we stalled above.
+	 * The BKL is guarding against races between this check
+	 * and where the new page is added to the swap cache below.
 	 */
-	found_page = lookup_swap_cache(entry);
+	found_page = __find_get_page(&swapper_space, entry.val, hash);
 	if (found_page)
 		goto out_free_page;
+
+	/*
+	 * Make sure the swap entry is still in use. It could have gone
+	 * while caller waited for BKL, or while allocating a page.
+	 */
+	if (!swap_duplicate(entry))	/* Account for the swap cache */
+		goto out_free_page;
+
 	/*
 	 * Add it to the swap cache and read its contents.
 	 */
@@ -248,8 +246,6 @@
 
out_free_page:
 	page_cache_release(new_page);
-out_free_swap:
-	swap_free(entry);
out:
 	return found_page;
 }
--- linux-2.4.9/mm/swapfile.c	Sat Aug 11 02:02:42 2001
+++ linux-swapoff/mm/swapfile.c	Thu Aug 16 22:06:26 2001
@@ -229,33 +229,23 @@
  * share this swap entry, so be cautious and let do_wp_page work out
  * what to do if a write is requested later.
 */
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
 static inline void unuse_pte(struct vm_area_struct * vma, unsigned long address,
	pte_t *dir, swp_entry_t entry, struct page* page)
 {
 	pte_t pte = *dir;
 
-	if (pte_none(pte))
+	if (pte_none(pte) || pte_present(pte))
 		return;
-	if (pte_present(pte)) {
-		/* If this entry is swap-cached, then page must already
		   hold the right address for any copies in physical
		   memory */
-		if (pte_page(pte) != page)
-			return;
-		/* We will be removing the swap cache in a moment, so... */
-		ptep_mkdirty(dir);
-		return;
-	}
 	if (pte_to_swp_entry(pte).val != entry.val)
 		return;
-	set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
-	swap_free(entry);
 	get_page(page);
+	set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
+	swap_free(entry);
 	++vma->vm_mm->rss;
 }
 
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
 static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
	unsigned long address, unsigned long size, unsigned long offset,
	swp_entry_t entry, struct page* page)
@@ -283,7 +273,7 @@
 	} while (address && (address < end));
 }
 
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
 static inline void unuse_pgd(struct vm_area_struct * vma, pgd_t *dir,
	unsigned long address, unsigned long size,
	swp_entry_t entry, struct page* page)
@@ -314,7 +304,7 @@
 	} while (address && (address < end));
 }
 
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
 static void unuse_vma(struct vm_area_struct * vma, pgd_t *pgdir,
			swp_entry_t entry, struct page* page)
 {
@@ -337,8 +327,6 @@
 	/*
 	 * Go through process' page directory.
 	 */
-	if (!mm)
-		return;
 	spin_lock(&mm->page_table_lock);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		pgd_t * pgd = pgd_offset(mm, vma->vm_start);
@@ -348,54 +336,34 @@
 	return;
 }
 
-/*
- * this is called when we find a page in the swap list
- * all the locks have been dropped at this point which
- * isn't a problem because we rescan the swap map
- * and we _don't_ clear the refrence count if for
- * some reason it isn't 0
- */
-
-static inline int free_found_swap_entry(unsigned int type, int i)
+static int find_next_to_unuse(struct swap_info_struct *si, int prev)
 {
-	struct task_struct *p;
-	struct page *page;
-	swp_entry_t entry;
+	int i = prev;
+	int count;
 
-	entry = SWP_ENTRY(type, i);
-
-	/*
-	 * Get a page for the entry, using the existing swap
-	 * cache page if there is one. Otherwise, get a clean
-	 * page and read the swap into it.
-	 */
-	page = read_swap_cache_async(entry);
-	if (!page) {
-		swap_free(entry);
-		return -ENOMEM;
+	swap_device_lock(si);
+	for (;;) {
+		if (++i >= si->max) {
+			i = 0;
+			break;
+		}
+		count = si->swap_map[i];
+		if (count && count != SWAP_MAP_BAD) {
+			/*
+			 * Prevent swaphandle from being completely
			 * unused by swap_free while we are trying
			 * to read in the page - this prevents warning
			 * messages from rw_swap_page_base.
			 */
+			if (count != SWAP_MAP_MAX)
+				si->swap_map[i] = count + 1;
+			break;
+		}
 	}
-	lock_page(page);
-	if (PageSwapCache(page))
-		delete_from_swap_cache_nolock(page);
-	UnlockPage(page);
-	read_lock(&tasklist_lock);
-	for_each_task(p)
-		unuse_process(p->mm, entry, page);
-	read_unlock(&tasklist_lock);
-	shmem_unuse(entry, page);
-	/*
-	 * Now get rid of the extra reference to the temporary
-	 * page we've been using.
-	 */
-	page_cache_release(page);
-	/*
-	 * Check for and clear any overflowed swap map counts.
-	 */
-	swap_free(entry);
-	return 0;
+	swap_device_unlock(si);
+	return i;
 }
-
 /*
 * We completely avoid races by reading each swap page in advance,
 * and then search for the process using it.  All the necessary
@@ -404,80 +372,161 @@
 static int try_to_unuse(unsigned int type)
 {
 	struct swap_info_struct * si = &swap_info[type];
-	int ret, foundpage;
+	struct mm_struct *last_mm;
+	struct page *page;
+	struct page *nextpage = NULL;	/* keep compiler quiet */
+	unsigned short *swap_map;
+	swp_entry_t entry;
+	int i, next = 0;
+	int retval = 0;
 
-	do {
-		int i;
+	/*
+	 * When searching mms for an entry, a plausible strategy is
+	 * to start at the last mm we freed the preceding entry from
+	 * (though actually we don't observe whether we or another
+	 * freed the entry).  Initialize this last_mm with a hold.
+	 */
+	last_mm = &init_mm;
+	atomic_inc(&init_mm.mm_users);
+
+	for (;;) {
+		if (!next) {
+			/*
+			 * Start a fresh sweep of the swap_map.  We're
+			 * done when no entry found in a single sweep.
+			 */
+			next = find_next_to_unuse(si, 0);
+			if (!next)
+				break;
+			entry = SWP_ENTRY(type, next);
+			nextpage = read_swap_cache_async(entry);
+			if (!nextpage) {
+				swap_free(entry);
+				retval = -ENOMEM;
+				break;
+			}
+		}
 
 		/*
-		 * The algorithm is inefficient but seldomly used
-		 *
-		 * Find a swap page in use and read it in.
+		 * Set page to nextpage, with readahead
+		 * to a new nextpage in the background.
 		 */
-		foundpage = 0;
-		swap_device_lock(si);
-		for (i = 1; i < si->max ; i++) {
-			int count = si->swap_map[i];
-			if (!count || count == SWAP_MAP_BAD)
-				continue;
+		i = next;
+		page = nextpage;
+		next = find_next_to_unuse(si, next);
+		if (next) {
+			entry = SWP_ENTRY(type, next);
+			nextpage = read_swap_cache_async(entry);
+			if (!nextpage) {
+				swap_free(entry);
+				swap_free(SWP_ENTRY(type, i));
+				page_cache_release(page);
+				retval = -ENOMEM;
+				break;
+			}
+		}
 
-			/*
-			 * Prevent swaphandle from being completely
-			 * unused by swap_free while we are trying
-			 * to read in the page - this prevents warning
-			 * messages from rw_swap_page_base.
-			 */
-			foundpage = 1;
-			if (count != SWAP_MAP_MAX)
-				si->swap_map[i] = count + 1;
+		/*
+		 * Don't hold on to last_mm if it looks like exiting.
+		 * Can mmput ever block?  if so, then we cannot risk
+		 * it between deleting the page from the swap cache,
+		 * and completing the search through mms (and cannot
+		 * use it to avoid the long hold on mmlist_lock there).
+		 */
+		if (atomic_read(&last_mm->mm_users) == 1) {
+			mmput(last_mm);
+			last_mm = &init_mm;
+			atomic_inc(&init_mm.mm_users);
+		}
 
-			swap_device_unlock(si);
-			ret = free_found_swap_entry(type,i);
-			if (ret)
-				return ret;
+		/*
+		 * While next swap entry is read into the swap cache
+		 * asynchronously (if it was not already present), wait
+		 * for and lock page.  Remove it from the swap cache so
+		 * try_to_swap_out won't bump swap count.  Mark it dirty
+		 * so try_to_swap_out will preserve it without us having
+		 * to mark any present ptes as dirty: so we can skip
+		 * searching processes once swap count has all gone.
+		 */
+		lock_page(page);
+		if (PageSwapCache(page))
+			delete_from_swap_cache_nolock(page);
+		SetPageDirty(page);
+		UnlockPage(page);
+		flush_page_to_ram(page);
 
-			/*
-			 * we pick up the swap_list_lock() to guard the nr_swap_pages,
-			 * si->swap_map[] should only be changed if it is SWAP_MAP_MAX
-			 * otherwise ugly stuff can happen with other people who are in
-			 * the middle of a swap operation to this device. This kind of
-			 * operation can sometimes be detected with the undead swap
-			 * check. Don't worry about these 'undead' entries for now
-			 * they will be caught the next time though the top loop.
-			 * Do worry, about the weak locking that allows this to happen
-			 * because if it happens to a page that is SWAP_MAP_MAX
-			 * then bad stuff can happen.
-			 */
+		/*
+		 * Remove all references to entry (without blocking).
+		 */
+		entry = SWP_ENTRY(type, i);
+		swap_map = &si->swap_map[i];
+		swap_free(entry);
+		if (*swap_map) {
+			if (last_mm == &init_mm)
+				shmem_unuse(entry, page);
+			else
+				unuse_process(last_mm, entry, page);
+		}
+		if (*swap_map) {
+			struct mm_struct *mm = last_mm;
+			struct list_head *p = &mm->mmlist;
+
+			spin_lock(&mmlist_lock);
+			while (*swap_map && (p = p->next) != &last_mm->mmlist) {
+				mm = list_entry(p, struct mm_struct, mmlist);
+				if (mm == &init_mm)
+					shmem_unuse(entry, page);
+				else
+					unuse_process(mm, entry, page);
+			}
+			atomic_inc(&mm->mm_users);
+			spin_unlock(&mmlist_lock);
+			mmput(last_mm);
+			last_mm = mm;
+		}
+		page_cache_release(page);
+
+		/*
+		 * There could still be an outstanding reference:
+		 * if we checked the child before the parent while
+		 * fork was in dup_mmap; or if mmput has taken the mm
+		 * off mmlist, but exit_mmap has not yet completed.
+		 * Until that's fixed, loop back and recheck at the end:
+		 * we are done once none found in a single locked sweep.
+		 * That handles the normal case; but if SWAP_MAP_MAX we
+		 * only _hope_ we got them all in one, and reset anyway.
+		 */
+		if (*swap_map == SWAP_MAP_MAX) {
 			swap_list_lock();
 			swap_device_lock(si);
-			if (si->swap_map[i] > 0) {
-				/* normally this would just kill the swap page if
-				 * it still existed, it appears though that the locks
-				 * are a little fuzzy
-				 */
-				if (si->swap_map[i] != SWAP_MAP_MAX) {
-					printk("VM: Undead swap entry %08lx\n",
-						SWP_ENTRY(type, i).val);
-				} else {
-					nr_swap_pages++;
-					si->swap_map[i] = 0;
-				}
-			}
+			*swap_map = 0;
+			nr_swap_pages++;
 			swap_device_unlock(si);
 			swap_list_unlock();
-
+		}
+		if (*swap_map) {
			/*
-			 * This lock stuff is ulgy!
-			 * Make sure that we aren't completely killing
-			 * interactive performance.
+			 * Useful message while testing, but we know
+			 * this can happen, so should be removed soon.
			 */
-			if (current->need_resched)
-				schedule();
-			swap_device_lock(si);
+			printk("VM: Undead swap entry %08lx\n", entry.val);
 		}
-		swap_device_unlock(si);
-	} while (foundpage);
-	return 0;
+
+		/*
+		 * Make sure that we aren't completely killing
+		 * interactive performance.  Interruptible check on
+		 * signal_pending() would be nice, but changes the spec?
+		 */
+		if (current->need_resched)
+			schedule();
+		else {
+			unlock_kernel();
+			lock_kernel();
+		}
+	}
+
+	mmput(last_mm);
+	return retval;
 }
 
 asmlinkage long sys_swapoff(const char * specialfile)
@@ -557,6 +606,7 @@
 	nd.mnt = p->swap_vfsmnt;
 	p->swap_vfsmnt = NULL;
 	p->swap_device = 0;
+	p->max = 0;
 	vfree(p->swap_map);
 	p->swap_map = NULL;
 	p->flags = 0;
@@ -637,7 +687,7 @@
 	union swap_header *swap_header = 0;
 	int swap_header_version;
 	int nr_good_pages = 0;
-	unsigned long maxpages;
+	unsigned long maxpages = 1;
 	int swapfilesize;
 	struct block_device *bdev = NULL;
 
@@ -752,17 +802,17 @@
				if (!p->lowest_bit)
					p->lowest_bit = i;
				p->highest_bit = i;
-				p->max = i+1;
+				maxpages = i+1;
				j++;
			}
		}
		nr_good_pages = j;
-		p->swap_map = vmalloc(p->max * sizeof(short));
+		p->swap_map = vmalloc(maxpages * sizeof(short));
		if (!p->swap_map) {
			error = -ENOMEM;
			goto bad_swap;
		}
-		for (i = 1 ; i < p->max ; i++) {
+		for (i = 1 ; i < maxpages ; i++) {
			if (test_bit(i,(char *) swap_header))
				p->swap_map[i] = 0;
			else
@@ -783,24 +833,22 @@
 
		p->lowest_bit = 1;
		p->highest_bit = swap_header->info.last_page - 1;
-		p->max = swap_header->info.last_page;
-
-		maxpages = SWP_OFFSET(SWP_ENTRY(0,~0UL));
-		if (p->max >= maxpages)
-			p->max = maxpages-1;
+		maxpages = SWP_OFFSET(SWP_ENTRY(0,~0UL)) - 1;
+		if (maxpages > swap_header->info.last_page)
+			maxpages = swap_header->info.last_page;
 
		error = -EINVAL;
		if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
			goto bad_swap;
 
		/* OK, set up the swap map and apply the bad block list */
-		if (!(p->swap_map = vmalloc (p->max * sizeof(short)))) {
+		if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {
			error = -ENOMEM;
			goto bad_swap;
		}
 
		error = 0;
-		memset(p->swap_map, 0, p->max * sizeof(short));
+		memset(p->swap_map, 0, maxpages * sizeof(short));
		for (i=0; i<swap_header->info.nr_badpages; i++) {
			int page = swap_header->info.badpages[i];
			if (page <= 0 || page >= swap_header->info.last_page)
@@ -815,7 +863,7 @@
		goto bad_swap;
	}
 
-	if (swapfilesize && p->max > swapfilesize) {
+	if (swapfilesize && maxpages > swapfilesize) {
		printk(KERN_WARNING
		       "Swap area shorter than signature indicates\n");
		error = -EINVAL;
@@ -827,6 +875,7 @@
		goto bad_swap;
	}
	p->swap_map[0] = SWAP_MAP_BAD;
+	p->max = maxpages;
	p->flags = SWP_WRITEOK;
	p->pages = nr_good_pages;
	swap_list_lock();
@@ -856,6 +905,7 @@
	if (bdev)
		blkdev_put(bdev, BDEV_SWAP);
bad_swap_2:
+	p->max = 0;
	if (p->swap_map)
		vfree(p->swap_map);
	nd.mnt = p->swap_vfsmnt;
@@ -947,10 +997,12 @@
	printk("Bad swap file entry %08lx\n", entry.val);
	goto out;
bad_offset:
-	printk("Bad swap offset entry %08lx\n", entry.val);
+	/* Don't report: can happen in read_swap_cache_async after swapoff */
+	/* printk("Bad swap offset entry %08lx\n", entry.val); */
	goto out;
bad_unused:
-	printk("Unused swap offset entry in swap_dup %08lx\n", entry.val);
+	/* Don't report: can happen in read_swap_cache_async after sleeping */
+	/* printk("Unused swap offset entry in swap_dup %08lx\n", entry.val); */
	goto out;
}
 
@@ -1037,8 +1089,8 @@
 }
 
 /*
- * Kernel_lock protects against swap device deletion. Grab an extra
- * reference on the swaphandle so that it dos not become unused.
+ * Kernel_lock protects against swap device deletion. Don't grab an extra
+ * reference on the swaphandle, it doesn't matter if it becomes unused.
 */
int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
{
@@ -1059,7 +1111,6 @@
			break;
		if (swapdev->swap_map[toff] == SWAP_MAP_BAD)
			break;
-		swapdev->swap_map[toff]++;
		toff++;
		ret++;
	} while (--i);
--- linux-2.4.9/mm/vmscan.c	Wed Aug 15 10:37:07 2001
+++ linux-swapoff/mm/vmscan.c	Thu Aug 16 22:06:26 2001
@@ -111,6 +111,7 @@
	 * is needed on CPUs which update the accessed and dirty
	 * bits in hardware.
	 */
+	flush_cache_page(vma, address);
	pte = ptep_get_and_clear(page_table);
	flush_tlb_page(vma, address);
 
@@ -148,20 +149,13 @@
	 * Basically, this just makes it possible for us to do
	 * some real work in the future in "refill_inactive()".
	 */
-	flush_cache_page(vma, address);
-	if (!pte_dirty(pte))
-		goto drop_pte;
-
-	/*
-	 * Ok, it's really dirty. That means that
-	 * we should either create a new swap cache
-	 * entry for it, or we should write it back
-	 * to its own backing store.
-	 */
	if (page->mapping) {
-		set_page_dirty(page);
+		if (pte_dirty(pte))
+			set_page_dirty(page);
		goto drop_pte;
	}
+	if (!pte_dirty(pte) && !PageDirty(page))
+		goto drop_pte;
 
	/*
	 * This is a dirty, swappable page. First of all,
@@ -539,8 +533,12 @@
		 * last copy..
		 */
		if (PageDirty(page)) {
-			int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
+			int (*writepage)(struct page *);
 
+			/* Can a page get here without page->mapping? */
+			if (!page->mapping)
+				goto page_active;
+			writepage = page->mapping->a_ops->writepage;
			if (!writepage)
				goto page_active;
 
@@ -779,7 +777,7 @@
{
	pg_data_t *pgdat;
	unsigned int global_target = freepages.high + inactive_target;
-	unsigned int global_incative = 0;
+	unsigned int global_inactive = 0;
 
	pgdat = pgdat_list;
	do {
@@ -799,13 +797,13 @@
			if (inactive < zone->pages_high)
				return 1;
 
-			global_incative += inactive;
+			global_inactive += inactive;
		}
		pgdat = pgdat->node_next;
	} while (pgdat);
 
	/* Global shortage? */
-	return global_incative < global_target;
+	return global_inactive < global_target;
}
 
 /*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/