* [PATCH] more swapoff fixes
From: Hugh Dickins @ 2001-08-16 22:30 UTC (permalink / raw)
To: Jeremy Linton; +Cc: linux-mm
Jeremy and list,
Below is my patch of further bugfixes and speedups to swapoff,
based on 2.4.9. For now it's an all-in-one patch: one of these
days when Linus gets back I'll cut it into little pieces, with
justification for each of those pieces. Like most of VM, I think
it's very much a "Linus" thing rather than an "Alan" thing,
so I won't be pressing for inclusion in -ac.
I'm mostly happy with it, but there's one issue outstanding: we need
to make sure we scan the child _after_ the parent if swapoff catches a
fork in dup_mmap(). I'll come back to that when I'm fresher: I've kept
your "repeat until empty", so I believe it's safe unless the count has
reached SWAP_MAP_MAX.
Please would someone explain to me how the SWAP_MAP_MAX case
can arise? I confess I don't get it. I see how a shared file
page only needs 256 processes mapping in 256 places to overflow
an unsigned short; but I don't see how we get there with swap pages.
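Spelling out the arithmetic I mean by overflow (a throwaway userspace
snippet, nothing kernel-specific beyond the unsigned short width; in the
kernel the count is pinned at SWAP_MAP_MAX rather than being allowed to
wrap):

#include <stdio.h>
#include <limits.h>

int main(void)
{
        unsigned short count = 0;               /* same width as a swap_map entry */
        unsigned long mappings = 256UL * 256UL; /* 256 processes x 256 places */
        unsigned long i;

        for (i = 0; i < mappings; i++)
                count++;                        /* wraps silently past USHRT_MAX */

        printf("%lu mappings, USHRT_MAX %u, count ends at %u\n",
               mappings, (unsigned) USHRT_MAX, count);
        return 0;
}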
The SWAP_MAP_MAX case does govern the approach quite strictly, and I
found the current algorithm much better suited to its constraints than
I'd originally supposed: no radical departure, sorry, and even the BKL
is still needed (probably quite easy to eliminate, but more a 2.5 thing).
The most serious bugfix, I believe, is to the "dirty" handling.
try_to_unuse() used to remove the page from the page cache after the
loop filling in ptes, but at some point that was moved to before the
loop. I prefer it before the loop, because that keeps try_to_swap_out()
from duping the entry while we're unduping it; but try_to_swap_out()
was discarding a non-swapcache page if its pte was not dirty. So if a
swapcache page had already been faulted in read-only, and swap_out()
reached it after it was freed from the swap cache but before unuse_pte()
got there, your data would be wiped - or have I misunderstood? (I should
say that the races I found were all long-standing, no problems from
Jeremy's 2.4.8 version.)
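In code terms, the check the patch ends up with in try_to_swap_out()
is roughly this (condensed by hand from the vmscan.c hunk at the end;
treat the real hunk as authoritative):

        if (page->mapping) {
                /* page cache or swap cache: pass pte dirt on to the page */
                if (pte_dirty(pte))
                        set_page_dirty(page);
                goto drop_pte;          /* backing store keeps the data */
        }
        /*
         * No mapping: only drop if neither the pte nor the page is dirty.
         * Previously a clean pte alone was enough, which could throw away
         * a page that unuse_pte() was about to put back into somebody's ptes.
         */
        if (!pte_dirty(pte) && !PageDirty(page))
                goto drop_pte;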
Faster? Please decide for yourself.
Review and testing very welcome!
Thanks,
Hugh
--- linux-2.4.9/include/linux/pagemap.h Wed Aug 15 22:21:21 2001
+++ linux-swapoff/include/linux/pagemap.h Thu Aug 16 22:06:26 2001
@@ -81,11 +81,6 @@
#define find_lock_page(mapping, index) \
__find_lock_page(mapping, index, page_hash(mapping, index))
-extern struct page * __find_get_swapcache_page (struct address_space * mapping,
- unsigned long index, struct page **hash);
-#define find_get_swapcache_page(mapping, index) \
- __find_get_swapcache_page(mapping, index, page_hash(mapping, index))
-
extern void __add_page_to_hash_queue(struct page * page, struct page **p);
extern void add_to_page_cache(struct page * page, struct address_space *mapping, unsigned long index);
--- linux-2.4.9/kernel/fork.c Wed Jul 18 02:23:28 2001
+++ linux-swapoff/kernel/fork.c Thu Aug 16 22:06:26 2001
@@ -8,7 +8,7 @@
* 'fork.c' contains the help-routines for the 'fork' system call
* (see also entry.S and others).
* Fork is rather simple, once you get the hang of it, but the memory
- * management can be a bitch. See 'mm/memory.c': 'copy_page_tables()'
+ * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
*/
#include <linux/config.h>
@@ -134,9 +134,22 @@
mm->mmap_avl = NULL;
mm->mmap_cache = NULL;
mm->map_count = 0;
+ mm->rss = 0;
mm->cpu_vm_mask = 0;
mm->swap_address = 0;
pprev = &mm->mmap;
+
+ /*
+ * Add it to the mmlist after the parent.
+ * Doing it this way means that we can order the list,
+ * and fork() won't mess up the ordering significantly.
+ * Add it first so that swapoff can see any swap entries.
+ */
+ spin_lock(&mmlist_lock);
+ list_add(&mm->mmlist, &current->mm->mmlist);
+ mmlist_nr++;
+ spin_unlock(&mmlist_lock);
+
for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
struct file *file;
@@ -149,7 +162,6 @@
*tmp = *mpnt;
tmp->vm_flags &= ~VM_LOCKED;
tmp->vm_mm = mm;
- mm->map_count++;
tmp->vm_next = NULL;
file = tmp->vm_file;
if (file) {
@@ -168,17 +180,19 @@
spin_unlock(&inode->i_mapping->i_shared_lock);
}
- /* Copy the pages, but defer checking for errors */
- retval = copy_page_range(mm, current->mm, tmp);
- if (!retval && tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
/*
- * Link in the new vma even if an error occurred,
- * so that exit_mmap() can clean up the mess.
+ * Link in the new vma and copy the page table entries:
+ * link in first so that swapoff can see swap entries.
*/
+ spin_lock(&mm->page_table_lock);
*pprev = tmp;
pprev = &tmp->vm_next;
+ mm->map_count++;
+ retval = copy_page_range(mm, current->mm, tmp);
+ spin_unlock(&mm->page_table_lock);
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
if (retval)
goto fail_nomem;
@@ -319,18 +333,6 @@
down_write(&oldmm->mmap_sem);
retval = dup_mmap(mm);
up_write(&oldmm->mmap_sem);
-
- /*
- * Add it to the mmlist after the parent.
- *
- * Doing it this way means that we can order
- * the list, and fork() won't mess up the
- * ordering significantly.
- */
- spin_lock(&mmlist_lock);
- list_add(&mm->mmlist, &oldmm->mmlist);
- mmlist_nr++;
- spin_unlock(&mmlist_lock);
if (retval)
goto free_pt;
--- linux-2.4.9/mm/filemap.c Thu Aug 16 19:12:07 2001
+++ linux-swapoff/mm/filemap.c Thu Aug 16 22:06:26 2001
@@ -682,34 +682,6 @@
}
/*
- * Find a swapcache page (and get a reference) or return NULL.
- * The SwapCache check is protected by the pagecache lock.
- */
-struct page * __find_get_swapcache_page(struct address_space *mapping,
- unsigned long offset, struct page **hash)
-{
- struct page *page;
-
- /*
- * We need the LRU lock to protect against page_launder().
- */
-
- spin_lock(&pagecache_lock);
- page = __find_page_nolock(mapping, offset, *hash);
- if (page) {
- spin_lock(&pagemap_lru_lock);
- if (PageSwapCache(page))
- page_cache_get(page);
- else
- page = NULL;
- spin_unlock(&pagemap_lru_lock);
- }
- spin_unlock(&pagecache_lock);
-
- return page;
-}
-
-/*
* Same as the above, but lock the page too, verifying that
* it's still valid once we own it.
*/
--- linux-2.4.9/mm/memory.c Tue Aug 14 00:16:41 2001
+++ linux-swapoff/mm/memory.c Thu Aug 16 22:06:26 2001
@@ -148,6 +148,9 @@
*
* 08Jan98 Merged into one routine from several inline routines to reduce
* variable count and make things faster. -jj
+ *
+ * dst->page_table_lock is held on entry and exit,
+ * but may be dropped within pmd_alloc() and pte_alloc().
*/
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma)
@@ -159,8 +162,7 @@
src_pgd = pgd_offset(src, address)-1;
dst_pgd = pgd_offset(dst, address)-1;
-
- spin_lock(&dst->page_table_lock);
+
for (;;) {
pmd_t * src_pmd, * dst_pmd;
@@ -234,6 +236,7 @@
pte = pte_mkclean(pte);
pte = pte_mkold(pte);
get_page(ptepage);
+ dst->rss++;
cont_copy_pte_range: set_pte(dst_pte, pte);
cont_copy_pte_range_noset: address += PAGE_SIZE;
@@ -251,11 +254,8 @@
out_unlock:
spin_unlock(&src->page_table_lock);
out:
- spin_unlock(&dst->page_table_lock);
return 0;
-
nomem:
- spin_unlock(&dst->page_table_lock);
return -ENOMEM;
}
@@ -1080,15 +1080,13 @@
/* Don't block on I/O for read-ahead */
if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster
* (1 << page_cluster)) {
- while (i++ < num)
- swap_free(SWP_ENTRY(SWP_TYPE(entry), offset++));
break;
}
/* Ok, do the async read-ahead now */
new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
- if (new_page != NULL)
- page_cache_release(new_page);
- swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
+ if (new_page == NULL)
+ break;
+ page_cache_release(new_page);
}
return;
}
--- linux-2.4.9/mm/swap_state.c Wed Jul 18 23:18:15 2001
+++ linux-swapoff/mm/swap_state.c Thu Aug 16 22:06:26 2001
@@ -29,7 +29,7 @@
if (swap_count(page) > 1)
goto in_use;
- /* We could remove it here, but page_launder will do it anyway */
+ delete_from_swap_cache_nolock(page);
UnlockPage(page);
return 0;
@@ -79,40 +79,35 @@
BUG();
if (page->mapping)
BUG();
- flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
+
+ /* clear PG_dirty so a subsequent set_page_dirty takes effect */
+ flags = page->flags & ~((1 << PG_error) | (1 << PG_dirty) | (1 << PG_arch_1));
page->flags = flags | (1 << PG_uptodate);
page->age = PAGE_AGE_START;
add_to_page_cache_locked(page, &swapper_space, entry.val);
}
-static inline void remove_from_swap_cache(struct page *page)
-{
- struct address_space *mapping = page->mapping;
-
- if (mapping != &swapper_space)
- BUG();
- if (!PageSwapCache(page) || !PageLocked(page))
- PAGE_BUG(page);
-
- PageClearSwapCache(page);
- ClearPageDirty(page);
- __remove_inode_page(page);
-}
-
/*
* This must be called only on pages that have
* been verified to be in the swap cache.
*/
void __delete_from_swap_cache(struct page *page)
{
+ struct address_space *mapping = page->mapping;
swp_entry_t entry;
- entry.val = page->index;
-
#ifdef SWAP_CACHE_INFO
swap_cache_del_total++;
#endif
- remove_from_swap_cache(page);
+ if (mapping != &swapper_space)
+ BUG();
+ if (!PageSwapCache(page) || !PageLocked(page))
+ BUG();
+
+ entry.val = page->index;
+ PageClearSwapCache(page);
+ ClearPageDirty(page);
+ __remove_inode_page(page);
swap_free(entry);
}
@@ -129,7 +124,6 @@
lru_cache_del(page);
spin_lock(&pagecache_lock);
- ClearPageDirty(page);
__delete_from_swap_cache(page);
spin_unlock(&pagecache_lock);
page_cache_release(page);
@@ -169,14 +163,12 @@
page_cache_release(page);
}
-
/*
* Lookup a swap entry in the swap cache. A found page will be returned
* unlocked and with its refcount incremented - we rely on the kernel
* lock getting page table operations atomic even if we drop the page
* lock before returning.
*/
-
struct page * lookup_swap_cache(swp_entry_t entry)
{
struct page *found;
@@ -184,22 +176,20 @@
#ifdef SWAP_CACHE_INFO
swap_cache_find_total++;
#endif
- while (1) {
- /*
- * Right now the pagecache is 32-bit only. But it's a 32 bit index. =)
- */
- found = find_get_swapcache_page(&swapper_space, entry.val);
- if (!found)
- return 0;
- if (!PageSwapCache(found))
- BUG();
- if (found->mapping != &swapper_space)
- BUG();
+ found = find_get_page(&swapper_space, entry.val);
+ if (!found)
+ return 0;
+ /*
+ * If SMP, it is unsafe to assert PageSwapCache and mapping
+ * here: nothing prevents swapoff from deleting this page from
+ * the swap cache at this moment. find_lock_page would prevent
+ * that, but no need to change: we _have_ got the right page
+ * (but would not recognize it as an exclusive swap page later).
+ */
#ifdef SWAP_CACHE_INFO
- swap_cache_find_success++;
+ swap_cache_find_success++;
#endif
- return found;
- }
+ return found;
}
/*
@@ -210,33 +200,41 @@
* A failure return means that either the page allocation failed or that
* the swap entry is no longer in use.
*/
-
struct page * read_swap_cache_async(swp_entry_t entry)
{
- struct page *found_page = 0, *new_page;
+ struct page *found_page, *new_page;
+ struct page **hash;
/*
- * Make sure the swap entry is still in use.
- */
- if (!swap_duplicate(entry)) /* Account for the swap cache */
- goto out;
- /*
- * Look for the page in the swap cache.
+ * Look for the page in the swap cache. Since we normally call
+ * this only after lookup_swap_cache() failed, re-calling that
+ * would confuse the statistics: use __find_get_page() directly.
*/
- found_page = lookup_swap_cache(entry);
+ hash = page_hash(&swapper_space, entry.val);
+ found_page = __find_get_page(&swapper_space, entry.val, hash);
if (found_page)
- goto out_free_swap;
+ goto out;
new_page = alloc_page(GFP_HIGHUSER);
if (!new_page)
- goto out_free_swap; /* Out of memory */
+ goto out; /* Out of memory */
/*
* Check the swap cache again, in case we stalled above.
+ * The BKL is guarding against races between this check
+ * and where the new page is added to the swap cache below.
*/
- found_page = lookup_swap_cache(entry);
+ found_page = __find_get_page(&swapper_space, entry.val, hash);
if (found_page)
goto out_free_page;
+
+ /*
+ * Make sure the swap entry is still in use. It could have gone
+ * while caller waited for BKL, or while allocating a page.
+ */
+ if (!swap_duplicate(entry)) /* Account for the swap cache */
+ goto out_free_page;
+
/*
* Add it to the swap cache and read its contents.
*/
@@ -248,8 +246,6 @@
out_free_page:
page_cache_release(new_page);
-out_free_swap:
- swap_free(entry);
out:
return found_page;
}
--- linux-2.4.9/mm/swapfile.c Sat Aug 11 02:02:42 2001
+++ linux-swapoff/mm/swapfile.c Thu Aug 16 22:06:26 2001
@@ -229,33 +229,23 @@
* share this swap entry, so be cautious and let do_wp_page work out
* what to do if a write is requested later.
*/
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
static inline void unuse_pte(struct vm_area_struct * vma, unsigned long address,
pte_t *dir, swp_entry_t entry, struct page* page)
{
pte_t pte = *dir;
- if (pte_none(pte))
+ if (pte_none(pte) || pte_present(pte))
return;
- if (pte_present(pte)) {
- /* If this entry is swap-cached, then page must already
- hold the right address for any copies in physical
- memory */
- if (pte_page(pte) != page)
- return;
- /* We will be removing the swap cache in a moment, so... */
- ptep_mkdirty(dir);
- return;
- }
if (pte_to_swp_entry(pte).val != entry.val)
return;
- set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
- swap_free(entry);
get_page(page);
+ set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
+ swap_free(entry);
++vma->vm_mm->rss;
}
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
unsigned long address, unsigned long size, unsigned long offset,
swp_entry_t entry, struct page* page)
@@ -283,7 +273,7 @@
} while (address && (address < end));
}
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
static inline void unuse_pgd(struct vm_area_struct * vma, pgd_t *dir,
unsigned long address, unsigned long size,
swp_entry_t entry, struct page* page)
@@ -314,7 +304,7 @@
} while (address && (address < end));
}
-/* tasklist_lock and vma->vm_mm->page_table_lock are held */
+/* BKL, mmlist_lock and vma->vm_mm->page_table_lock are held */
static void unuse_vma(struct vm_area_struct * vma, pgd_t *pgdir,
swp_entry_t entry, struct page* page)
{
@@ -337,8 +327,6 @@
/*
* Go through process' page directory.
*/
- if (!mm)
- return;
spin_lock(&mm->page_table_lock);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
pgd_t * pgd = pgd_offset(mm, vma->vm_start);
@@ -348,54 +336,34 @@
return;
}
-/*
- * this is called when we find a page in the swap list
- * all the locks have been dropped at this point which
- * isn't a problem because we rescan the swap map
- * and we _don't_ clear the refrence count if for
- * some reason it isn't 0
- */
-
-static inline int free_found_swap_entry(unsigned int type, int i)
+static int find_next_to_unuse(struct swap_info_struct *si, int prev)
{
- struct task_struct *p;
- struct page *page;
- swp_entry_t entry;
+ int i = prev;
+ int count;
- entry = SWP_ENTRY(type, i);
-
- /*
- * Get a page for the entry, using the existing swap
- * cache page if there is one. Otherwise, get a clean
- * page and read the swap into it.
- */
- page = read_swap_cache_async(entry);
- if (!page) {
- swap_free(entry);
- return -ENOMEM;
+ swap_device_lock(si);
+ for (;;) {
+ if (++i >= si->max) {
+ i = 0;
+ break;
+ }
+ count = si->swap_map[i];
+ if (count && count != SWAP_MAP_BAD) {
+ /*
+ * Prevent swaphandle from being completely
+ * unused by swap_free while we are trying
+ * to read in the page - this prevents warning
+ * messages from rw_swap_page_base.
+ */
+ if (count != SWAP_MAP_MAX)
+ si->swap_map[i] = count + 1;
+ break;
+ }
}
- lock_page(page);
- if (PageSwapCache(page))
- delete_from_swap_cache_nolock(page);
- UnlockPage(page);
- read_lock(&tasklist_lock);
- for_each_task(p)
- unuse_process(p->mm, entry, page);
- read_unlock(&tasklist_lock);
- shmem_unuse(entry, page);
- /*
- * Now get rid of the extra reference to the temporary
- * page we've been using.
- */
- page_cache_release(page);
- /*
- * Check for and clear any overflowed swap map counts.
- */
- swap_free(entry);
- return 0;
+ swap_device_unlock(si);
+ return i;
}
-
/*
* We completely avoid races by reading each swap page in advance,
* and then search for the process using it. All the necessary
@@ -404,80 +372,161 @@
static int try_to_unuse(unsigned int type)
{
struct swap_info_struct * si = &swap_info[type];
- int ret, foundpage;
+ struct mm_struct *last_mm;
+ struct page *page;
+ struct page *nextpage = NULL; /* keep compiler quiet */
+ unsigned short *swap_map;
+ swp_entry_t entry;
+ int i, next = 0;
+ int retval = 0;
- do {
- int i;
+ /*
+ * When searching mms for an entry, a plausible strategy is
+ * to start at the last mm we freed the preceding entry from
+ * (though actually we don't observe whether we or another
+ * freed the entry). Initialize this last_mm with a hold.
+ */
+ last_mm = &init_mm;
+ atomic_inc(&init_mm.mm_users);
+
+ for (;;) {
+ if (!next) {
+ /*
+ * Start a fresh sweep of the swap_map. We're
+ * done when no entry found in a single sweep.
+ */
+ next = find_next_to_unuse(si, 0);
+ if (!next)
+ break;
+ entry = SWP_ENTRY(type, next);
+ nextpage = read_swap_cache_async(entry);
+ if (!nextpage) {
+ swap_free(entry);
+ retval = -ENOMEM;
+ break;
+ }
+ }
/*
- * The algorithm is inefficient but seldomly used
- *
- * Find a swap page in use and read it in.
+ * Set page to nextpage, with readahead
+ * to a new nextpage in the background.
*/
- foundpage = 0;
- swap_device_lock(si);
- for (i = 1; i < si->max ; i++) {
- int count = si->swap_map[i];
- if (!count || count == SWAP_MAP_BAD)
- continue;
+ i = next;
+ page = nextpage;
+ next = find_next_to_unuse(si, next);
+ if (next) {
+ entry = SWP_ENTRY(type, next);
+ nextpage = read_swap_cache_async(entry);
+ if (!nextpage) {
+ swap_free(entry);
+ swap_free(SWP_ENTRY(type, i));
+ page_cache_release(page);
+ retval = -ENOMEM;
+ break;
+ }
+ }
- /*
- * Prevent swaphandle from being completely
- * unused by swap_free while we are trying
- * to read in the page - this prevents warning
- * messages from rw_swap_page_base.
- */
- foundpage = 1;
- if (count != SWAP_MAP_MAX)
- si->swap_map[i] = count + 1;
+ /*
+ * Don't hold on to last_mm if it looks like exiting.
+ * Can mmput ever block? if so, then we cannot risk
+ * it between deleting the page from the swap cache,
+ * and completing the search through mms (and cannot
+ * use it to avoid the long hold on mmlist_lock there).
+ */
+ if (atomic_read(&last_mm->mm_users) == 1) {
+ mmput(last_mm);
+ last_mm = &init_mm;
+ atomic_inc(&init_mm.mm_users);
+ }
- swap_device_unlock(si);
- ret = free_found_swap_entry(type,i);
- if (ret)
- return ret;
+ /*
+ * While next swap entry is read into the swap cache
+ * asynchronously (if it was not already present), wait
+ * for and lock page. Remove it from the swap cache so
+ * try_to_swap_out won't bump swap count. Mark it dirty
+ * so try_to_swap_out will preserve it without us having
+ * to mark any present ptes as dirty: so we can skip
+ * searching processes once swap count has all gone.
+ */
+ lock_page(page);
+ if (PageSwapCache(page))
+ delete_from_swap_cache_nolock(page);
+ SetPageDirty(page);
+ UnlockPage(page);
+ flush_page_to_ram(page);
- /*
- * we pick up the swap_list_lock() to guard the nr_swap_pages,
- * si->swap_map[] should only be changed if it is SWAP_MAP_MAX
- * otherwise ugly stuff can happen with other people who are in
- * the middle of a swap operation to this device. This kind of
- * operation can sometimes be detected with the undead swap
- * check. Don't worry about these 'undead' entries for now
- * they will be caught the next time though the top loop.
- * Do worry, about the weak locking that allows this to happen
- * because if it happens to a page that is SWAP_MAP_MAX
- * then bad stuff can happen.
- */
+ /*
+ * Remove all references to entry (without blocking).
+ */
+ entry = SWP_ENTRY(type, i);
+ swap_map = &si->swap_map[i];
+ swap_free(entry);
+ if (*swap_map) {
+ if (last_mm == &init_mm)
+ shmem_unuse(entry, page);
+ else
+ unuse_process(last_mm, entry, page);
+ }
+ if (*swap_map) {
+ struct mm_struct *mm = last_mm;
+ struct list_head *p = &mm->mmlist;
+
+ spin_lock(&mmlist_lock);
+ while (*swap_map && (p = p->next) != &last_mm->mmlist) {
+ mm = list_entry(p, struct mm_struct, mmlist);
+ if (mm == &init_mm)
+ shmem_unuse(entry, page);
+ else
+ unuse_process(mm, entry, page);
+ }
+ atomic_inc(&mm->mm_users);
+ spin_unlock(&mmlist_lock);
+ mmput(last_mm);
+ last_mm = mm;
+ }
+ page_cache_release(page);
+
+ /*
+ * There could still be an outstanding reference:
+ * if we checked the child before the parent while
+ * fork was in dup_mmap; or if mmput has taken the mm
+ * off mmlist, but exit_mmap has not yet completed.
+ * Until that's fixed, loop back and recheck at the end:
+ * we are done once none found in a single locked sweep.
+ * That handles the normal case; but if SWAP_MAP_MAX we
+ * only _hope_ we got them all in one, and reset anyway.
+ */
+ if (*swap_map == SWAP_MAP_MAX) {
swap_list_lock();
swap_device_lock(si);
- if (si->swap_map[i] > 0) {
- /* normally this would just kill the swap page if
- * it still existed, it appears though that the locks
- * are a little fuzzy
- */
- if (si->swap_map[i] != SWAP_MAP_MAX) {
- printk("VM: Undead swap entry %08lx\n",
- SWP_ENTRY(type, i).val);
- } else {
- nr_swap_pages++;
- si->swap_map[i] = 0;
- }
- }
+ *swap_map = 0;
+ nr_swap_pages++;
swap_device_unlock(si);
swap_list_unlock();
-
+ }
+ if (*swap_map) {
/*
- * This lock stuff is ulgy!
- * Make sure that we aren't completely killing
- * interactive performance.
+ * Useful message while testing, but we know
+ * this can happen, so should be removed soon.
*/
- if (current->need_resched)
- schedule();
- swap_device_lock(si);
+ printk("VM: Undead swap entry %08lx\n", entry.val);
}
- swap_device_unlock(si);
- } while (foundpage);
- return 0;
+
+ /*
+ * Make sure that we aren't completely killing
+ * interactive performance. Interruptible check on
+ * signal_pending() would be nice, but changes the spec?
+ */
+ if (current->need_resched)
+ schedule();
+ else {
+ unlock_kernel();
+ lock_kernel();
+ }
+ }
+
+ mmput(last_mm);
+ return retval;
}
asmlinkage long sys_swapoff(const char * specialfile)
@@ -557,6 +606,7 @@
nd.mnt = p->swap_vfsmnt;
p->swap_vfsmnt = NULL;
p->swap_device = 0;
+ p->max = 0;
vfree(p->swap_map);
p->swap_map = NULL;
p->flags = 0;
@@ -637,7 +687,7 @@
union swap_header *swap_header = 0;
int swap_header_version;
int nr_good_pages = 0;
- unsigned long maxpages;
+ unsigned long maxpages = 1;
int swapfilesize;
struct block_device *bdev = NULL;
@@ -752,17 +802,17 @@
if (!p->lowest_bit)
p->lowest_bit = i;
p->highest_bit = i;
- p->max = i+1;
+ maxpages = i+1;
j++;
}
}
nr_good_pages = j;
- p->swap_map = vmalloc(p->max * sizeof(short));
+ p->swap_map = vmalloc(maxpages * sizeof(short));
if (!p->swap_map) {
error = -ENOMEM;
goto bad_swap;
}
- for (i = 1 ; i < p->max ; i++) {
+ for (i = 1 ; i < maxpages ; i++) {
if (test_bit(i,(char *) swap_header))
p->swap_map[i] = 0;
else
@@ -783,24 +833,22 @@
p->lowest_bit = 1;
p->highest_bit = swap_header->info.last_page - 1;
- p->max = swap_header->info.last_page;
-
- maxpages = SWP_OFFSET(SWP_ENTRY(0,~0UL));
- if (p->max >= maxpages)
- p->max = maxpages-1;
+ maxpages = SWP_OFFSET(SWP_ENTRY(0,~0UL)) - 1;
+ if (maxpages > swap_header->info.last_page)
+ maxpages = swap_header->info.last_page;
error = -EINVAL;
if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
goto bad_swap;
/* OK, set up the swap map and apply the bad block list */
- if (!(p->swap_map = vmalloc (p->max * sizeof(short)))) {
+ if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {
error = -ENOMEM;
goto bad_swap;
}
error = 0;
- memset(p->swap_map, 0, p->max * sizeof(short));
+ memset(p->swap_map, 0, maxpages * sizeof(short));
for (i=0; i<swap_header->info.nr_badpages; i++) {
int page = swap_header->info.badpages[i];
if (page <= 0 || page >= swap_header->info.last_page)
@@ -815,7 +863,7 @@
goto bad_swap;
}
- if (swapfilesize && p->max > swapfilesize) {
+ if (swapfilesize && maxpages > swapfilesize) {
printk(KERN_WARNING
"Swap area shorter than signature indicates\n");
error = -EINVAL;
@@ -827,6 +875,7 @@
goto bad_swap;
}
p->swap_map[0] = SWAP_MAP_BAD;
+ p->max = maxpages;
p->flags = SWP_WRITEOK;
p->pages = nr_good_pages;
swap_list_lock();
@@ -856,6 +905,7 @@
if (bdev)
blkdev_put(bdev, BDEV_SWAP);
bad_swap_2:
+ p->max = 0;
if (p->swap_map)
vfree(p->swap_map);
nd.mnt = p->swap_vfsmnt;
@@ -947,10 +997,12 @@
printk("Bad swap file entry %08lx\n", entry.val);
goto out;
bad_offset:
- printk("Bad swap offset entry %08lx\n", entry.val);
+ /* Don't report: can happen in read_swap_cache_async after swapoff */
+ /* printk("Bad swap offset entry %08lx\n", entry.val); */
goto out;
bad_unused:
- printk("Unused swap offset entry in swap_dup %08lx\n", entry.val);
+ /* Don't report: can happen in read_swap_cache_async after sleeping */
+ /* printk("Unused swap offset entry in swap_dup %08lx\n", entry.val); */
goto out;
}
@@ -1037,8 +1089,8 @@
}
/*
- * Kernel_lock protects against swap device deletion. Grab an extra
- * reference on the swaphandle so that it dos not become unused.
+ * Kernel_lock protects against swap device deletion. Don't grab an extra
+ * reference on the swaphandle, it doesn't matter if it becomes unused.
*/
int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
{
@@ -1059,7 +1111,6 @@
break;
if (swapdev->swap_map[toff] == SWAP_MAP_BAD)
break;
- swapdev->swap_map[toff]++;
toff++;
ret++;
} while (--i);
--- linux-2.4.9/mm/vmscan.c Wed Aug 15 10:37:07 2001
+++ linux-swapoff/mm/vmscan.c Thu Aug 16 22:06:26 2001
@@ -111,6 +111,7 @@
* is needed on CPUs which update the accessed and dirty
* bits in hardware.
*/
+ flush_cache_page(vma, address);
pte = ptep_get_and_clear(page_table);
flush_tlb_page(vma, address);
@@ -148,20 +149,13 @@
* Basically, this just makes it possible for us to do
* some real work in the future in "refill_inactive()".
*/
- flush_cache_page(vma, address);
- if (!pte_dirty(pte))
- goto drop_pte;
-
- /*
- * Ok, it's really dirty. That means that
- * we should either create a new swap cache
- * entry for it, or we should write it back
- * to its own backing store.
- */
if (page->mapping) {
- set_page_dirty(page);
+ if (pte_dirty(pte))
+ set_page_dirty(page);
goto drop_pte;
}
+ if (!pte_dirty(pte) && !PageDirty(page))
+ goto drop_pte;
/*
* This is a dirty, swappable page. First of all,
@@ -539,8 +533,12 @@
* last copy..
*/
if (PageDirty(page)) {
- int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
+ int (*writepage)(struct page *);
+ /* Can a page get here without page->mapping? */
+ if (!page->mapping)
+ goto page_active;
+ writepage = page->mapping->a_ops->writepage;
if (!writepage)
goto page_active;
@@ -779,7 +777,7 @@
{
pg_data_t *pgdat;
unsigned int global_target = freepages.high + inactive_target;
- unsigned int global_incative = 0;
+ unsigned int global_inactive = 0;
pgdat = pgdat_list;
do {
@@ -799,13 +797,13 @@
if (inactive < zone->pages_high)
return 1;
- global_incative += inactive;
+ global_inactive += inactive;
}
pgdat = pgdat->node_next;
} while (pgdat);
/* Global shortage? */
- return global_incative < global_target;
+ return global_inactive < global_target;
}
/*
* Re: [PATCH] more swapoff fixes
From: Jeremy Linton @ 2001-08-17 18:02 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-mm
> Jeremy and list,
>
> Below is my patch of further bugfixes and speedups to swapoff,
> based on 2.4.9. For now it's an all-in-one patch: one of these
Stock 2.4.9 is broken in my tests. Let me see if I can track the
problem down before I give your stuff a spin. It looks like it might be
USB-related, but...
jlinton