linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Ben LaHaise <bcrl@redhat.com>
Cc: riel@nl.linux.org, Linus Torvalds <torvalds@transmeta.com>,
	linux-mm@kvack.org
Subject: Re: [patch] take 2 Re: PG_swap_entry bug in recent kernels
Date: Fri, 7 Apr 2000 12:45:13 +0200 (CEST)	[thread overview]
Message-ID: <Pine.LNX.4.21.0004071205300.737-100000@alpha.random> (raw)
In-Reply-To: <Pine.LNX.4.21.0004061802080.6583-100000@devserv.devel.redhat.com>

On Thu, 6 Apr 2000, Ben LaHaise wrote:

>First off, it doesn't verify that all of the bits which should be clear in
>a swap entry actually are -- eg bit 0 (present) can and is getting
>set.  It would have to rebuild and compare the swap entry produced by
>SWP_ENTRY(type,offset) to be 100% sure that it is a valid swap entry.

That shouldn't be a problem. What we cares is to return a _valid_ swap
entry from acquire_swap_entry, if that wasn't a cached-swap-entry that
were a minor problem (not a stability issue at least).

However I agree the current design is not very clean and not very strict
and it's too silent. So now I make sure that the swap-entry bitflag is set
only on good page->index.

>Heheh, okay I've moved it there.  Just in case I've also added a
>BUG() check in free_pages_okay to make sure there aren't any other places
>that have been missed.

That's certainly a good idea, I did that too indeed :).

I'm working on that and right now I am stuck fixing up SMP/non-SMP races
and bugs in the VM that I found while checking the swap entry stuff.
However the swap entry stuff in my current tree should be fixed now.

This is a snapshot of my tree so that we can stay in sync. I guess the
below stuff it's ready for inclusion but if you don't agree with something
let me know of course, thanks. It seems to work stable here. I included
also some early fix (the non intrusive ones). One for a SMP race fix for
swapoff/get_swap_page, an SMP race in the vmlist access lock during
swapoff, do_wp_page that have to know the swap cache can have reference
count 3 and not being shared, and the highmem page replacement should
cache also the swap-entry bitflag (I just do a copy of all the flags to be
faster). It doesn't need to copy the mapping pointer since it's always
null and the new_page have it null too.

It's against 2.3.99-pre4-4.

diff -urN ref/mm/filemap.c swap-entry-1/mm/filemap.c
--- ref/mm/filemap.c	Thu Apr  6 01:00:51 2000
+++ swap-entry-1/mm/filemap.c	Fri Apr  7 04:44:27 2000
@@ -300,6 +300,8 @@
 		if (PageSwapCache(page)) {
 			spin_unlock(&pagecache_lock);
 			__delete_from_swap_cache(page);
+			/* the page is local to us now */
+			page->flags &= ~(1UL << PG_swap_entry);
 			goto made_inode_progress;
 		}	
 
diff -urN ref/mm/highmem.c swap-entry-1/mm/highmem.c
--- ref/mm/highmem.c	Mon Apr  3 03:21:59 2000
+++ swap-entry-1/mm/highmem.c	Fri Apr  7 03:48:46 2000
@@ -75,7 +75,7 @@
 
 	/* Preserve the caching of the swap_entry. */
 	highpage->index = page->index;
-	highpage->mapping = page->mapping;
+	highpage->flags = page->flags;
 
 	/*
 	 * We can just forget the old page since 
diff -urN ref/mm/memory.c swap-entry-1/mm/memory.c
--- ref/mm/memory.c	Fri Apr  7 12:17:24 2000
+++ swap-entry-1/mm/memory.c	Fri Apr  7 12:26:53 2000
@@ -838,6 +838,7 @@
 	 */
 	switch (page_count(old_page)) {
 	case 2:
+	case 3:
 		/*
 		 * Lock the page so that no one can look it up from
 		 * the swap cache, grab a reference and start using it.
@@ -880,7 +881,19 @@
 		new_page = old_page;
 	}
 	spin_unlock(&tsk->mm->page_table_lock);
-	__free_page(new_page);
+	/*
+	 * We're releasing a page, it can be an anonymous
+	 * page as well. Since we don't hold any lock (except
+	 * the mmap_sem semaphore) the other user of the anonymous
+	 * page may have released it from under us and now we
+	 * could be the only owner of the page, thus put_page_testzero() can
+	 * return 1, and we have to clear the swap-entry
+	 * bitflag in such case.
+	 */
+	if (put_page_testzero(new_page)) {
+		new_page->flags &= ~(1UL << PG_swap_entry);
+		__free_pages_ok(new_page, 0);
+	}
 	return 1;
 
 bad_wp_page:
diff -urN ref/mm/page_alloc.c swap-entry-1/mm/page_alloc.c
--- ref/mm/page_alloc.c	Thu Apr  6 01:00:52 2000
+++ swap-entry-1/mm/page_alloc.c	Thu Apr  6 05:05:59 2000
@@ -110,6 +110,8 @@
 		BUG();
 	if (PageDecrAfter(page))
 		BUG();
+	if (PageSwapEntry(page))
+		BUG();
 
 	zone = page->zone;
 
diff -urN ref/mm/swap_state.c swap-entry-1/mm/swap_state.c
--- ref/mm/swap_state.c	Thu Apr  6 01:00:52 2000
+++ swap-entry-1/mm/swap_state.c	Fri Apr  7 12:29:00 2000
@@ -126,9 +126,14 @@
 		UnlockPage(page);
 	}
 
-	ClearPageSwapEntry(page);
-
-	__free_page(page);
+	/*
+	 * Only the last unmap have to lose the swap entry
+	 * information that we have cached into page->index.
+	 */
+	if (put_page_testzero(page)) {
+		page->flags &= ~(1UL << PG_swap_entry);
+		__free_pages_ok(page, 0);
+	}
 }
 
 
@@ -151,7 +156,7 @@
 		 * Right now the pagecache is 32-bit only.  But it's a 32 bit index. =)
 		 */
 repeat:
-		found = find_lock_page(&swapper_space, entry.val);
+		found = find_get_page(&swapper_space, entry.val);
 		if (!found)
 			return 0;
 		/*
@@ -163,7 +168,6 @@
 		 * is enough to check whether the page is still in the scache.
 		 */
 		if (!PageSwapCache(found)) {
-			UnlockPage(found);
 			__free_page(found);
 			goto repeat;
 		}
@@ -172,13 +176,11 @@
 #ifdef SWAP_CACHE_INFO
 		swap_cache_find_success++;
 #endif
-		UnlockPage(found);
 		return found;
 	}
 
 out_bad:
 	printk (KERN_ERR "VM: Found a non-swapper swap page!\n");
-	UnlockPage(found);
 	__free_page(found);
 	return 0;
 }
diff -urN ref/mm/swapfile.c swap-entry-1/mm/swapfile.c
--- ref/mm/swapfile.c	Thu Apr  6 01:00:52 2000
+++ swap-entry-1/mm/swapfile.c	Fri Apr  7 12:35:59 2000
@@ -212,22 +212,22 @@
 
 	/* We have the old entry in the page offset still */
 	if (!page->index)
-		goto new_swap_entry;
+		goto null_swap_entry;
 	entry.val = page->index;
 	type = SWP_TYPE(entry);
 	if (type >= nr_swapfiles)
-		goto new_swap_entry;
+		goto bad_nofile;
+	swap_list_lock();
 	p = type + swap_info;
 	if ((p->flags & SWP_WRITEOK) != SWP_WRITEOK)
-		goto new_swap_entry;
+		goto unlock_list;
 	offset = SWP_OFFSET(entry);
 	if (offset >= p->max)
-		goto new_swap_entry;
+		goto bad_offset;
 	/* Has it been re-used for something else? */
-	swap_list_lock();
 	swap_device_lock(p);
 	if (p->swap_map[offset])
-		goto unlock_new_swap_entry;
+		goto unlock;
 
 	/* We're cool, we can just use the old one */
 	p->swap_map[offset] = 1;
@@ -236,11 +236,24 @@
 	swap_list_unlock();
 	return entry;
 
-unlock_new_swap_entry:
+unlock:
 	swap_device_unlock(p);
+unlock_list:
 	swap_list_unlock();
+clear_swap_entry:
+	ClearPageSwapEntry(page);
 new_swap_entry:
 	return get_swap_page();
+
+null_swap_entry:
+	printk(KERN_WARNING __FUNCTION__ " null swap entry\n");
+	goto clear_swap_entry;
+bad_nofile:
+	printk(KERN_WARNING __FUNCTION__ " nonexistent swap file\n");
+	goto clear_swap_entry;
+bad_offset:
+	printk(KERN_WARNING __FUNCTION__ " bad offset\n");
+	goto unlock_list;
 }
 
 /*
@@ -263,8 +276,11 @@
 		/* If this entry is swap-cached, then page must already
                    hold the right address for any copies in physical
                    memory */
-		if (pte_page(pte) != page)
+		if (pte_page(pte) != page) {
+			if (page->index == entry.val)
+				ClearPageSwapEntry(page);
 			return;
+		}
 		/* We will be removing the swap cache in a moment, so... */
 		set_pte(dir, pte_mkdirty(pte));
 		return;
@@ -358,10 +374,20 @@
 	 */
 	if (!mm)
 		return;
+	/*
+	 * Avoid the vmas to go away from under us
+	 * and also avoids the task to play with
+	 * pagetables while we're running. If the
+	 * vmlist_modify_lock wouldn't acquire the
+	 * mm->page_table_lock spinlock we should
+	 * acquire it by hand.
+	 */
+	vmlist_access_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		pgd_t * pgd = pgd_offset(mm, vma->vm_start);
 		unuse_vma(vma, pgd, entry, page);
 	}
+	vmlist_access_unlock(mm);
 	return;
 }
 
@@ -418,8 +444,10 @@
 		shm_unuse(entry, page);
 		/* Now get rid of the extra reference to the temporary
                    page we've been using. */
-		if (PageSwapCache(page))
+		if (PageSwapCache(page)) {
 			delete_from_swap_cache(page);
+			ClearPageSwapEntry(page);
+		}
 		__free_page(page);
 		/*
 		 * Check for and clear any overflowed swap map counts.
@@ -488,8 +516,8 @@
 		swap_list.next = swap_list.head;
 	}
 	nr_swap_pages -= p->pages;
-	swap_list_unlock();
 	p->flags = SWP_USED;
+	swap_list_unlock();
 	err = try_to_unuse(type);
 	if (err) {
 		/* re-insert swap space back into swap_list */

Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

  reply	other threads:[~2000-04-07 10:45 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-04-03 22:22 Ben LaHaise
2000-04-04 15:06 ` Andrea Arcangeli
2000-04-04 15:46   ` Rik van Riel
2000-04-04 16:50     ` Andrea Arcangeli
2000-04-04 17:06       ` Ben LaHaise
2000-04-04 18:03         ` Andrea Arcangeli
2000-04-06 22:11           ` [patch] take 2 " Ben LaHaise
2000-04-07 10:45             ` Andrea Arcangeli [this message]
2000-04-07 11:29               ` Rik van Riel
2000-04-07 12:00                 ` Andrea Arcangeli
2000-04-07 12:54                   ` Rik van Riel
2000-04-07 13:14                     ` Andrea Arcangeli
2000-04-07 20:12               ` Kanoj Sarcar
2000-04-07 23:26                 ` Andrea Arcangeli
2000-04-08  0:11                   ` Kanoj Sarcar
2000-04-08  0:37                     ` Kanoj Sarcar
2000-04-08 13:20                       ` Andrea Arcangeli
2000-04-08 21:39                         ` Kanoj Sarcar
2000-04-08 23:02                           ` Andrea Arcangeli
2000-04-08 23:18                             ` Kanoj Sarcar
2000-04-08 23:58                               ` Andrea Arcangeli
2000-04-08 13:30                     ` Andrea Arcangeli
2000-04-08 17:39                       ` Andrea Arcangeli
2000-04-07 23:54                 ` Andrea Arcangeli
2000-04-08  0:15                   ` Kanoj Sarcar
2000-04-08 13:14                     ` Andrea Arcangeli
2000-04-08 21:47                       ` Kanoj Sarcar
2000-04-08 23:10                         ` Andrea Arcangeli
2000-04-08 23:21                           ` Kanoj Sarcar
2000-04-08 23:39                             ` Andrea Arcangeli
2000-04-09  0:40                               ` Kanoj Sarcar
2000-04-10  8:55                                 ` andrea
2000-04-11  2:45                                   ` Kanoj Sarcar
2000-04-11 16:22                                     ` Andrea Arcangeli
2000-04-11 17:40                                       ` Rik van Riel
2000-04-11 18:20                                         ` Kanoj Sarcar
2000-04-21 18:23                                         ` Andrea Arcangeli
2000-04-21 21:00                                           ` Rik van Riel
2000-04-22  1:12                                             ` Andrea Arcangeli
2000-04-22  1:51                                               ` Linus Torvalds
2000-04-22 18:29                                                 ` Rik van Riel
2000-04-22 19:58                                                   ` Linus Torvalds
2000-04-11 18:26                                       ` Kanoj Sarcar
2000-04-10 19:10                         ` Stephen C. Tweedie
2000-04-08  0:04                 ` Andrea Arcangeli
     [not found] <yttem7xstk2.fsf@vexeta.dc.fi.udc.es>
2000-04-23  0:52 ` Andrea Arcangeli
     [not found] <yttk8ho26s8.fsf@vexeta.dc.fi.udc.es>
2000-04-23 16:07 ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.21.0004071205300.737-100000@alpha.random \
    --to=andrea@suse.de \
    --cc=bcrl@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=riel@nl.linux.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox