linux-mm.kvack.org archive mirror
* [RFC 0/6] Swapless Page Migration V1: Overview
@ 2006-04-04  6:57 Christoph Lameter
  2006-04-04  6:57 ` [RFC 1/6] Swapless V1: try_to_unmap() - Rename ignrefs to "migration" Christoph Lameter
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

Swapless Page migration

Currently page migration depends on the ability to assign swap entries
to pages. This means that page migration will not work without swap, even
though that swap space is never actually used.

This patchset removes that dependency by introducing a special type of
swap entry that encodes the pfn of the page being migrated. If such a
swap pte is encountered then do_swap_page() will simply wait for the page
to become unlocked again (meaning page migration is complete) and then
refetch the pte. This special type of swap entry is only in use while the
page to be migrated is locked, and therefore we can hopefully get away
with just a few supporting functions.

To some extent this covers the same ground as Lee's and Marcelo's migration
cache. However, I hope that this approach simplifies things without opening
up any holes. Please check.

The patchset is also a prerequisite for later patches that enable
migration of VM_LOCKED vmas and add the ability to exempt vmas from
page migration.

The patchset consists of six patches:

1. try_to_unmap(): Rename ignrefs to "migration"

   We will be using that try_to_unmap flag in the next patch to
   mean that page migration has called try_to_unmap().

2. Add SWP_TYPE_MIGRATION

   Add the SWP_TYPE_MIGRATION and a few necessary handlers for this
   type of entry.

3. try_to_unmap(): Create migration entries if migration calls
   try_to_unmap for pages without PageSwapCache() set.

4. Remove migration ptes

   This uses logic similar to remove_from_swap(). We walk through
   the reverse maps and replace all SWP_TYPE_MIGRATION entries with
   the correct pte. Since we only do that for SWP_TYPE_MIGRATION entries
   we can simplify the function.

5. Rip out old swap migration code

   Remove all the old swap-based code. Note that this also removes the
   fallback to swap if all other attempts to migrate fail, as well as the
   ability to migrate to swap (which was never used).

6. Revise main migration code

   Revise the migration logic to use the new SWP_TYPE_MIGRATION. This means
   that anonymous pages without a mapping may be migrated. Therefore we have
   to deal with page counts differently.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href="mailto:dont@kvack.org">email@kvack.org</a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC 1/6] Swapless V1: try_to_unmap() - Rename ignrefs to "migration"
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
@ 2006-04-04  6:57 ` Christoph Lameter
  2006-04-04  6:57 ` [RFC 2/6] Swapless V1: Add SWP_TYPE_MIGRATION Christoph Lameter
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

"migration" is a better name since we will implement special handling for
page migration later.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/rmap.c	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/mm/rmap.c	2006-04-03 22:33:56.000000000 -0700
@@ -578,7 +578,7 @@ void page_remove_rmap(struct page *page)
  * repeatedly from either try_to_unmap_anon or try_to_unmap_file.
  */
 static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
-				int ignore_refs)
+				int migration)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -602,7 +602,7 @@ static int try_to_unmap_one(struct page 
 	 */
 	if ((vma->vm_flags & VM_LOCKED) ||
 			(ptep_clear_flush_young(vma, address, pte)
-				&& !ignore_refs)) {
+				&& !migration)) {
 		ret = SWAP_FAIL;
 		goto out_unmap;
 	}
@@ -736,7 +736,7 @@ static void try_to_unmap_cluster(unsigne
 	pte_unmap_unlock(pte - 1, ptl);
 }
 
-static int try_to_unmap_anon(struct page *page, int ignore_refs)
+static int try_to_unmap_anon(struct page *page, int migration)
 {
 	struct anon_vma *anon_vma;
 	struct vm_area_struct *vma;
@@ -747,7 +747,7 @@ static int try_to_unmap_anon(struct page
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		ret = try_to_unmap_one(page, vma, ignore_refs);
+		ret = try_to_unmap_one(page, vma, migration);
 		if (ret == SWAP_FAIL || !page_mapped(page))
 			break;
 	}
@@ -764,7 +764,7 @@ static int try_to_unmap_anon(struct page
  *
  * This function is only called from try_to_unmap for object-based pages.
  */
-static int try_to_unmap_file(struct page *page, int ignore_refs)
+static int try_to_unmap_file(struct page *page, int migration)
 {
 	struct address_space *mapping = page->mapping;
 	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -778,7 +778,7 @@ static int try_to_unmap_file(struct page
 
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		ret = try_to_unmap_one(page, vma, ignore_refs);
+		ret = try_to_unmap_one(page, vma, migration);
 		if (ret == SWAP_FAIL || !page_mapped(page))
 			goto out;
 	}
@@ -863,16 +863,16 @@ out:
  * SWAP_AGAIN	- we missed a mapping, try again later
  * SWAP_FAIL	- the page is unswappable
  */
-int try_to_unmap(struct page *page, int ignore_refs)
+int try_to_unmap(struct page *page, int migration)
 {
 	int ret;
 
 	BUG_ON(!PageLocked(page));
 
 	if (PageAnon(page))
-		ret = try_to_unmap_anon(page, ignore_refs);
+		ret = try_to_unmap_anon(page, migration);
 	else
-		ret = try_to_unmap_file(page, ignore_refs);
+		ret = try_to_unmap_file(page, migration);
 
 	if (!page_mapped(page))
 		ret = SWAP_SUCCESS;


* [RFC 2/6] Swapless V1:  Add SWP_TYPE_MIGRATION
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
  2006-04-04  6:57 ` [RFC 1/6] Swapless V1: try_to_unmap() - Rename ignrefs to "migration" Christoph Lameter
@ 2006-04-04  6:57 ` Christoph Lameter
  2006-04-04 11:04   ` KAMEZAWA Hiroyuki
  2006-04-04  6:57 ` [RFC 3/6] Swapless V1: try_to_unmap() - Create migration entries Christoph Lameter
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

Add migration swap type

SWP_TYPE_MIGRATION is a special swap type that encodes the pfn of the
page in the swp_offset field.

Note that the swp_offset field is limited in size: 27 bits on 32-bit and
54 bits on IA64. pfn numbers must fit into a field of that size for
this scheme to work. Could that be a problem?

SWP_TYPE_MIGRATION is only set for a pte while the corresponding page
is locked. It is removed while the page is still locked. Therefore the
processing for this special type of swap page can be simple.

The freeing of this type of entry is simply ignored.

lookup_swap_cache() determines the page from the pfn and only takes a
reference on the page.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/swap_state.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/swap_state.c	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/mm/swap_state.c	2006-04-03 23:26:21.000000000 -0700
@@ -10,6 +10,7 @@
 #include <linux/mm.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
+#include <linux/swapops.h>
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/buffer_head.h>
@@ -299,6 +300,16 @@ struct page * lookup_swap_cache(swp_entr
 {
 	struct page *page;
 
+	/*
+	 * If the swap type is SWP_TYPE_MIGRATION then the
+	 * swap entry contains the pfn of a page.
+	 */
+	if (unlikely(swp_type(entry) == SWP_TYPE_MIGRATION)) {
+		page = pfn_to_page(swp_offset(entry));
+		get_page(page);
+		return page;
+	}
+
 	page = find_get_page(&swapper_space, entry.val);
 
 	if (page)
Index: linux-2.6.17-rc1/mm/swapfile.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/swapfile.c	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/mm/swapfile.c	2006-04-03 23:26:21.000000000 -0700
@@ -395,6 +395,9 @@ void free_swap_and_cache(swp_entry_t ent
 	struct swap_info_struct * p;
 	struct page *page = NULL;
 
+	if (swp_type(entry) == SWP_TYPE_MIGRATION)
+		return;
+
 	p = swap_info_get(entry);
 	if (p) {
 		if (swap_entry_free(p, swp_offset(entry)) == 1) {
@@ -1710,6 +1713,9 @@ int swap_duplicate(swp_entry_t entry)
 	int result = 0;
 
 	type = swp_type(entry);
+	if (type == SWP_TYPE_MIGRATION)
+		return 1;
+
 	if (type >= nr_swapfiles)
 		goto bad_file;
 	p = type + swap_info;
Index: linux-2.6.17-rc1/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc1.orig/include/linux/swap.h	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/include/linux/swap.h	2006-04-03 23:43:03.000000000 -0700
@@ -29,7 +29,10 @@ static inline int current_is_kswapd(void
  * the type/offset into the pte as 5/27 as well.
  */
 #define MAX_SWAPFILES_SHIFT	5
-#define MAX_SWAPFILES		(1 << MAX_SWAPFILES_SHIFT)
+#define MAX_SWAPFILES		((1 << MAX_SWAPFILES_SHIFT)-1)
+
+/* Use last entry for page migration swap entries */
+#define SWP_TYPE_MIGRATION	MAX_SWAPFILES
 
 /*
  * Magic header for a swap area. The first part of the union is
@@ -293,7 +296,6 @@ static inline void disable_swap_token(vo
 #define swap_duplicate(swp)			/*NOTHING*/
 #define swap_free(swp)				/*NOTHING*/
 #define read_swap_cache_async(swp,vma,addr)	NULL
-#define lookup_swap_cache(swp)			NULL
 #define valid_swaphandles(swp, off)		0
 #define can_share_swap_page(p)			0
 #define move_to_swap_cache(p, swp)		1
@@ -302,6 +304,12 @@ static inline void disable_swap_token(vo
 #define delete_from_swap_cache(p)		/*NOTHING*/
 #define swap_token_default_timeout		0
 
+#ifdef CONFIG_MIGRATION
+extern struct page* lookup_swap_cache(swp_entry_t);
+#else
+#define lookup_swap_cache(swp)			NULL
+#endif
+
 static inline int remove_exclusive_swap_page(struct page *p)
 {
 	return 0;
Index: linux-2.6.17-rc1/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/migrate.c	2006-04-03 22:07:40.000000000 -0700
+++ linux-2.6.17-rc1/mm/migrate.c	2006-04-03 23:44:10.000000000 -0700
@@ -32,6 +32,18 @@
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
 
+#ifndef CONFIG_SWAP
+struct page *lookup_swap_cache(swp_entry_t entry)
+{
+	if (unlikely(swp_type(entry) == SWP_TYPE_MIGRATION)) {
+		struct page *page = pfn_to_page(swp_offset(entry));
+		get_page(page);
+		return page;
+	}
+	return NULL;
+}
+#endif
+
 /*
  * Isolate one page from the LRU lists. If successful put it onto
  * the indicated list with elevated page count.


* [RFC 3/6] Swapless V1: try_to_unmap() - Create migration entries
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
  2006-04-04  6:57 ` [RFC 1/6] Swapless V1: try_to_unmap() - Rename ignrefs to "migration" Christoph Lameter
  2006-04-04  6:57 ` [RFC 2/6] Swapless V1: Add SWP_TYPE_MIGRATION Christoph Lameter
@ 2006-04-04  6:57 ` Christoph Lameter
  2006-04-04  6:58 ` [RFC 4/6] Swapless V1: remove migration ptes Christoph Lameter
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

Modify try_to_unmap to produce swap migration entries

If we are trying to unmap a page that has no associated swapcache
entry but we are doing migration, then create a special swap pte
of type SWP_TYPE_MIGRATION pointing to the page's pfn.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/rmap.c	2006-04-03 22:33:56.000000000 -0700
+++ linux-2.6.17-rc1/mm/rmap.c	2006-04-03 22:50:00.000000000 -0700
@@ -620,6 +620,17 @@ static int try_to_unmap_one(struct page 
 
 	if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
+
+		if (!PageSwapCache(page) && migration) {
+			/*
+			 * Store the pfn of the page in a special migration
+			 * pte. do_swap_page() will wait until the page is unlocked
+			 * and then restart the fault handling.
+			 */
+			entry = swp_entry(SWP_TYPE_MIGRATION, page_to_pfn(page));
+			set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
+			goto finish;
+		}
 		/*
 		 * Store the swap location in the pte.
 		 * See handle_pte_fault() ...
@@ -638,6 +649,7 @@ static int try_to_unmap_one(struct page 
 	} else
 		dec_mm_counter(mm, file_rss);
 
+finish:
 	page_remove_rmap(page);
 	page_cache_release(page);
 


* [RFC 4/6] Swapless V1: remove migration ptes
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
                   ` (2 preceding siblings ...)
  2006-04-04  6:57 ` [RFC 3/6] Swapless V1: try_to_unmap() - Create migration entries Christoph Lameter
@ 2006-04-04  6:58 ` Christoph Lameter
  2006-04-04  6:58 ` [RFC 5/6] Swapless V1: Rip out swap migration code Christoph Lameter
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:58 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

Add ability to remove migration ptes.

1. Modify page_check_address to support matching on ptes with
   SWP_TYPE_MIGRATION

2. Add functions to scan the anon vma and replace SWP_TYPE_MIGRATION
   ptes with regular ones.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/rmap.c	2006-04-03 22:50:00.000000000 -0700
+++ linux-2.6.17-rc1/mm/rmap.c	2006-04-03 22:57:08.000000000 -0700
@@ -291,7 +291,7 @@ pte_t *page_check_address(struct page *p
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte;
+	pte_t *ptep, pte;
 	spinlock_t *ptl;
 
 	pgd = pgd_offset(mm, address);
@@ -306,23 +306,84 @@ pte_t *page_check_address(struct page *p
 	if (!pmd_present(*pmd))
 		return NULL;
 
-	pte = pte_offset_map(pmd, address);
+	ptep = pte_offset_map(pmd, address);
+	pte = *ptep;
 	/* Make a quick check before getting the lock */
-	if (!pte_present(*pte)) {
-		pte_unmap(pte);
+	if (pte_none(pte) || pte_file(pte)) {
+		pte_unmap(ptep);
 		return NULL;
 	}
 
 	ptl = pte_lockptr(mm, pmd);
 	spin_lock(ptl);
-	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
-		*ptlp = ptl;
-		return pte;
+	if (pte_present(pte)) {
+		if (page_to_pfn(page) == pte_pfn(pte)) {
+			*ptlp = ptl;
+			return ptep;
+		}
+	} else {
+		/* Could still be a migration entry pointing to the page */
+		swp_entry_t entry = pte_to_swp_entry(pte);
+
+		if (swp_type(entry) == SWP_TYPE_MIGRATION &&
+			swp_offset(entry) == page_to_pfn(page)) {
+				*ptlp = ptl;
+				return ptep;
+		}
 	}
 	pte_unmap_unlock(pte, ptl);
 	return NULL;
 }
 
+#ifdef CONFIG_MIGRATION
+/*
+ * Restore a potential migration pte to a working pte entry for
+ * anonymous pages.
+ */
+static void remove_migration_pte(struct vm_area_struct *vma, unsigned long addr,
+		struct page *old, struct page *new)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *ptep;
+	spinlock_t *ptl;
+
+	ptep = page_check_address(old, mm, addr, &ptl);
+	if (!ptep)
+		return;
+
+	get_page(new);
+	set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
+	page_add_anon_rmap(new, vma, addr);
+
+	spin_unlock(ptl);
+}
+
+/*
+ * Get rid of all migration entries and replace them by
+ * references to the indicated page.
+ *
+ * Must hold mmap_sem lock on at least one of the vmas containing
+ * the page so that the anon_vma cannot vanish.
+ */
+void remove_migration_ptes(struct page *page, struct page *newpage)
+{
+	struct anon_vma *anon_vma;
+	struct vm_area_struct *vma;
+
+	if (!PageAnon(newpage))
+		return;
+
+	anon_vma = page_lock_anon_vma(newpage);
+	BUG_ON(!anon_vma);
+
+	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
+		remove_migration_pte(vma, page_address_in_vma(newpage, vma),
+				page, newpage);
+
+	spin_unlock(&anon_vma->lock);
+}
+#endif
+
 /*
  * Subfunctions of page_referenced: page_referenced_one called
  * repeatedly from either page_referenced_anon or page_referenced_file.
Index: linux-2.6.17-rc1/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc1.orig/include/linux/rmap.h	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/include/linux/rmap.h	2006-04-03 22:57:08.000000000 -0700
@@ -105,6 +105,11 @@ pte_t *page_check_address(struct page *,
  */
 unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 
+/*
+ * Used by page migration to restore ptes of anonymous pages
+ */
+void remove_migration_ptes(struct page *page, struct page *newpage);
+
 #else	/* !CONFIG_MMU */
 
 #define anon_vma_init()		do {} while (0)


* [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
                   ` (3 preceding siblings ...)
  2006-04-04  6:58 ` [RFC 4/6] Swapless V1: remove migration ptes Christoph Lameter
@ 2006-04-04  6:58 ` Christoph Lameter
  2006-04-04 10:37   ` KAMEZAWA Hiroyuki
  2006-04-04  6:58 ` [RFC 6/6] Swapless V1: Revise main migration logic Christoph Lameter
  2006-04-05 14:46 ` [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview Lee Schermerhorn
  6 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:58 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

Rip the swap-based page migration logic out

Remove all code that has to do with swapping during page migration.

This also guts the ability to migrate pages to swap. No one used that,
so let's let it go for good.

Page migration should be a bit broken after this patch.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/migrate.c	2006-04-03 22:07:40.000000000 -0700
+++ linux-2.6.17-rc1/mm/migrate.c	2006-04-03 22:07:56.000000000 -0700
@@ -70,10 +70,6 @@ int isolate_lru_page(struct page *page, 
  */
 int migrate_prep(void)
 {
-	/* Must have swap device for migration */
-	if (nr_swap_pages <= 0)
-		return -ENODEV;
-
 	/*
 	 * Clear the LRU lists so pages can be isolated.
 	 * Note that pages may be moved off the LRU after we have
@@ -129,53 +125,6 @@ int fail_migrate_page(struct page *newpa
 EXPORT_SYMBOL(fail_migrate_page);
 
 /*
- * swapout a single page
- * page is locked upon entry, unlocked on exit
- */
-static int swap_page(struct page *page)
-{
-	struct address_space *mapping = page_mapping(page);
-
-	if (page_mapped(page) && mapping)
-		if (try_to_unmap(page, 1) != SWAP_SUCCESS)
-			goto unlock_retry;
-
-	if (PageDirty(page)) {
-		/* Page is dirty, try to write it out here */
-		switch(pageout(page, mapping)) {
-		case PAGE_KEEP:
-		case PAGE_ACTIVATE:
-			goto unlock_retry;
-
-		case PAGE_SUCCESS:
-			goto retry;
-
-		case PAGE_CLEAN:
-			; /* try to free the page below */
-		}
-	}
-
-	if (PagePrivate(page)) {
-		if (!try_to_release_page(page, GFP_KERNEL) ||
-		    (!mapping && page_count(page) == 1))
-			goto unlock_retry;
-	}
-
-	if (remove_mapping(mapping, page)) {
-		/* Success */
-		unlock_page(page);
-		return 0;
-	}
-
-unlock_retry:
-	unlock_page(page);
-
-retry:
-	return -EAGAIN;
-}
-EXPORT_SYMBOL(swap_page);
-
-/*
  * Remove references for a page and establish the new page with the correct
  * basic settings to be able to stop accesses to the page.
  */
@@ -336,8 +285,7 @@ EXPORT_SYMBOL(migrate_page);
  * Two lists are passed to this function. The first list
  * contains the pages isolated from the LRU to be migrated.
  * The second list contains new pages that the pages isolated
- * can be moved to. If the second list is NULL then all
- * pages are swapped out.
+ * can be moved to.
  *
  * The function returns after 10 attempts or if no pages
  * are movable anymore because to has become empty
@@ -393,30 +341,13 @@ redo:
 		 * Only wait on writeback if we have already done a pass where
 		 * we we may have triggered writeouts for lots of pages.
 		 */
-		if (pass > 0) {
+		if (pass > 0)
 			wait_on_page_writeback(page);
-		} else {
+		else {
 			if (PageWriteback(page))
 				goto unlock_page;
 		}
 
-		/*
-		 * Anonymous pages must have swap cache references otherwise
-		 * the information contained in the page maps cannot be
-		 * preserved.
-		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
-			if (!add_to_swap(page, GFP_KERNEL)) {
-				rc = -ENOMEM;
-				goto unlock_page;
-			}
-		}
-
-		if (!to) {
-			rc = swap_page(page);
-			goto next;
-		}
-
 		newpage = lru_to_page(to);
 		lock_page(newpage);
 
@@ -470,24 +401,6 @@ redo:
 			goto unlock_both;
 		}
 
-		/*
-		 * On early passes with mapped pages simply
-		 * retry. There may be a lock held for some
-		 * buffers that may go away. Later
-		 * swap them out.
-		 */
-		if (pass > 4) {
-			/*
-			 * Persistently unable to drop buffers..... As a
-			 * measure of last resort we fall back to
-			 * swap_page().
-			 */
-			unlock_page(newpage);
-			newpage = NULL;
-			rc = swap_page(page);
-			goto next;
-		}
-
 unlock_both:
 		unlock_page(newpage);
 
Index: linux-2.6.17-rc1/mm/swapfile.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/swapfile.c	2006-04-03 22:07:46.000000000 -0700
+++ linux-2.6.17-rc1/mm/swapfile.c	2006-04-03 22:07:56.000000000 -0700
@@ -618,15 +618,6 @@ static int unuse_mm(struct mm_struct *mm
 	return 0;
 }
 
-#ifdef CONFIG_MIGRATION
-int remove_vma_swap(struct vm_area_struct *vma, struct page *page)
-{
-	swp_entry_t entry = { .val = page_private(page) };
-
-	return unuse_vma(vma, entry, page);
-}
-#endif
-
 /*
  * Scan swap_map from current position to next entry still in use.
  * Recycle to start on reaching the end, returning 0 when empty.
Index: linux-2.6.17-rc1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/rmap.c	2006-04-03 22:07:55.000000000 -0700
+++ linux-2.6.17-rc1/mm/rmap.c	2006-04-03 22:07:56.000000000 -0700
@@ -205,44 +205,6 @@ out:
 	return anon_vma;
 }
 
-#ifdef CONFIG_MIGRATION
-/*
- * Remove an anonymous page from swap replacing the swap pte's
- * through real pte's pointing to valid pages and then releasing
- * the page from the swap cache.
- *
- * Must hold page lock on page and mmap_sem of one vma that contains
- * the page.
- */
-void remove_from_swap(struct page *page)
-{
-	struct anon_vma *anon_vma;
-	struct vm_area_struct *vma;
-	unsigned long mapping;
-
-	if (!PageSwapCache(page))
-		return;
-
-	mapping = (unsigned long)page->mapping;
-
-	if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0)
-		return;
-
-	/*
-	 * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
-	 */
-	anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
-	spin_lock(&anon_vma->lock);
-
-	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
-		remove_vma_swap(vma, page);
-
-	spin_unlock(&anon_vma->lock);
-	delete_from_swap_cache(page);
-}
-EXPORT_SYMBOL(remove_from_swap);
-#endif
-
 /*
  * At what user virtual address is page expected in vma?
  */
Index: linux-2.6.17-rc1/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc1.orig/include/linux/rmap.h	2006-04-03 22:07:55.000000000 -0700
+++ linux-2.6.17-rc1/include/linux/rmap.h	2006-04-03 22:07:56.000000000 -0700
@@ -92,7 +92,6 @@ static inline void page_dup_rmap(struct 
  */
 int page_referenced(struct page *, int is_locked);
 int try_to_unmap(struct page *, int ignore_refs);
-void remove_from_swap(struct page *page);
 
 /*
  * Called from mm/filemap_xip.c to unmap empty zero page


* [RFC 6/6] Swapless V1: Revise main migration logic
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
                   ` (4 preceding siblings ...)
  2006-04-04  6:58 ` [RFC 5/6] Swapless V1: Rip out swap migration code Christoph Lameter
@ 2006-04-04  6:58 ` Christoph Lameter
  2006-04-04 10:58   ` KAMEZAWA Hiroyuki
  2006-04-05 14:46 ` [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview Lee Schermerhorn
  6 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04  6:58 UTC (permalink / raw)
  To: linux-mm
  Cc: Lee Schermerhorn, Christoph Lameter, lhms-devel,
	Hirokazu Takahashi, Marcelo Tosatti, KAMEZAWA Hiroyuki

New migration scheme

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.17-rc1/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/migrate.c	2006-04-03 23:44:31.000000000 -0700
+++ linux-2.6.17-rc1/mm/migrate.c	2006-04-03 23:48:02.000000000 -0700
@@ -151,27 +151,21 @@ int migrate_page_remove_references(struc
 	 * indicates that the page is in use or truncate has removed
 	 * the page.
 	 */
-	if (!mapping || page_mapcount(page) + nr_refs != page_count(page))
-		return -EAGAIN;
+	if (!page->mapping ||
+		page_mapcount(page) + nr_refs + !!mapping != page_count(page))
+			return -EAGAIN;
 
 	/*
-	 * Establish swap ptes for anonymous pages or destroy pte
+	 * Establish migration ptes for anonymous pages or destroy pte
 	 * maps for files.
 	 *
 	 * In order to reestablish file backed mappings the fault handlers
 	 * will take the radix tree_lock which may then be used to stop
   	 * processses from accessing this page until the new page is ready.
 	 *
-	 * A process accessing via a swap pte (an anonymous page) will take a
-	 * page_lock on the old page which will block the process until the
-	 * migration attempt is complete. At that time the PageSwapCache bit
-	 * will be examined. If the page was migrated then the PageSwapCache
-	 * bit will be clear and the operation to retrieve the page will be
-	 * retried which will find the new page in the radix tree. Then a new
-	 * direct mapping may be generated based on the radix tree contents.
-	 *
-	 * If the page was not migrated then the PageSwapCache bit
-	 * is still set and the operation may continue.
+	 * A process accessing via a migration pte (an anonymous page) will
+	 * take a  page_lock on the old page which will block the process
+	 * until the migration attempt is complete.
 	 */
 	if (try_to_unmap(page, 1) == SWAP_FAIL)
 		/* A vma has VM_LOCKED set -> permanent failure */
@@ -183,13 +177,19 @@ int migrate_page_remove_references(struc
 	if (page_mapcount(page))
 		return -EAGAIN;
 
+	if (!mapping)
+		return 0;	/* Anonymous page without swap */
+
+	/*
+	 * Page has a mapping that we need to change
+	 */
 	write_lock_irq(&mapping->tree_lock);
 
 	radix_pointer = (struct page **)radix_tree_lookup_slot(
 						&mapping->page_tree,
 						page_index(page));
 
-	if (!page_mapping(page) || page_count(page) != nr_refs ||
+	if (!page_mapping(page) || page_count(page) != nr_refs + 1 ||
 			*radix_pointer != page) {
 		write_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
@@ -206,11 +206,12 @@ int migrate_page_remove_references(struc
 	get_page(newpage);
 	newpage->index = page->index;
 	newpage->mapping = page->mapping;
+#ifdef CONFIG_SWAP
 	if (PageSwapCache(page)) {
 		SetPageSwapCache(newpage);
 		set_page_private(newpage, page_private(page));
 	}
-
+#endif
 	*radix_pointer = newpage;
 	__put_page(page);
 	write_unlock_irq(&mapping->tree_lock);
@@ -244,7 +245,9 @@ void migrate_page_copy(struct page *newp
 		set_page_dirty(newpage);
  	}
 
+#ifdef CONFIG_SWAP
 	ClearPageSwapCache(page);
+#endif
 	ClearPageActive(page);
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
@@ -271,10 +274,12 @@ int migrate_page(struct page *newpage, s
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
-	rc = migrate_page_remove_references(newpage, page, 2);
+	rc = migrate_page_remove_references(newpage, page, 1);
 
-	if (rc)
+	if (rc) {
+		remove_migration_ptes(page, page);
 		return rc;
+	}
 
 	migrate_page_copy(newpage, page);
 
@@ -286,7 +291,7 @@ int migrate_page(struct page *newpage, s
 	 * waiting on the page lock to use the new page via the page tables
 	 * before the new page is unlocked.
 	 */
-	remove_from_swap(newpage);
+	remove_migration_ptes(page, newpage);
 	return 0;
 }
 EXPORT_SYMBOL(migrate_page);
@@ -368,9 +373,12 @@ redo:
 		 * Try to migrate the page.
 		 */
 		mapping = page_mapping(page);
-		if (!mapping)
+		if (!mapping) {
+
+			rc = migrate_page(newpage, page);
 			goto unlock_both;
 
+		} else
 		if (mapping->a_ops->migratepage) {
 			/*
 			 * Most pages have a mapping and most filesystems
@@ -462,7 +470,7 @@ int buffer_migrate_page(struct page *new
 
 	head = page_buffers(page);
 
-	rc = migrate_page_remove_references(newpage, page, 3);
+	rc = migrate_page_remove_references(newpage, page, 2);
 
 	if (rc)
 		return rc;
Index: linux-2.6.17-rc1/mm/Kconfig
===================================================================
--- linux-2.6.17-rc1.orig/mm/Kconfig	2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/mm/Kconfig	2006-04-03 23:44:31.000000000 -0700
@@ -138,8 +138,8 @@ config SPLIT_PTLOCK_CPUS
 #
 config MIGRATION
 	bool "Page migration"
-	def_bool y if NUMA
-	depends on SWAP && NUMA
+	def_bool y
+	depends on NUMA
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful for


* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-04  6:58 ` [RFC 5/6] Swapless V1: Rip out swap migration code Christoph Lameter
@ 2006-04-04 10:37   ` KAMEZAWA Hiroyuki
  2006-04-04 15:06     ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-04 10:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Mon, 3 Apr 2006 23:58:05 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> Rip the page migration logic out
> 

Thank you. I like this removal, especially removing remove_from_swap() :)

-- Kame


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 6/6] Swapless V1: Revise main migration logic
  2006-04-04  6:58 ` [RFC 6/6] Swapless V1: Revise main migration logic Christoph Lameter
@ 2006-04-04 10:58   ` KAMEZAWA Hiroyuki
  2006-04-04 14:24     ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-04 10:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Mon, 3 Apr 2006 23:58:10 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> New migration scheme
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.17-rc1/mm/migrate.c
> ===================================================================
> --- linux-2.6.17-rc1.orig/mm/migrate.c	2006-04-03 23:44:31.000000000 -0700
> +++ linux-2.6.17-rc1/mm/migrate.c	2006-04-03 23:48:02.000000000 -0700
> @@ -151,27 +151,21 @@ int migrate_page_remove_references(struc
>  	 * indicates that the page is in use or truncate has removed
>  	 * the page.
>  	 */
> -	if (!mapping || page_mapcount(page) + nr_refs != page_count(page))
> -		return -EAGAIN;
> +	if (!page->mapping ||
> +		page_mapcount(page) + nr_refs + !!mapping != page_count(page))
> +			return -EAGAIN;
>  
I think this hidden !!mapping refcnt is not easy to read.

How about modifying the caller instead of the callee?

in migrate_page()
==
if (page->mapping) 
	rc = migrate_page_remove_references(newpage, page, 2);
else
	rc = migrate_page_remove_references(newpage, page, 1);
==

If you dislike this 'if', plz do as you like.
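As a userspace sketch of the two equivalent forms of the reference check (my illustration, not kernel code: the page count, mapcount, and mapping reference are plain integers here, and MIG_EAGAIN is a stand-in constant):

```c
#include <assert.h>

#define MIG_EAGAIN 11	/* stand-in for the kernel's -EAGAIN */

/* Callee-side form from the patch: the page must be pinned only by its
 * ptes (mapcount), the mapping's radix-tree reference (!!mapping), and
 * the migration path's own references (nr_refs). */
static int check_refs_callee(int page_count, int mapcount,
			     int has_mapping, int nr_refs)
{
	if (mapcount + nr_refs + has_mapping != page_count)
		return -MIG_EAGAIN;	/* extra references exist: retry later */
	return 0;
}

/* Caller-side form as suggested above: fold the mapping reference into
 * nr_refs, so the callee needs no hidden !!mapping term. */
static int check_refs_caller(int page_count, int mapcount, int has_mapping)
{
	return check_refs_callee(page_count, mapcount, 0,
				 has_mapping ? 2 : 1);
}
```

Both forms accept and reject exactly the same states; the suggestion only moves the `!!mapping` term to where it is visible.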

Thanks,
--Kame



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 2/6] Swapless V1:  Add SWP_TYPE_MIGRATION
  2006-04-04  6:57 ` [RFC 2/6] Swapless V1: Add SWP_TYPE_MIGRATION Christoph Lameter
@ 2006-04-04 11:04   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-04 11:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Mon, 3 Apr 2006 23:57:50 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

>
>  #define MAX_SWAPFILES_SHIFT	5
> -#define MAX_SWAPFILES		(1 << MAX_SWAPFILES_SHIFT)
> +#define MAX_SWAPFILES		((1 << MAX_SWAPFILES_SHIFT)-1)
> +
> +/* Use last entry for page migration swap entries */
> +#define SWP_TYPE_MIGRATION	MAX_SWAPFILES

How about this ?

#ifdef CONFIG_MIGRATION
#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT) - 1)
#else
#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT)
#endif

#define SWP_TYPE_MIGRATION (MAX_SWAPFILES + 1)


.....but I don't think there is a user who uses 32 swap files....
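For illustration, here is a userspace model of the reserved-type arithmetic (the bit layout is invented for this sketch; real kernels use per-architecture pte encodings):

```c
#include <assert.h>

#define MAX_SWAPFILES_SHIFT	5
#define MAX_SWAPFILES		((1UL << MAX_SWAPFILES_SHIFT) - 1)	/* 31 usable swapfiles */
#define SWP_TYPE_MIGRATION	MAX_SWAPFILES				/* last type reserved */

typedef unsigned long swp_entry_t;

#define TYPE_SHIFT (sizeof(unsigned long) * 8 - MAX_SWAPFILES_SHIFT)

/* Pack a (type, offset) pair: high bits carry the type, low bits the
 * offset -- for migration entries the offset is the page frame number. */
static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
	return (type << TYPE_SHIFT) | offset;
}

static unsigned long swp_type(swp_entry_t e)
{
	return e >> TYPE_SHIFT;
}

static unsigned long swp_offset(swp_entry_t e)
{
	return e & ((1UL << TYPE_SHIFT) - 1);
}

static int is_migration_entry(swp_entry_t e)
{
	return swp_type(e) == SWP_TYPE_MIGRATION;
}
```

Reserving the last type costs one swapfile slot but lets do_swap_page() distinguish a migration entry (pfn of the locked page) from a real swap entry with a single type comparison.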

--Kame
 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 6/6] Swapless V1: Revise main migration logic
  2006-04-04 10:58   ` KAMEZAWA Hiroyuki
@ 2006-04-04 14:24     ` Christoph Lameter
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04 14:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Tue, 4 Apr 2006, KAMEZAWA Hiroyuki wrote:

> >  	 */
> > -	if (!mapping || page_mapcount(page) + nr_refs != page_count(page))
> > -		return -EAGAIN;
> > +	if (!page->mapping ||
> > +		page_mapcount(page) + nr_refs + !!mapping != page_count(page))
> > +			return -EAGAIN;
> >  
> I think this hidden !!mapping refcnt is not easy to read.
> 
> How about modifying the caller instead of the callee?
> 
> in migrate_page()
> ==
> if (page->mapping) 
> 	rc = migrate_page_remove_references(newpage, page, 2);
> else
> 	rc = migrate_page_remove_references(newpage, page, 1);
> ==
> 
> If you dislike this 'if', plz do as you like.

Good idea.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-04 10:37   ` KAMEZAWA Hiroyuki
@ 2006-04-04 15:06     ` Christoph Lameter
  2006-04-05  1:06       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-04 15:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Tue, 4 Apr 2006, KAMEZAWA Hiroyuki wrote:

> On Mon, 3 Apr 2006 23:58:05 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > Rip the page migration logic out
> > 
> 
> Thank you. I like this removal, especially removing remove_from_swap() :)

Have a look at remove_migration_ptes(). Like remove_from_swap() it has the 
requirement that the mmap_sem is held since that is the only secure way to 
make sure that the anon_vma is not vanishing from under us. That may be a 
problem if you are not coming from a process context. Any ideas on how to 
fix that?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-04 15:06     ` Christoph Lameter
@ 2006-04-05  1:06       ` KAMEZAWA Hiroyuki
  2006-04-05  2:45         ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-05  1:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Tue, 4 Apr 2006 08:06:26 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Tue, 4 Apr 2006, KAMEZAWA Hiroyuki wrote:
> 
> > On Mon, 3 Apr 2006 23:58:05 -0700 (PDT)
> > Christoph Lameter <clameter@sgi.com> wrote:
> > 
> > > Rip the page migration logic out
> > > 
> > 
> > Thank you. I like this removal, especially removing remove_from_swap() :)
> 
> Have a look at remove_migration_ptes(). Like remove_from_swap() it has the 
> requirement that the mmap_sem is held since that is the only secure way to 
> make sure that the anon_vma is not vanishing from under us. That may be a 
> problem if you are not coming from a process context. Any ideas on how to 
> fix that?
> 
I think adding SWP_TYPE_MIGRATION consideration to free_swap_and_cache() is
enough to guard against the anon_vma vanishing. Because remove_migration_ptes()
compares the old pte entry with the old page's pfn, a page cannot be remapped
into its old place when the anon_vma has gone. This is my first impression.
My concern is the refcnt handling of SWP_TYPE_MIGRATION pages, but maybe no problem.

Note: unuse_vma() doesn't check what pte entry contains.

-Kame




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-05  1:06       ` KAMEZAWA Hiroyuki
@ 2006-04-05  2:45         ` Christoph Lameter
  2006-04-05  3:33           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-05  2:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Wed, 5 Apr 2006, KAMEZAWA Hiroyuki wrote:

> I think adding SWP_TYPE_MIGRATION consideration to free_swap_and_cache() is
> enough against anon_vma vanishing. Because remove_migration_ptes() compares 
> old pte entry with old page's pfn, a page cannot be remapped into old place
> when anon_vma has gone. This is my first impression.

However, the last process containing the page may terminate and free the 
page, while we migrate. The SWP_TYPE_MIGRATION pte will be removed 
together with the anon_vma if no lock is held on mmap_sem. Then 
remove_migration_ptes() cannot obtain an anon_vma. So it would break 
without holding mmap_sem. We could fix this if we could somehow know that 
the last process mapping the page vanished and skip 
remove_migration_ptes().

> My concern is refcnt handling of SWP_TYPE_MIGRATION pages, but maybe no problem.

What are the exact concerns?


> Note: unuse_vma() doesn't check what pte entry contains.

unuse_vma() relies on the mapping via swap space that will no longer exist 
with the new code.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-05  2:45         ` Christoph Lameter
@ 2006-04-05  3:33           ` KAMEZAWA Hiroyuki
  2006-04-05  3:47             ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-05  3:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Tue, 4 Apr 2006 19:45:49 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > My concern is refcnt handling of SWP_TYPE_MIGRATION pages, but maybe no problem.
> 
> What are the exact concerns?
> 
When a page is converted into SWP_TYPE_MIGRATION, the changed pte entry
implicitly points to the old page. This introduces the state 'a page is
referenced but holds no refcnt'. If mmap_sem is held, this is maybe no
problem, but it looks a bit dangerous.


> On Wed, 5 Apr 2006, KAMEZAWA Hiroyuki wrote:
> 
> > I think adding SWP_TYPE_MIGRATION consideration to free_swap_and_cache() is
> > enough against anon_vma vanishing. Because remove_migration_ptes() compares 
> > old pte entry with old page's pfn, a page cannot be remapped into old place
> > when anon_vma has gone. This is my first impression.
> 
> However, the last process containing the page may terminate and free the 
> page, while we migrate. The SWP_TYPE_MIGRATION pte will be removed 
> together with the anon_vma if no lock is held on mmap_sem. 
yes. 

> Then remove_migration_ptes() cannot obtain an anon_vma. So it would break 
> without holding mmap_sem. We could fix this if we could somehow know that 
> the last process mapping the page vanished and skip 
> remove_migration_ptes().
> 

Hmm, I'm not sure but how about this way ?
1. don't drop refcnt in try_to_unmap_one() when changing a page to 
   SWP_TYPE_MIGRATION, because it is still referenced. (rmap should be removed ?)
2. drop refcnt of the old page and inc refcnt of the new page in 
   remove_migration_ptes()

like this.
==
in remove_migration_pte
+	ptep = page_check_address(old, mm, addr, &ptl);
+	if (!ptep)
+		return;
+
+	get_page(new);
+	set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
+	page_add_anon_rmap(new, vma, addr);

+ put_page(old); << add this

We can check old page's refcnt in remove_migration_ptes().
if page_count(oldpage)==1, this page's anon_vma is removed.
So we don't have to modify ptes, all of them are zapped..
(In this method, page's refcnt should be dropped when swp_entry
 for SWP_TYPE_MIGRATION is freed.)

In page unmapping, each page's refcnt is dropped before zapping anon_vma.
So, I think this can work.
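A toy model of the accounting proposed above (plain structs standing in for struct page; this is my sketch of the idea, not the eventual kernel code):

```c
#include <assert.h>

struct toy_page { int count; int mapcount; };	/* stand-in for struct page */

/* Step 1: try_to_unmap_one() drops the rmap but, under this proposal,
 * keeps the page reference -- the migration entry now holds it. */
static void make_migration_entry(struct toy_page *old)
{
	old->mapcount--;	/* pte no longer maps the page */
	/* count intentionally NOT dropped: the entry holds it */
}

/* Step 2: remove_migration_ptes() takes a reference on the new page,
 * restores the rmap, and releases the reference the entry held. */
static void remove_migration_pte(struct toy_page *old, struct toy_page *newp)
{
	newp->count++;		/* get_page(new) */
	newp->mapcount++;	/* page_add_anon_rmap(new) */
	old->count--;		/* put_page(old) */
}
```

In this scheme, if page_count(old) drops to 1 (only migration's own reference from isolate_lru_page() remains), every mapper has exited and the pte walk can be skipped entirely.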

> 
> > Note: unuse_vma() doesn't check what pte entry contains.
> 
> unuse_vma() relies on the mapping via swap space that will no longer exist 
> with the new code.
> 
Yes. I know. just wrote about old code. sorry.

-Kame


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-05  3:33           ` KAMEZAWA Hiroyuki
@ 2006-04-05  3:47             ` Christoph Lameter
  2006-04-05  4:07               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-05  3:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Wed, 5 Apr 2006, KAMEZAWA Hiroyuki wrote:

> On Tue, 4 Apr 2006 19:45:49 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > > My concern is refcnt handling of SWP_TYPE_MIGRATION pages, but maybe no problem.
> > 
> > What are the exact concerns?
> > 
> When a page is converted into SWP_TYPE_MIGRATION, changed pte entry
> implicitly points old page. This introduces the state 'a page is referred 
> but no refcnt'. if mmap_sem is held, this is maybe no problem. 
> but looks a bit dangerous.

We have increased the refcnt on the page (see isolate_lru_page()) and the 
page is locked when  SWP_TYPE_MIGRATION is used. So there is a refcnt.

> > > I think adding SWP_TYPE_MIGRATION consideration to free_swap_and_cache() is
> > > enough against anon_vma vanishing. Because remove_migration_ptes() compares 
> > > old pte entry with old page's pfn, a page cannot be remapped into old place
> > > when anon_vma has gone. This is my first impression.
> > 
> > However, the last process containing the page may terminate and free the 
> > page, while we migrate. The SWP_TYPE_MIGRATION pte will be removed 
> > together with the anon_vma if no lock is held on mmap_sem. 
> yes. 
> 
> > Then remove_migration_ptes() cannot obtain an anon_vma. So it would break 
> > without holding mmap_sem. We could fix this if we could somehow know that 
> > the last process mapping the page vanished and skip 
> > remove_migration_ptes().
> > 
> 
> Hmm, I'm not sure but how about this way ?
> 1. don't drop refcnt in try_to_unmap_one() when changing a page to 
>    SWP_TYPE_MIGRATION, because it is still referenced. (rmap should be removed ?)

Then we would have a page with mapcounts but there are no real ptes 
pointing to the page. It would be a strange condition for the page. 

Moreover, a process may fork or terminate while we migrate. Forking may 
increase the refcnt and termination may decrease it. We do not keep
refcnts for the SWP_TYPE_MIGRATION entry but rely on the reverse maps. So 
we may end up with a messed up mapcount if we do not drop the refcnts.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/6] Swapless V1: Rip out swap migration code
  2006-04-05  3:47             ` Christoph Lameter
@ 2006-04-05  4:07               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 23+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-05  4:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lee.schermerhorn, lhms-devel, taka, marcelo.tosatti

On Tue, 4 Apr 2006 20:47:58 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> > When a page is converted into SWP_TYPE_MIGRATION, changed pte entry
> > implicitly points old page. This introduces the state 'a page is referred 
> > but no refcnt'. if mmap_sem is held, this is maybe no problem. 
> > but looks a bit dangerous.
> 
> We have increased the refcnt on the page (see isolate_lru_page()) and the 
> page is locked when  SWP_TYPE_MIGRATION is used. So there is a refcnt.
> 
yes. I just wrote about implicit refcnt.

> > > > I think adding SWP_TYPE_MIGRATION consideration to free_swap_and_cache() is
> > > > enough against anon_vma vanishing. Because remove_migration_ptes() compares 
> > > > old pte entry with old page's pfn, a page cannot be remapped into old place
> > > > when anon_vma has gone. This is my first impression.
> > > 
> > > However, the last process containing the page may terminate and free the 
> > > page, while we migrate. The SWP_TYPE_MIGRATION pte will be removed 
> > > together with the anon_vma if no lock is held on mmap_sem. 
> > yes. 
> > 
> > > Then remove_migration_ptes() cannot obtain an anon_vma. So it would break 
> > > without holding mmap_sem. We could fix this if we could somehow know that 
> > > the last process mapping the page vanished and skip 
> > > remove_migration_ptes().
> > > 
> > 
> > Hmm, I'm not sure but how about this way ?
> > 1. don't drop refcnt in try_to_unmap_one() when changing a page to 
> >    SWP_TYPE_MIGRATION, because it is still referenced. (rmap should be removed ?)
> 
> Then we would have a page with mapcounts but there are no real ptes 
> pointing to the page. It would be a strange condition for the page. 
> 
O.K. dropping the mapcount is necessary. (migrate_page_remove_references checks
it, anyway)
refcnt mentioned above is page_count(page).

> Moreover, a process may fork or terminate while we migrate. Forking may 
> increase the refcnt and termination may decrease it. We do not keep
> refcnts for the SWP_TYPE_MIGRATION entry but rely on the reverse maps. So 
> we may end up with a messed up mapcount if we do not drop the refcnts.

At fork, copy_one_pte() can manage swap entry.
Adding SWP_TYPE_MIGRATION consideration there is necessary and enough if 
not holding mmap_sem. Hmm...maybe.

exit is the same case as zap_page_range(). Modifying swap_entry_free() will be
necessary.

-Kame

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview
  2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
                   ` (5 preceding siblings ...)
  2006-04-04  6:58 ` [RFC 6/6] Swapless V1: Revise main migration logic Christoph Lameter
@ 2006-04-05 14:46 ` Lee Schermerhorn
  2006-04-05 16:28   ` Christoph Lameter
  6 siblings, 1 reply; 23+ messages in thread
From: Lee Schermerhorn @ 2006-04-05 14:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

On Mon, 2006-04-03 at 23:57 -0700, Christoph Lameter wrote:
> Swapless Page migration
> 
> Currently page migration is depending on the ability to assign swap entries
> to pages. This means that page migration will not work without swap although
> that swap space is never used.
> 
> This patchset removes that dependency by introducing a special type of
> swap entry that encodes a pfn number of the page being migrated. If that
> swap pte is encountered then do_swap_page() will simply wait for the page
> to become unlocked again (meaning page migration is complete) and then refetch
> the pte. The special type of swap entry is only in use while the page to be
> migrated is locked and therefore we can hopefully get away with just a few
> supporting functions.
> 
> To some extent this covers the same ground as Lee's and Marcelo's migration
> cache. However, I hope that this approach simplifies things without opening
> up any holes. Please check.
> 

Christoph:

Does this approach still allow "migrate-on-fault" for anon pages?
Especially, in the case where the migrating page has >1 pte referencing
it?  How will the fault handler find all of the pte's referencing the
old page?  Actually, I don't think we'd want to burden the task whose
fault caused the migration with finding and replacing all pte's
referencing the old page.  Using a real cache, this isn't a problem
because we replace the old page with a new one in the cache, and the
cache ptes reference the cache entry.  Tasks are free to fault in a real
pte for the new page at any time.  I'd hate to lose this capability.  I
believe that this is one of the reasons that Marcelo used a real
idr-based cache for the migration cache.

Lee


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview
  2006-04-05 14:46 ` [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview Lee Schermerhorn
@ 2006-04-05 16:28   ` Christoph Lameter
  2006-04-05 16:58     ` Lee Schermerhorn
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-05 16:28 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

On Wed, 5 Apr 2006, Lee Schermerhorn wrote:

> Does this approach still allow "migrate-on-fault" for anon pages?

I am not aware of anything that would be in the way.

> Especially, in the case where the migrating page has >1 pte referencing
> it?  How will the fault handler find all of the pte's referencing the
> old page?  Actually, I don't think we'd want to burden the task whose

The fault handler can find these via the reverse maps.

> fault caused the migration with finding and replacing all pte's
> referencing the old page.  Using a real cache, this isn't a problem
> because we replace the old page with a new one in the cache, and the
> cache ptes reference the cache entry.  Tasks are free to fault in a real
> pte for the new page at any time.  I'd hate to lose this capability.  I
> believe that this is one of the reasons that Marcelo used a real
> idr-based cache for the migration cache.

We never allow a faulting in of the new page before migration is 
complete. The replacing of the swap ptes with real ptes was always done 
after migration was complete. Same thing here.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview
  2006-04-05 16:28   ` Christoph Lameter
@ 2006-04-05 16:58     ` Lee Schermerhorn
  2006-04-05 17:43       ` Christoph Lameter
  2006-04-05 18:17       ` Some ideas on lazy migration with swapless migration Christoph Lameter
  0 siblings, 2 replies; 23+ messages in thread
From: Lee Schermerhorn @ 2006-04-05 16:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

On Wed, 2006-04-05 at 09:28 -0700, Christoph Lameter wrote:
> On Wed, 5 Apr 2006, Lee Schermerhorn wrote:
> 
> > Does this approach still allow "migrate-on-fault" for anon pages?
> 
> I am not aware of something that would be in the way.
> 
> > Especially, in the case where the migrating page has >1 pte referencing
> > it?  How will the fault handler find all of the pte's referencing the
> > old page?  Actually, I don't think we'd want to burden the task whose
> 
> The fault handler can find these via the reverse maps.
> 
> > fault caused the migration with finding and replacing all pte's
> > referencing the old page.  Using a real cache, this isn't a problem
> > because we replace the old page with a new one in the cache, and the
> > cache ptes reference the cache entry.  Tasks are free to fault in a real
> > pte for the new page at any time.  I'd hate to lose this capability.  I
> > believe that this is one of the reasons that Marcelo used a real
> > idr-based cache for the migration cache.
> 
> We never allow a faulting in of the new page before migration is 
> complete. The replacing of the swap ptes with real ptes was always done 
> after migration was complete. Same thing here.

Unless we're talking about different things [happens], my migrate-on-
fault patches do this.  Pages are unmapped from ptes and left hanging in
the cache until some task touches them.  Then the migration occurs, if
mapcount+policy so indicate, the new page replaces the old page in the
cache, the fault handler inserts a real pte referencing the new page and
removes one reference from the cache entry.  In the case of migration
cache, if this was the last pte reference, the entry is freed.  For the
swap cache, the page still references the swap entry and will until
explicitly removed.  If other task's ptes reference the cache entry, it
remains available, pointing at the new page, to resolve subsequent page
faults by those tasks.

Series starts with: 
http://marc.theaimsgroup.com/?l=linux-mm&m=114200021231527&w=4

I've been reworking these patches against your reorganized migration
code in 2.6.17-rc1.  I planned to resubmit after refreshing against 17-
rc1-mm1.  Unfortunately, 17-rc1-mm1 doesn't boot on my platform [sans
any of my patches], so now I'm investigating that...

In any case, I don't think we want to be walking reverse maps and other
task's pte's in one task's page fault path.  Perhaps "migrate-on-fault"
and "auto-migration" are not going to go anywhere, but if they do, we'll
need something like the existing swap/migration cache behavior, where
the temporary ptes reference a single [reference counted] cache entry
that points at either the old or new page.

Lee


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview
  2006-04-05 16:58     ` Lee Schermerhorn
@ 2006-04-05 17:43       ` Christoph Lameter
  2006-04-05 18:52         ` Lee Schermerhorn
  2006-04-05 18:17       ` Some ideas on lazy migration with swapless migration Christoph Lameter
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2006-04-05 17:43 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

On Wed, 5 Apr 2006, Lee Schermerhorn wrote:

> > We never allow a faulting in of the new page before migration is 
> > complete. The replacing of the swap ptes with real ptes was always done 
> > after migration was complete. Same thing here.
> 
> Unless we're talking about different things [happens], my migrate-on-
> fault patches do this.  Pages are unmapped from ptes and left hanging in
> the cache until some task touches them.  Then the migration occurs, if

Well you can only unmap file-backed pages. These are still working the same 
way. Anonymous pages can only be remapped in a different way, not unmapped.
"unmap" of anonymous pages in today's kernels really means remap to swap 
space.

If you put the anonymous pages on swap then you can still have the old 
behavior but then you would require swap space.

> In any case, I don't think we want to be walking reverse maps and other
> task's pte's in one task's page fault path.  Perhaps "migrate-on-fault"
> and "auto-migration" are not going to go anywhere, but if they do, we'll
> need something like the existing swap/migration cache behavior, where
> the temporary ptes reference a single [reference counted] cache entry
> that points at either the old or new page.

No we certainly do not want to walk reverse maps in critical sections of 
the code.

I think the opportunistic lazy migration that we were talking about before 
would be fine with this scheme. You just check the refcount during the 
fault and then migrate the page if this would establish the first 
mapcount.

Pushing pages into the migration cache from the scheduler in order to 
migrate them later when references are to be reestablished will no longer 
work.
 
Would not swap be a more appropriate mechanism there? I mean the 
functionality that you want is almost exactly the same as swap. The 
checking of the mapcounts can then work the same way as opportunistic lazy 
migration.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Some ideas on lazy migration with swapless migration
  2006-04-05 16:58     ` Lee Schermerhorn
  2006-04-05 17:43       ` Christoph Lameter
@ 2006-04-05 18:17       ` Christoph Lameter
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2006-04-05 18:17 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

I think it is possible to do lazy migration without having to resort to a 
migration cache by either

A. Forbidding writes to the page. The corresponding invocation of
   do_wp_page() on a write attempt can then be used to migrate the 
   page. However, this would only work for write attempts.

B. Clear the present bit. The corresponding invocation of do_swap_page 
   may check for the type of pte and do the lazy migration and then set 
   the present bit again.

Hmm... B. would be an even better way to replace SWP_TYPE_MIGRATION and 
not use the swap code at all (which would simply take a lock on the page 
and redo the fault after releasing the lock) but it would require some 
work to get arch support for clearing and setting the present bit. 
However, there are only a few arches supporting NUMA and migration. So it 
should be doable.

Maybe the idea with the present bit can be used to further simplify 
migration:

1. Before migration clear all the present bits which guarantees
   that the faults will stall in do_swap_page() since the page is
   locked. No need to reduce the mapcount since the ptes are still there
   and can be switched back to working condition by do_swap_page().

2. do_swap_page() will lock the page (and therefore stall during 
   migration). After the page lock is obtained we check the present bit if
   it is now set then redo the fault. If not then do lazy migration if 
   needed and set the bit.

3. Migration will move the page and then replace ptes with cleared 
   present bits with ptes pointing to the new page with the present bit 
   enabled. 

Since we do not reduce the mapcount, we can use that mapcount to verify 
that it is still safe to get to the corresponding anonymous vma for 
anonymous pages. Some portions of the VM would have to be fixed up to know 
how to deal with valid ptes that are not present (the fork and unmap code).

For file-backed pages we would no longer have to remove the references. 
We can migrate them the same way as anonymous pages; we just need to 
make sure to change the mapping first. That would be an important feature 
for us because it preserves the page state better. We could also 
preserve the dirty and accessed bits in the pte.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview
  2006-04-05 17:43       ` Christoph Lameter
@ 2006-04-05 18:52         ` Lee Schermerhorn
  0 siblings, 0 replies; 23+ messages in thread
From: Lee Schermerhorn @ 2006-04-05 18:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, lhms-devel, Hirokazu Takahashi, Marcelo Tosatti,
	KAMEZAWA Hiroyuki

On Wed, 2006-04-05 at 10:43 -0700, Christoph Lameter wrote:
> On Wed, 5 Apr 2006, Lee Schermerhorn wrote:
> 
> > > We never allow a faulting in of the new page before migration is 
> > > complete. The replacing of the swap ptes with real ptes was always done 
> > > after migration was complete. Same thing here.
> > 
> > Unless we're talking about different things [happens], my migrate-on-
> > fault patches do this.  Pages are unmapped from ptes and left hanging in
> > the cache until some task touches them.  Then the migration occurs, if
> 
> Well you can only unmap file-backed pages. These are still working the same 
> way. Anonymous pages can only be remapped in a different way, not unmapped.
> "unmap" of anonymous pages in today's kernels really means remap to swap 
> space.

My point exactly.  And to get them to migrate on fault [which I want to
do], I need to unmap them and leave them that way until some task
touches them.

> 
> 
> If you put the anonymous pages on swap then you can still have the old 
> behavior but then you would require swap space.

Or a migration cache that behaves like swap, but doesn't actually
reserve disk space.

Note:  my traces show that the current [2.6.17-rc1] migration mechanism
only uses one swap entry at a time, per running instance of migration.
So, I don't think there is a hurry to eliminate this usage for "direct
migration".  If we accept migrate on fault, then pages can lie around in
the swap cache for some time.  That would motivate us to investigate a
solution that doesn't reserve swap.


> > In any case, I don't think we want to be walking reverse maps and other
> > task's pte's in one task's page fault path.  Perhaps "migrate-on-fault"
> > and "auto-migration" are not going to go anywhere, but if they do, we'll
> > need something like the existing swap/migration cache behavior, where
> > the temporary ptes reference a single [reference counted] cache entry
> > that points at either the old or new page.
> 
> No we certainly do not want to walk reverse maps in critical sections of 
> the code.
> 
> I think the opportunistic lazy migration that we were talking about before 
> would be fine with this scheme. You just check the refcount during the 
> fault and then migrate the page if this would establish the first 
> mapcount.

The pages must exist in a cache with mapcount==0 at fault time [swap or
migration cache for anon pages] for this to work, right?

> 
> Pushing pages into the migration cache from the scheduler in order to 
> migrate them later when references are to be reestablished will no longer 
> work.

:-(, I know...

>  
> Would not swap be a more appropriate mechanism there? I mean the 
> functionality that you want is almost exactly the same as swap. The 
> checking of the mapcounts can then work the same way as opportunistic lazy 
> migration.

Yes.  We've discussed this before.  Swap works just fine for this.  My
current migrate-on-fault and auto-migration series does not change this.
The issue that we still need to work out [assuming these patches go
forward] is whether it's preferable to let such pages hang around in the
swap cache tying up swap device space that they never intend to use, or
to implement a pseudo-swap device like the migration cache to hold the
pte entries of unmapped anon pages.  I put the migration cache work on
hold to work up the aforementioned patch series.  I could do this,
because it works with swap.  If you remove the use of swap in
try_to_unmap(), etc., my patches would either have to put it back or
resurrect the migration cache sooner than planned.  As it stands,
migrate-on-fault is a relatively small change to the in-kernel migration
mechanism.

Lee

end of thread, other threads:[~2006-04-05 18:52 UTC | newest]

Thread overview: 23+ messages
2006-04-04  6:57 [RFC 0/6] Swapless Page Migration V1: Overview Christoph Lameter
2006-04-04  6:57 ` [RFC 1/6] Swapless V1: try_to_unmap() - Rename ignrefs to "migration" Christoph Lameter
2006-04-04  6:57 ` [RFC 2/6] Swapless V1: Add SWP_TYPE_MIGRATION Christoph Lameter
2006-04-04 11:04   ` KAMEZAWA Hiroyuki
2006-04-04  6:57 ` [RFC 3/6] Swapless V1: try_to_unmap() - Create migration entries Christoph Lameter
2006-04-04  6:58 ` [RFC 4/6] Swapless V1: remove migration ptes Christoph Lameter
2006-04-04  6:58 ` [RFC 5/6] Swapless V1: Rip out swap migration code Christoph Lameter
2006-04-04 10:37   ` KAMEZAWA Hiroyuki
2006-04-04 15:06     ` Christoph Lameter
2006-04-05  1:06       ` KAMEZAWA Hiroyuki
2006-04-05  2:45         ` Christoph Lameter
2006-04-05  3:33           ` KAMEZAWA Hiroyuki
2006-04-05  3:47             ` Christoph Lameter
2006-04-05  4:07               ` KAMEZAWA Hiroyuki
2006-04-04  6:58 ` [RFC 6/6] Swapless V1: Revise main migration logic Christoph Lameter
2006-04-04 10:58   ` KAMEZAWA Hiroyuki
2006-04-04 14:24     ` Christoph Lameter
2006-04-05 14:46 ` [Lhms-devel] [RFC 0/6] Swapless Page Migration V1: Overview Lee Schermerhorn
2006-04-05 16:28   ` Christoph Lameter
2006-04-05 16:58     ` Lee Schermerhorn
2006-04-05 17:43       ` Christoph Lameter
2006-04-05 18:52         ` Lee Schermerhorn
2006-04-05 18:17       ` Some ideas on lazy migration with swapless migration Christoph Lameter
