* [PATCH 0/2] Page migration via Swap V2: Overview
From: Christoph Lameter @ 2005-10-18 0:49 UTC
To: akpm; +Cc: linux-mm, ak, Christoph Lameter, lhms-devel
In a NUMA system it is often beneficial to be able to move the memory
in use by a process to different nodes in order to enhance performance.
Currently Linux simply does not support this facility.
Page migration is also useful for other purposes:
1. Memory hotplug. Migrating processes off a memory node that is going
to be disconnected.
2. Remapping of bad pages. These could be detected through soft ECC errors
and other mechanisms.
Work on page migration has been done in the context of the memory hotplug project
(see https://lists.sourceforge.net/lists/listinfo/lhms-devel). Ray Bryant
has also posted a series of manual page migration patchsets. However, those patches
are complex and may have impacts on the VM in various places, and there are unresolved
issues regarding memory placement during direct migration, so that functionality
may not be available for some time.
This patchset was done with awareness of the work done there and implements page
migration via swap. Pages are not directly moved to their target
location but simply swapped out. If the application touches the page later then
a new page is allocated in the desired location.
The advantage of page based swapping is that the necessary changes to the kernel
are minimal. With a fully functional but minimal page migration capability we
will be able to enhance low level code and higher level APIs at the same time.
This will hopefully decrease the time needed to get the code for direct page
migration working and into the kernel trees.
The disadvantages compared to direct page migration are:
A. Performance: Having to go through swap is slower.
B. The need for swap space: The area to be migrated must fit into swap.
C. Placement of pages at swapin is done under the memory policy in
effect at that time. This may destroy nodeset relative positioning.
The advantages over direct page migration are:
A. It is more general and has less of an impact on the system.
B. It uses the proven swap code. There is no new page behavior that
may have to be considered in other places of the VM.
C. It may be used for additional purposes, like suspending an application
by swapping it out.
The patchset consists of two patches:
1. Page eviction patch
Modifies mm/vmscan.c to add functions to isolate pages from the LRU lists,
swap out lists of pages, and return pages to the LRU lists.
2. MPOL_MF_MOVE flag for memory policies.
This implements MPOL_MF_MOVE in addition to MPOL_MF_STRICT. MPOL_MF_STRICT
allows checking whether all pages in a memory area obey the memory policies.
MPOL_MF_MOVE will evict all pages that do not conform to the memory policy.
The system will allocate pages conforming to the policy on swap in.
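For illustration, a minimal user-space sketch of how the new flag might be
used (this assumes the libnuma <numaif.h> mbind() wrapper and a machine with
a node 1; MPOL_MF_MOVE is taken from this patchset and defined by hand here,
since no installed header exports it yet):

#include <numaif.h>		/* mbind(), MPOL_BIND, MPOL_MF_STRICT */
#include <stdio.h>		/* perror() */

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)	/* from this patchset, not yet in numaif.h */
#endif

/* Bind buf to node 1 and evict the non-conforming pages so that they
 * are reallocated on node 1 when the application touches them again. */
static int rebind_and_move(void *buf, unsigned long len)
{
	unsigned long nodemask = 1UL << 1;	/* node 1 only */

	if (mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8,
			MPOL_MF_MOVE | MPOL_MF_STRICT)) {
		perror("mbind");
		return -1;
	}
	return 0;
}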
URLs for the discussion of the initial version of these patches:
Page eviction: http://marc.theaimsgroup.com/?l=linux-mm&m=112922756730989&w=2
Numa policy : http://marc.theaimsgroup.com/?l=linux-mm&m=112922756724715&w=2
Changes from V1:
- Patch against 2.6.14-rc4-mm1
- Remove move_pages() function
- Code cleanup to make it less invasive.
- Fix missing lru_add_drain() invocation from isolate_lru_page()
* [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Christoph Lameter @ 2005-10-18 0:49 UTC
To: akpm; +Cc: linux-mm, lhms-devel, ak, Christoph Lameter
This patch adds functions that allow the eviction of pages to swap space.
Page eviction may be useful to migrate pages, to suspend programs, or to
unmap single pages (useful for faulty pages or pages with soft ECC
failures).
The process is as follows:
The function wanting to evict pages must first build a list of the pages to be
evicted and take them off the LRU lists. This is done using the isolate_lru_page()
function. isolate_lru_page() determines whether a page is freeable based on the
LRU bit being set and, if it is indeed freeable, moves the page to the specified
list. isolate_lru_page() returns 0 for a page that is not freeable.
Then the actual swapout can happen by calling swapout_pages().
swapout_pages() does its best to swap out the pages, making multiple passes over
the list. However, it may not be able to evict all pages, for a variety of reasons.
The remaining pages may be returned to the LRU lists using putback_lru_pages().
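For example, a caller could drive these three functions roughly as follows
(a minimal sketch, not taken from the patch; error handling abbreviated):

/* Evict a single page via the new interface. */
static int evict_one_page(struct page *page)
{
	LIST_HEAD(pagelist);
	int failed;

	/* Take the page off its LRU list; 1 means we now hold it. */
	if (isolate_lru_page(page, &pagelist) != 1)
		return -EBUSY;	/* not on the LRU or being freed elsewhere */

	/* Try to push everything on the list out to swap. */
	failed = swapout_pages(&pagelist);

	/* Whatever could not be evicted is still on the list. */
	if (!list_empty(&pagelist))
		putback_lru_pages(&pagelist);

	return failed ? -EAGAIN : 0;
}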
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.14-rc4-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.14-rc4-mm1.orig/include/linux/swap.h 2005-10-17 10:24:16.000000000 -0700
+++ linux-2.6.14-rc4-mm1/include/linux/swap.h 2005-10-17 17:29:49.000000000 -0700
@@ -176,6 +176,10 @@ extern int zone_reclaim(struct zone *, u
extern int shrink_all_memory(int);
extern int vm_swappiness;
+extern int isolate_lru_page(struct page *p, struct list_head *l);
+extern int swapout_pages(struct list_head *l);
+extern int putback_lru_pages(struct list_head *l);
+
#ifdef CONFIG_MMU
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
Index: linux-2.6.14-rc4-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.14-rc4-mm1.orig/mm/vmscan.c 2005-10-17 10:24:30.000000000 -0700
+++ linux-2.6.14-rc4-mm1/mm/vmscan.c 2005-10-17 16:19:21.000000000 -0700
@@ -564,6 +564,144 @@ keep:
}
/*
+ * Swapout evicts the pages on the list to swap space.
+ * This is essentially a dumbed down version of shrink_list
+ *
+ * returns the number of pages that were not evictable
+ *
+ * Multiple passes are performed over the list. The first
+ * pass avoids waiting on locks and triggers writeout
+ * actions. Later passes begin to wait on locks in order
+ * to have a better chance of acquiring the lock.
+ */
+int swapout_pages(struct list_head *l)
+{
+ int retry;
+ int failed;
+ int pass = 0;
+ struct page *page;
+ struct page *page2;
+
+ current->flags |= PF_KSWAPD;
+
+redo:
+ retry = 0;
+ failed = 0;
+
+ list_for_each_entry_safe(page, page2, l, lru) {
+ struct address_space *mapping;
+
+ cond_resched();
+
+ /*
+ * Skip locked pages during the first two passes to give the
+ * functions holding the lock time to release the page. Later we use
+ * lock_page to have a higher chance of acquiring the lock.
+ */
+ if (pass > 2)
+ lock_page(page);
+ else
+ if (TestSetPageLocked(page))
+ goto retry_later;
+
+ /*
+ * Only wait on writeback if we have already done a pass where
+ * we may have triggered writeouts for lots of pages.
+ */
+ if (pass > 0)
+ wait_on_page_writeback(page);
+ else
+ if (PageWriteback(page))
+ goto retry_later_locked;
+
+#ifdef CONFIG_SWAP
+ if (PageAnon(page) && !PageSwapCache(page)) {
+ if (!add_to_swap(page))
+ goto failed;
+ }
+#endif /* CONFIG_SWAP */
+
+ mapping = page_mapping(page);
+ if (page_mapped(page) && mapping)
+ if (try_to_unmap(page) != SWAP_SUCCESS)
+ goto retry_later_locked;
+
+ if (PageDirty(page)) {
+ /* Page is dirty, try to write it out here */
+ switch(pageout(page, mapping)) {
+ case PAGE_KEEP:
+ case PAGE_ACTIVATE:
+ goto retry_later_locked;
+ case PAGE_SUCCESS:
+ goto retry_later;
+ case PAGE_CLEAN:
+ ; /* try to free the page below */
+ }
+ }
+
+ if (PagePrivate(page)) {
+ if (!try_to_release_page(page, GFP_KERNEL))
+ goto retry_later_locked;
+ if (!mapping && page_count(page) == 1)
+ goto free_it;
+ }
+
+ if (!mapping)
+ goto retry_later_locked; /* truncate got there first */
+
+ write_lock_irq(&mapping->tree_lock);
+
+ if (page_count(page) != 2 || PageDirty(page)) {
+ write_unlock_irq(&mapping->tree_lock);
+ goto retry_later_locked;
+ }
+
+#ifdef CONFIG_SWAP
+ if (PageSwapCache(page)) {
+ swp_entry_t swap = { .val = page->private };
+ __delete_from_swap_cache(page);
+ write_unlock_irq(&mapping->tree_lock);
+ swap_free(swap);
+ __put_page(page); /* The pagecache ref */
+ goto free_it;
+ }
+#endif /* CONFIG_SWAP */
+
+ __remove_from_page_cache(page);
+ write_unlock_irq(&mapping->tree_lock);
+ __put_page(page);
+
+free_it:
+ /*
+ * We may free pages that were taken off the active list
+ * by isolate_lru_page. However, free_hot_cold_page will check
+ * if the active bit is set. So clear it.
+ */
+ ClearPageActive(page);
+
+ list_del(&page->lru);
+ unlock_page(page);
+ put_page(page);
+ continue;
+
+failed:
+ failed++;
+ unlock_page(page);
+ continue;
+
+retry_later_locked:
+ unlock_page(page);
+retry_later:
+ retry++;
+ }
+ if (retry && pass++ < 10)
+ goto redo;
+
+ current->flags &= ~PF_KSWAPD;
+ return failed + retry;
+}
+
+/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
* and working on them outside the LRU lock.
@@ -612,6 +750,63 @@ static int isolate_lru_pages(int nr_to_s
return nr_taken;
}
+static void lru_add_drain_per_cpu(void *dummy)
+{
+ lru_add_drain();
+}
+
+/*
+ * Isolate one page from the LRU lists and put it on the
+ * indicated list.
+ *
+ * Result:
+ * 0 = page not on LRU list
+ * 1 = page removed from LRU list and added to the specified list.
+ * -1 = page is being freed elsewhere.
+ */
+int isolate_lru_page(struct page *page, struct list_head *l)
+{
+ int rc = 0;
+ struct zone *zone = page_zone(page);
+
+redo:
+ spin_lock_irq(&zone->lru_lock);
+ if (TestClearPageLRU(page)) {
+ list_del(&page->lru);
+ if (get_page_testone(page)) {
+ /*
+ * It is being freed elsewhere
+ */
+ __put_page(page);
+ SetPageLRU(page);
+ if (PageActive(page))
+ list_add(&page->lru, &zone->active_list);
+ else
+ list_add(&page->lru, &zone->inactive_list);
+ rc = -1;
+ } else {
+ list_add(&page->lru, l);
+ if (PageActive(page))
+ zone->nr_active--;
+ else
+ zone->nr_inactive--;
+ rc = 1;
+ }
+ }
+ spin_unlock_irq(&zone->lru_lock);
+ if (rc == 0) {
+ /*
+ * Maybe this page is still waiting for a cpu to drain it
+ * from one of the lru lists?
+ */
+ smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+ lru_add_drain();
+ if (PageLRU(page))
+ goto redo;
+ }
+ return rc;
+}
+
/*
* shrink_cache() adds the number of pages reclaimed to sc->nr_reclaimed
*/
@@ -678,6 +873,32 @@ done:
}
/*
+ * Add isolated pages back on the LRU lists
+ */
+int putback_lru_pages(struct list_head *l)
+{
+ struct page * page;
+ struct page * page2;
+ int count = 0;
+
+ list_for_each_entry_safe(page, page2, l, lru) {
+ struct zone *zone = page_zone(page);
+
+ spin_lock_irq(&zone->lru_lock);
+ list_del(&page->lru);
+ if (!TestSetPageLRU(page)) {
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+ count++;
+ }
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ return count;
+}
+
+/*
* This moves pages from the active list to the inactive list.
*
* We move them the other way if the page is referenced by one or more
* [PATCH 2/2] Page migration via Swap V2: MPOL_MF_MOVE interface
From: Christoph Lameter @ 2005-10-18 0:49 UTC
To: akpm; +Cc: linux-mm, ak, Christoph Lameter, lhms-devel
This patch adds page migration support to the NUMA policy layer. An additional
flag MPOL_MF_MOVE is introduced for mbind. If MPOL_MF_MOVE is specified then
pages that do not conform to the memory policy will be evicted from memory.
When the process touches those pages again, new pages will be allocated
following the NUMA policy.
Version 2
- Add vma_migratable() function for future enhancements.
- Remove function with side effects from WARN_ON
- Remove move_pages
- Make patch fit 2.6.14-rc4-mm1
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.14-rc4-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.14-rc4-mm1.orig/mm/mempolicy.c 2005-10-17 10:24:16.000000000 -0700
+++ linux-2.6.14-rc4-mm1/mm/mempolicy.c 2005-10-17 17:37:39.000000000 -0700
@@ -83,6 +83,7 @@
#include <linux/init.h>
#include <linux/compat.h>
#include <linux/mempolicy.h>
+#include <linux/swap.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -181,7 +182,8 @@ static struct mempolicy *mpol_new(int mo
/* Ensure all existing pages follow the policy. */
static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pte_t *orig_pte;
pte_t *pte;
@@ -200,15 +202,28 @@ static int check_pte_range(struct vm_are
continue;
}
nid = pfn_to_nid(pfn);
- if (!node_isset(nid, *nodes))
- break;
+ if (!node_isset(nid, *nodes)) {
+ if (pagelist) {
+ struct page *page = pfn_to_page(pfn);
+ int rc = isolate_lru_page(page, pagelist);
+
+ /*
+ * If the isolate attempt was not successful
+ * then we just encountered an unswappable
+ * page. Something must be wrong.
+ */
+ WARN_ON(rc == 0);
+ } else
+ break;
+ }
} while (pte++, addr += PAGE_SIZE, addr != end);
pte_unmap_unlock(orig_pte, ptl);
return addr != end;
}
static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pmd_t *pmd;
unsigned long next;
@@ -218,14 +233,15 @@ static inline int check_pmd_range(struct
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd))
continue;
- if (check_pte_range(vma, pmd, addr, next, nodes))
+ if (check_pte_range(vma, pmd, addr, next, nodes, pagelist))
return -EIO;
} while (pmd++, addr = next, addr != end);
return 0;
}
static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pud_t *pud;
unsigned long next;
@@ -235,14 +251,15 @@ static inline int check_pud_range(struct
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- if (check_pmd_range(vma, pud, addr, next, nodes))
+ if (check_pmd_range(vma, pud, addr, next, nodes, pagelist))
return -EIO;
} while (pud++, addr = next, addr != end);
return 0;
}
static inline int check_pgd_range(struct vm_area_struct *vma,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pgd_t *pgd;
unsigned long next;
@@ -252,16 +269,30 @@ static inline int check_pgd_range(struct
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- if (check_pud_range(vma, pgd, addr, next, nodes))
+ if (check_pud_range(vma, pgd, addr, next, nodes, pagelist))
return -EIO;
} while (pgd++, addr = next, addr != end);
return 0;
}
+/* Check if a vma is migratable */
+static inline int vma_migratable(struct vm_area_struct *vma)
+{
+ if (vma->vm_flags & (
+ VM_LOCKED |
+ VM_IO |
+ VM_RESERVED |
+ VM_DENYWRITE |
+ VM_SHM
+ ))
+ return 0;
+ return 1;
+}
+
/* Step 1: check the range */
static struct vm_area_struct *
check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
- nodemask_t *nodes, unsigned long flags)
+ nodemask_t *nodes, unsigned long flags, struct list_head *pagelist)
{
int err;
struct vm_area_struct *first, *vma, *prev;
@@ -277,13 +308,16 @@ check_range(struct mm_struct *mm, unsign
return ERR_PTR(-EFAULT);
if (prev && prev->vm_end < vma->vm_start)
return ERR_PTR(-EFAULT);
- if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
+ if (!is_vm_hugetlb_page(vma) &&
+ ((flags & MPOL_MF_STRICT) ||
+ ((flags & MPOL_MF_MOVE) && vma_migratable(vma))
+ )) {
unsigned long endvma = vma->vm_end;
if (endvma > end)
endvma = end;
if (vma->vm_start > start)
start = vma->vm_start;
- err = check_pgd_range(vma, start, endvma, nodes);
+ err = check_pgd_range(vma, start, endvma, nodes, pagelist);
if (err) {
first = ERR_PTR(err);
break;
@@ -357,21 +391,28 @@ long do_mbind(unsigned long start, unsig
struct mempolicy *new;
unsigned long end;
int err;
+ LIST_HEAD(pagelist);
- if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
+ if ((flags & ~(unsigned long)(MPOL_MF_STRICT | MPOL_MF_MOVE))
+ || mode > MPOL_MAX)
return -EINVAL;
if (start & ~PAGE_MASK)
return -EINVAL;
+
if (mode == MPOL_DEFAULT)
flags &= ~MPOL_MF_STRICT;
+
len = (len + PAGE_SIZE - 1) & PAGE_MASK;
end = start + len;
+
if (end < start)
return -EINVAL;
if (end == start)
return 0;
+
if (mpol_check_policy(mode, nmask))
return -EINVAL;
+
new = mpol_new(mode, nmask);
if (IS_ERR(new))
return PTR_ERR(new);
@@ -380,10 +421,19 @@ long do_mbind(unsigned long start, unsig
mode,nodes_addr(nodes)[0]);
down_write(&mm->mmap_sem);
- vma = check_range(mm, start, end, nmask, flags);
+ vma = check_range(mm, start, end, nmask, flags,
+ (flags & MPOL_MF_MOVE) ? &pagelist : NULL);
err = PTR_ERR(vma);
- if (!IS_ERR(vma))
+ if (!IS_ERR(vma)) {
err = mbind_range(vma, start, end, new);
+ if (!list_empty(&pagelist))
+ swapout_pages(&pagelist);
+ if (!err && !list_empty(&pagelist) && (flags & MPOL_MF_STRICT))
+ err = -EIO;
+ }
+ if (!list_empty(&pagelist))
+ putback_lru_pages(&pagelist);
+
up_write(&mm->mmap_sem);
mpol_free(new);
return err;
Index: linux-2.6.14-rc4-mm1/include/linux/mempolicy.h
===================================================================
--- linux-2.6.14-rc4-mm1.orig/include/linux/mempolicy.h 2005-10-17 10:24:13.000000000 -0700
+++ linux-2.6.14-rc4-mm1/include/linux/mempolicy.h 2005-10-17 17:33:34.000000000 -0700
@@ -22,6 +22,7 @@
/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
+#define MPOL_MF_MOVE (1<<1) /* Move pages to conform to mapping */
#ifdef __KERNEL__
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Andrew Morton @ 2005-10-18 1:04 UTC
To: Christoph Lameter; +Cc: linux-mm, lhms-devel, ak
Christoph Lameter <clameter@sgi.com> wrote:
>
> + write_lock_irq(&mapping->tree_lock);
> +
> + if (page_count(page) != 2 || PageDirty(page)) {
> + write_unlock_irq(&mapping->tree_lock);
> + goto retry_later_locked;
> + }
This needs the (uncommented (grr)) smp_rmb() copied-and-pasted as well.
It's a shame about the copy-and-pasting :( Is it unavoidable?
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: KAMEZAWA Hiroyuki @ 2005-10-18 3:18 UTC
To: Christoph Lameter; +Cc: akpm, linux-mm, ak, lhms-devel
Hi,
Christoph Lameter wrote:
> The disadvantages compared to direct page migration are:
>
> A. Performance: Having to go through swap is slower.
>
> B. The need for swap space: The area to be migrated must fit into swap.
>
I think the migration cache will work well for A & B :)
The migration cache is virtual swap: it just unmaps a page and marks it as a swap-cache page.
> C. Placement of pages at swapin is done under the memory policy in
> effect at that time. This may destroy nodeset relative positioning.
>
How about this?
==
1. do_mbind()
2. unmap and move to migration cache
3. touch all pages
==
For step 3, step 2 should gather a list of all the present virtual addresses...
D. We need separate page-cache migration functions for moving the page cache :(
Moving just anonymous pages is not enough for memory hotplug.
(BTW, how should pages in the page cache be affected by memory location control?
I think some people have discussed that...)
-- Kame
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: KAMEZAWA Hiroyuki @ 2005-10-18 6:37 UTC
To: Christoph Lameter; +Cc: linux-mm, lhms-devel, jschopp, Andrew Morton
Hi,
Christoph Lameter wrote:
> The patchset consists of two patches:
>
> 1. Page eviction patch
>
> Modifies mm/vmscan.c to add functions to isolate pages from the LRU lists,
> swap out lists of pages, and return pages to the LRU lists.
>
> 2. MPOL_MF_MOVE flag for memory policies.
>
> This implements MPOL_MF_MOVE in addition to MPOL_MF_STRICT. MPOL_MF_STRICT
> allows checking whether all pages in a memory area obey the memory policies.
> MPOL_MF_MOVE will evict all pages that do not conform to the memory policy.
> The system will allocate pages conforming to the policy on swap in.
>
Because sys_mbind() acquires mm->mmap_sem, once a page is unmapped,
all accesses to the page are blocked.
So, even if the range contains hot pages, there will not be any
hard-to-swap-out pages, right?
sys_mbind() can acquire mm->mmap_sem for migrating *a process's pages*,
but memory hotplug cannot acquire the lock for migrating a chunk of pages.
I think we'll need radix_tree_replace for migrating an arbitrary chunk of
pages, anyway.
Thanks,
-- Kame
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Magnus Damm @ 2005-10-18 8:34 UTC
To: Christoph Lameter; +Cc: akpm, linux-mm, lhms-devel, ak
On 10/18/05, Christoph Lameter <clameter@sgi.com> wrote:
> +/*
> + * Isolate one page from the LRU lists and put it on the
> + * indicated list.
> + *
> + * Result:
> + * 0 = page not on LRU list
> + * 1 = page removed from LRU list and added to the specified list.
> + * -1 = page is being freed elsewhere.
> + */
> +int isolate_lru_page(struct page *page, struct list_head *l)
> +{
> + int rc = 0;
> + struct zone *zone = page_zone(page);
> +
> +redo:
> + spin_lock_irq(&zone->lru_lock);
> + if (TestClearPageLRU(page)) {
> + list_del(&page->lru);
> + if (get_page_testone(page)) {
> + /*
> + * It is being freed elsewhere
> + */
> + __put_page(page);
> + SetPageLRU(page);
> + if (PageActive(page))
> + list_add(&page->lru, &zone->active_list);
> + else
> + list_add(&page->lru, &zone->inactive_list);
> + rc = -1;
> + } else {
> + list_add(&page->lru, l);
> + if (PageActive(page))
> + zone->nr_active--;
> + else
> + zone->nr_inactive--;
> + rc = 1;
> + }
> + }
> + spin_unlock_irq(&zone->lru_lock);
> + if (rc == 0) {
> + /*
> + * Maybe this page is still waiting for a cpu to drain it
> + * from one of the lru lists?
> + */
> + smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
> + lru_add_drain();
> + if (PageLRU(page))
> + goto redo;
> + }
> + return rc;
> +}
This function is very similar to isolate_lru_pages(), except that it
operates on one page at a time and drains the LRU if needed. Maybe
isolate_lru_pages() could use this function (inline) if the spinlock
and drain code were moved out?
I'm also curious why you chose to always use list_del() and move back
the page if freed elsewhere, instead of using
del_page_from_[in]active_list(). I guess because of performance. But
if that is the case, wouldn't it make sense to do as little as
possible with the spinlock held, i.e. move list_add() (when rc == 1) out
of the function?
I'd love to see those patches included somewhere; it would help me a
lot when I build code for separate mapped and unmapped LRUs.
/ magnus
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Nick Piggin @ 2005-10-18 8:51 UTC
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, lhms-devel, ak
Andrew Morton wrote:
>Christoph Lameter <clameter@sgi.com> wrote:
>
>>+ write_lock_irq(&mapping->tree_lock);
>> +
>> + if (page_count(page) != 2 || PageDirty(page)) {
>> + write_unlock_irq(&mapping->tree_lock);
>> + goto retry_later_locked;
>> + }
>>
>
>This needs the (uncommented (grr)) smp_rmb() copied-and-pasted as well.
>
>It's a shame about the copy-and-pasting :( Is it unavoidable?
>
>
It is commented. The comment says that page_count must be tested
before PageDirty. The code simply didn't match the comment before,
so it didn't warrant any more commenting aside from the changelog.
Nick
* Re: [PATCH 2/2] Page migration via Swap V2: MPOL_MF_MOVE interface
From: Magnus Damm @ 2005-10-18 10:05 UTC
To: Christoph Lameter; +Cc: akpm, linux-mm, ak, lhms-devel
Hi again,
On 10/18/05, Christoph Lameter <clameter@sgi.com> wrote:
> + vma = check_range(mm, start, end, nmask, flags,
> + (flags & MPOL_MF_MOVE) ? &pagelist : NULL);
> err = PTR_ERR(vma);
> - if (!IS_ERR(vma))
> + if (!IS_ERR(vma)) {
> err = mbind_range(vma, start, end, new);
> + if (!list_empty(&pagelist))
> + swapout_pages(&pagelist);
> + if (!err && !list_empty(&pagelist) && (flags & MPOL_MF_STRICT))
> + err = -EIO;
> + }
> + if (!list_empty(&pagelist))
> + putback_lru_pages(&pagelist);
isolate_lru_page() calls get_page_testone(), and swapout_pages() seems
to call __put_page(). But who decrements page->_count in the case of
putback_lru_pages()?
/ magnus
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: Marcelo Tosatti @ 2005-10-18 12:16 UTC
To: Christoph Lameter; +Cc: akpm, linux-mm, ak, lhms-devel
On Mon, Oct 17, 2005 at 05:49:32PM -0700, Christoph Lameter wrote:
> In a NUMA system it is often beneficial to be able to move the memory
> in use by a process to different nodes in order to enhance performance.
> Currently Linux simply does not support this facility.
>
> Page migration is also useful for other purposes:
>
> 1. Memory hotplug. Migrating processes off a memory node that is going
> to be disconnected.
>
> 2. Remapping of bad pages. These could be detected through soft ECC errors
> and other mechanisms.
>
> Work on page migration has been done in the context of the memory hotplug project
> (see https://lists.sourceforge.net/lists/listinfo/lhms-devel). Ray Bryant
> has also posted a series of manual page migration patchsets. However, those patches
> are complex and may have impacts on the VM in various places, and there are unresolved
> issues regarding memory placement during direct migration, so that functionality
> may not be available for some time.
Is there a description of the unresolved issues you mention somewhere?
Having a duplicate implementation is somewhat disappointing - why not fix the problems
with real page migration?
> This patchset was done with awareness of the work done there and implements page
> migration via swap. Pages are not directly moved to their target
> location but simply swapped out. If the application touches the page later then
> a new page is allocated in the desired location.
>
> The advantage of page based swapping is that the necessary changes to the kernel
> are minimal. With a fully functional but minimal page migration capability we
> will be able to enhance low level code and higher level APIs at the same time.
> This will hopefully decrease the time needed to get the code for direct page
> migration working and into the kernel trees.
Why would that be the case?
* Re: [Lhms-devel] Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: Lee Schermerhorn @ 2005-10-18 14:27 UTC
To: KAMEZAWA Hiroyuki
Cc: Christoph Lameter, akpm, linux-mm, ak, lhms-devel, Avelino F. Zorzo
On Tue, 2005-10-18 at 12:18 +0900, KAMEZAWA Hiroyuki wrote:
> Hi,
>
> Christoph Lameter wrote:
>
> > The disadvantages compared to direct page migration are:
> >
> > A. Performance: Having to go through swap is slower.
> >
> > B. The need for swap space: The area to be migrated must fit into swap.
> >
> I think the migration cache will work well for A & B :)
> The migration cache is virtual swap: it just unmaps a page and marks it as a swap-cache page.
I submitted a "reworked" migration cache patch back on 20sep:
http://marc.theaimsgroup.com/?l=lhms-devel&m=112724852823727&w=4
The "rework", based on a conversation with Marcello, attempts to hide
most of the migration cache behind the swap interface. Of course, the
decision to add a page to the swap cache vs the migration cache must be
explicit, but once added to either cache a page can be manipulated
almost entirely via [slightly modified] swap APIs to limit propagation
of changes to other parts of vm.
I have used this version of the migration cache successfully with Ray
Bryant's manual page migration [based on a 2.6.13-rc3-git7-mhp2 tree]
and with a prototype "lazy page migration" patch [work in progress] that
works similar to Christoph's current patch under discussion.
>
> > C. Placement of pages at swapin is done under the memory policy in
> > effect at that time. This may destroy nodeset relative positioning.
> >
> How about this?
> ==
> 1. do_mbind()
> 2. unmap and move to migration cache
> 3. touch all pages
Touching all pages could be optional [an additional flag to mbind()].
Then a process only migrates pages as they are used. Maybe not all of
the pages marked for migration will actually be used by the process
before one decides to migrate it again. However, then we'd need a way
to find pages in the migration cache and move them to the swap cache for
page out under memory pressure. Marcelo mentioned this way back when
he first proposed the migration cache. I'm thinking that shrink_list()
could probably do this when it finds an anon page in the "swap cache"--
i.e., check if it's really in the migration cache and if may_swap, move
it to the swap cache.
> ==
> For step 3, step 2 should gather a list of all the present virtual addresses...
>
> D. We need separate page-cache migration functions for moving the page cache :(
> Moving just anonymous pages is not enough for memory hotplug.
> (BTW, how should pages in the page cache be affected by memory location control?
> I think some people have discussed that...)
If, when scanning a range of virtual addresses [from mbind()], one
encounters non-anon pages and unmaps them [e.g., via
page_migratable()->try_to_unmap()], they can be refaulted from the cache
or backing store on next touch. Of course, they won't have been migrated
yet. We'd need to mark the pages to be tested for migration in the fault
path. I've used a "PageCheckPolicy" flag [yet another page flag :-(] to
indicate that the page location must be checked against the policy at
fault time. This is less expensive than querying the policy for the
'correct' location on each fault. Note that pages in the migration cache
must also be so marked because they aren't swapped out. Similarly for
pages in the swap cache if they aren't actually swapped out. We'd need to
clear this flag when the pages are freed if they haven't been migrated
yet [the flag is tested/cleared in the fault path].
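A rough sketch of such a fault-path test follows (every name here, i.e.
PageCheckPolicy, page_on_policy_node() and migrate_on_fault(), is
hypothetical and not from any posted patch):

/* Hypothetical fault-path hook: the flag means "verify placement". */
if (PageCheckPolicy(page)) {
	ClearPageCheckPolicy(page);
	if (!page_on_policy_node(page, vma, address))
		/* Wrong node: allocate a conforming copy, keeping the
		 * old page only if the allocation fails. */
		page = migrate_on_fault(page, vma, address);
}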
Regards,
Lee
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Christoph Lameter @ 2005-10-18 16:38 UTC
To: Andrew Morton; +Cc: linux-mm, lhms-devel, ak, nickpiggin
On Mon, 17 Oct 2005, Andrew Morton wrote:
> This needs the (uncommented (grr)) smp_rmb() copied-and-pasted as well.
>
> It's a shame about the copy-and-pasting :( Is it unavoidable?
Well, there is at least a way to extract a major section from it that
includes the smp_rmb().
Index: linux-2.6.14-rc4-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.14-rc4-mm1.orig/mm/vmscan.c 2005-10-18 09:36:36.000000000 -0700
+++ linux-2.6.14-rc4-mm1/mm/vmscan.c 2005-10-18 09:36:42.000000000 -0700
@@ -370,6 +370,42 @@ static pageout_t pageout(struct page *pa
return PAGE_CLEAN;
}
+static inline int remove_mapping(struct address_space *mapping,
+ struct page *page)
+{
+ if (!mapping)
+ return 0; /* truncate got there first */
+
+ write_lock_irq(&mapping->tree_lock);
+
+ /*
+ * The non-racy check for busy page. It is critical to check
+ * PageDirty _after_ making sure that the page is freeable and
+ * not in use by anybody. (pagecache + us == 2)
+ */
+ if (page_count(page) != 2 || PageDirty(page)) {
+ write_unlock_irq(&mapping->tree_lock);
+ return 0;
+ }
+
+#ifdef CONFIG_SWAP
+ if (PageSwapCache(page)) {
+ swp_entry_t swap = { .val = page->private };
+ add_to_swapped_list(swap.val);
+ __delete_from_swap_cache(page);
+ write_unlock_irq(&mapping->tree_lock);
+ swap_free(swap);
+ __put_page(page); /* The pagecache ref */
+ return 1;
+ }
+#endif /* CONFIG_SWAP */
+
+ __remove_from_page_cache(page);
+ write_unlock_irq(&mapping->tree_lock);
+ __put_page(page);
+ return 1;
+}
+
/*
* shrink_list adds the number of reclaimed pages to sc->nr_reclaimed
*/
@@ -508,36 +544,8 @@ static int shrink_list(struct list_head
goto free_it;
}
- if (!mapping)
- goto keep_locked; /* truncate got there first */
-
- write_lock_irq(&mapping->tree_lock);
-
- /*
- * The non-racy check for busy page. It is critical to check
- * PageDirty _after_ making sure that the page is freeable and
- * not in use by anybody. (pagecache + us == 2)
- */
- if (page_count(page) != 2 || PageDirty(page)) {
- write_unlock_irq(&mapping->tree_lock);
+ if (!remove_mapping(mapping, page))
goto keep_locked;
- }
-
-#ifdef CONFIG_SWAP
- if (PageSwapCache(page)) {
- swp_entry_t swap = { .val = page->private };
- add_to_swapped_list(swap.val);
- __delete_from_swap_cache(page);
- write_unlock_irq(&mapping->tree_lock);
- swap_free(swap);
- __put_page(page); /* The pagecache ref */
- goto free_it;
- }
-#endif /* CONFIG_SWAP */
-
- __remove_from_page_cache(page);
- write_unlock_irq(&mapping->tree_lock);
- __put_page(page);
free_it:
unlock_page(page);
@@ -646,31 +654,9 @@ redo:
goto free_it;
}
- if (!mapping)
+ if (!remove_mapping(mapping, page))
goto retry_later_locked; /* truncate got there first */
- write_lock_irq(&mapping->tree_lock);
-
- if (page_count(page) != 2 || PageDirty(page)) {
- write_unlock_irq(&mapping->tree_lock);
- goto retry_later_locked;
- }
-
-#ifdef CONFIG_SWAP
- if (PageSwapCache(page)) {
- swp_entry_t swap = { .val = page->private };
- __delete_from_swap_cache(page);
- write_unlock_irq(&mapping->tree_lock);
- swap_free(swap);
- __put_page(page); /* The pagecache ref */
- goto free_it;
- }
-#endif /* CONFIG_SWAP */
-
- __remove_from_page_cache(page);
- write_unlock_irq(&mapping->tree_lock);
- __put_page(page);
-
free_it:
/*
* We may free pages that were taken off the active list
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Christoph Lameter @ 2005-10-18 16:43 UTC
To: Magnus Damm; +Cc: akpm, linux-mm, lhms-devel, ak
On Tue, 18 Oct 2005, Magnus Damm wrote:
> This function is very similar to isolate_lru_pages(), except that it
> operates on one page at a time and drains the LRU if needed. Maybe
> isolate_lru_pages() could use this function (inline) if the spinlock
> and drain code were moved out?
isolate_lru_pages operates on batches of pages from the same zone and is
very efficient by only taking a single lock. It also does not drain other
processors' LRUs.
> I'm also curious why you chose to always use list_del() and move back
> the page if freed elsewhere, instead of using
> del_page_from_[in]active_list(). I guess because of performance. But
> if that is the case, wouldn't it make sense to do as little as
> possible with the spinlock held, i.e. move list_add() (when rc == 1) out
> of the function?
I tried to follow isolate_lru_pages as closely as possible. list_add() is
a simple operation, so I left it inside, following some earlier code
from the hotplug project.
* Re: [PATCH 2/2] Page migration via Swap V2: MPOL_MF_MOVE interface
From: Christoph Lameter @ 2005-10-18 16:46 UTC
To: Magnus Damm; +Cc: akpm, linux-mm, ak, lhms-devel
On Tue, 18 Oct 2005, Magnus Damm wrote:
> isolate_lru_page() calls get_page_testone(), and swapout_pages() seems
> to call __put_page(). But who decrements page->_count in the case of
> putback_lru_pages()?
Right. Here is a patch that does a put_page in putback_lru_pages():
Index: linux-2.6.14-rc4-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.14-rc4-mm1.orig/mm/vmscan.c 2005-10-17 16:19:21.000000000 -0700
+++ linux-2.6.14-rc4-mm1/mm/vmscan.c 2005-10-18 09:36:36.000000000 -0700
@@ -894,6 +894,8 @@ int putback_lru_pages(struct list_head *
count++;
}
spin_unlock_irq(&zone->lru_lock);
+ /* Undo the get from isolate_lru_page */
+ put_page(page);
}
return count;
}
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: Christoph Lameter @ 2005-10-18 16:47 UTC
To: KAMEZAWA Hiroyuki; +Cc: akpm, linux-mm, ak, lhms-devel
On Tue, 18 Oct 2005, KAMEZAWA Hiroyuki wrote:
> I think the migration cache will work well for A & B :)
> The migration cache is virtual swap: it just unmaps a page and marks it
> as a swap-cache page.
That would be great. Could you rework the migration cache patch to apply to
2.6.14-rc4-mm1?
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: Christoph Lameter @ 2005-10-18 16:50 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, lhms-devel, jschopp, Andrew Morton
On Tue, 18 Oct 2005, KAMEZAWA Hiroyuki wrote:
> Because sys_mbind() acquires mm->mmap_sem, once a page is unmapped,
> all accesses to the page are blocked.
>
> So, even if the range contains hot pages, there will not be any
> hard-to-swap-out pages, right?
There may be locked pages and maybe pages that are continually busy.
> sys_mbind() can acquire mm->mmap_sem for migrating *a process's pages*,
> but memory hotplug cannot acquire the lock for migrating a chunk of pages.
I did mbind first because it is less invasive. The primary reason to
acquire mmap_sem is to be able to walk the vma areas.
> I think we'll need radix_tree_replace for migrating an arbitrary chunk of
> pages, anyway.
Likely. Ultimately I would like to see the direct migration work.
* Re: [PATCH 0/2] Page migration via Swap V2: Overview
From: Christoph Lameter @ 2005-10-18 16:54 UTC
To: Marcelo Tosatti; +Cc: akpm, linux-mm, ak, lhms-devel
On Tue, 18 Oct 2005, Marcelo Tosatti wrote:
> Having a duplicate implementation is somewhat disappointing - why not fix the problems
> with real page migration?
There are problems on a variety of levels. It's just too complicated to
work them out in one go. I think we would need much more support from the
larger developer community to get there. With a simple working migration
approach we can simultaneously:
1. Explore solutions to the lower level migration code
2. Deal with the memory policy issues arising in hotplug and in memory
migration. These are masked by the swap based migration because swapin
guarantees the correct use of memory policies and cpuset restrictions.
3. Implement appropriate higher level control of page migration via a
variety of methods and develop the necessary user land support structures.
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Magnus Damm @ 2005-10-19 10:04 UTC
To: Christoph Lameter; +Cc: akpm, linux-mm, lhms-devel, ak
On 10/19/05, Christoph Lameter <clameter@engr.sgi.com> wrote:
> On Tue, 18 Oct 2005, Magnus Damm wrote:
>
> > This function is very similar to isolate_lru_pages(), except that it
> > operates on one page at a time and drains the LRU if needed. Maybe
> > isolate_lru_pages() could use this function (inline) if the spinlock
> > and drain code were moved out?
>
> isolate_lru_pages operates on batches of pages from the same zone and is
> very efficient by only taking a single lock. It also does not drain other
> processors' LRUs.
Ah, I see. You have a mix of pages from different zones on your list.
Maybe it is possible to use the same kind of zone locking style as
release_pages() to avoid duplicating code...
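For reference, the release_pages()-style locking looks roughly like this
(a sketch, not code from any of these patches; l is the list of isolated
pages): the zone lock is only dropped and re-taken when the list crosses
a zone boundary.

struct zone *zone = NULL;
struct page *page, *page2;

list_for_each_entry_safe(page, page2, l, lru) {
	struct zone *pagezone = page_zone(page);

	if (pagezone != zone) {		/* crossed a zone boundary */
		if (zone)
			spin_unlock_irq(&zone->lru_lock);
		zone = pagezone;
		spin_lock_irq(&zone->lru_lock);
	}
	/* ... move the page back to the appropriate LRU list ... */
}
if (zone)
	spin_unlock_irq(&zone->lru_lock);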
> > I'm also curious why you chose to always use list_del() and move back
> > the page if freed elsewhere, instead of using
> > del_page_from_[in]active_list(). I guess because of performance. But
> > if that is the case, wouldn't it make sense to do as little as
> > possible with the spinlock held, i.e. move list_add() (when rc == 1) out
> > of the function?
>
> I tried to follow isolate_lru_pages as closely as possible. list_add() is
> a simple operation, so I left it inside, following some earlier code
> from the hotplug project.
Yep, it probably won't matter.
I'm trying to figure out if this code works in all cases:
+ spin_lock_irq(&zone->lru_lock);
+ list_del(&page->lru);
+ if (!TestSetPageLRU(page)) {
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+ count++;
+ }
+ spin_unlock_irq(&zone->lru_lock);
Why not use if (TestSetPageLRU(page)) BUG()?
Or is it possible that someone sets the LRU bit while we are keeping
the pages on our non-lru list? If so, who is stealing and will your
put_page() patch work correctly if the page is stolen from us?
Thanks,
/ magnus
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Christoph Lameter @ 2005-10-19 15:29 UTC
To: Magnus Damm; +Cc: akpm, linux-mm, lhms-devel, ak
On Wed, 19 Oct 2005, Magnus Damm wrote:
> I'm trying to figure out if this code works in all cases:
>
> + spin_lock_irq(&zone->lru_lock);
> + list_del(&page->lru);
> + if (!TestSetPageLRU(page)) {
> + if (PageActive(page))
> + add_page_to_active_list(zone, page);
> + else
> + add_page_to_inactive_list(zone, page);
> + count++;
> + }
> + spin_unlock_irq(&zone->lru_lock);
>
> Why not use if (TestSetPageLRU(page)) BUG()?
That is probably right.
* Re: [PATCH 1/2] Page migration via Swap V2: Page Eviction
From: Christoph Lameter @ 2005-10-19 20:32 UTC
To: Magnus Damm; +Cc: akpm, linux-mm, lhms-devel, ak
On Wed, 19 Oct 2005, Magnus Damm wrote:
> I'm trying to figure out if this code works in all cases:
>
> + spin_lock_irq(&zone->lru_lock);
> + list_del(&page->lru);
> + if (!TestSetPageLRU(page)) {
> + if (PageActive(page))
> + add_page_to_active_list(zone, page);
> + else
> + add_page_to_inactive_list(zone, page);
> + count++;
> + }
> + spin_unlock_irq(&zone->lru_lock);
>
> Why not use if (TestSetPageLRU(page)) BUG()?
The memory hotplug project has a BUG() there and I cannot find a way that
something else could legitimately set the LRU bit. You are right. Thus
this fix, which also includes the put_page() addition already posted in
an earlier patch.
Index: linux-2.6.14-rc4-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.14-rc4-mm1.orig/mm/vmscan.c 2005-10-17 16:19:21.000000000 -0700
+++ linux-2.6.14-rc4-mm1/mm/vmscan.c 2005-10-19 13:30:00.000000000 -0700
@@ -886,14 +886,16 @@ int putback_lru_pages(struct list_head *
spin_lock_irq(&zone->lru_lock);
list_del(&page->lru);
- if (!TestSetPageLRU(page)) {
- if (PageActive(page))
- add_page_to_active_list(zone, page);
- else
- add_page_to_inactive_list(zone, page);
- count++;
- }
+ if (TestSetPageLRU(page))
+ BUG();
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+ count++;
spin_unlock_irq(&zone->lru_lock);
+ /* Undo the get from isolate_lru_page */
+ put_page(page);
}
return count;
}