* Re: [PATCH] Add page migration support via swap to the NUMA policy layer
@ 2005-10-16 8:56 Andi Kleen
0 siblings, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2005-10-16 8:56 UTC (permalink / raw)
To: linux-mm
[-- Attachment #1: Type: text/plain, Size: 50 bytes --]
[original mail had a mistyped linux-mm address]
[-- Attachment #2: Mail Delivery System <MAILER-DAEMON@suse.de>: Undelivered Mail Returned to Sender --]
[-- Type: message/rfc822, Size: 5035 bytes --]
[-- Attachment #2.1.1: Notification --]
[-- Type: text/plain, Size: 605 bytes --]
This is the Postfix program at host mx2.suse.de.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance, please send mail to <postmaster>.
If you do so, please include this problem report. You can
delete your own text from the attached returned message.
The Postfix program
<linux-mm@vger.kernel.org>: host vger.kernel.org[209.132.176.167] said: 554
5.0.0 Hi [195.135.220.15], unresolvable address:
<linux-mm@vger.kernel.org>; nosuchuser; linux-mm@vger.kernel.org (in reply
to RCPT TO command)
[-- Attachment #2.1.2: Delivery report --]
[-- Type: message/delivery-status, Size: 465 bytes --]
[-- Attachment #2.1.3: Undelivered Message --]
[-- Type: message/rfc822, Size: 1756 bytes --]
From: Andi Kleen <ak@suse.de>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: lhms-devel@lists.sourceforge.net, linux-mm@vger.kernel.org
Subject: Re: [PATCH] Add page migration support via swap to the NUMA policy layer
Date: Thu, 13 Oct 2005 20:47:03 +0200
Message-ID: <200510132047.03892.ak@suse.de>
On Thursday 13 October 2005 20:15, Christoph Lameter wrote:
> This patch adds page migration support to the NUMA policy layer. An additional
> flag MPOL_MF_MOVE is introduced for mbind. If MPOL_MF_MOVE is specified, then
> pages that do not conform to the memory policy will be evicted from memory.
> When they are paged back in, new pages will be allocated following the NUMA policy.
That part looks ok.
>
> In addition, this adds a move_pages function that may be used from outside
> the policy layer to move pages between nodes (needed by the cpuset support
> and the /proc interface). The design is intended to support future direct page
> migration without going through swap space.
Please split that out and resubmit it if there are really other users.
(What /proc support?)
> + WARN_ON(isolate_lru_page(page, pagelist) == 0);
WARN_ONs are not supposed to have side effects.
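As an illustrative sketch (not from the original mail), one way to keep
the side effect out of the macro argument is to do the call first and
only pass the saved result to WARN_ON:

	int isolated = isolate_lru_page(page, pagelist);

	/* Purely diagnostic now; the isolation was done above. */
	WARN_ON(isolated == 0);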
-Andi
* Re: [PATCH] Add page migration support via swap to the NUMA policy layer
2005-10-13 18:15 Christoph Lameter
@ 2005-10-13 18:16 ` Christoph Lameter
0 siblings, 0 replies; 3+ messages in thread
From: Christoph Lameter @ 2005-10-13 18:16 UTC (permalink / raw)
To: lhms-devel; +Cc: linux-mm, ak
I forgot to say:
The patch requires the memory policy layering patch posted yesterday and
the page eviction patch posted today to be applied to 2.6.14-rc4.
* [PATCH] Add page migration support via swap to the NUMA policy layer
@ 2005-10-13 18:15 Christoph Lameter
2005-10-13 18:16 ` Christoph Lameter
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Lameter @ 2005-10-13 18:15 UTC (permalink / raw)
To: lhms-devel; +Cc: linux-mm, ak
This patch adds page migration support to the NUMA policy layer. An additional
flag MPOL_MF_MOVE is introduced for mbind. If MPOL_MF_MOVE is specified, then
pages that do not conform to the memory policy will be evicted from memory.
When they are paged back in, new pages will be allocated following the NUMA policy.
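As an illustration of the new flag from userspace (a hypothetical sketch,
not part of the patch), assuming libnuma's <numaif.h> mbind() wrapper and
defining MPOL_MF_MOVE locally in case the installed headers predate this
patch:

/* Sketch: bind an anonymous region to node 1 and ask the kernel to
 * evict any of its pages currently sitting on other nodes, so they are
 * reallocated on node 1 the next time they are touched.
 * Build with: gcc example.c -lnuma
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <numaif.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)	/* value introduced by this patch */
#endif

int main(void)
{
	size_t len = 16 * 4096;
	unsigned long nodemask = 1UL << 1;	/* node 1 only */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0, len);	/* fault the pages in wherever they land */

	if (mbind(p, len, MPOL_BIND, &nodemask,
		  8 * sizeof(nodemask), MPOL_MF_MOVE) < 0) {
		perror("mbind");
		return 1;
	}
	return 0;
}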
In addition, this adds a move_pages function that may be used from outside
the policy layer to move pages between nodes (needed by the cpuset support
and the /proc interface). The design is intended to support future direct page
migration without going through swap space.
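For illustration of the calling convention (a hypothetical sketch, not
part of the patch), a caller such as the cpuset code could push a task's
pages off the nodes it is no longer allowed to use roughly like this:

#include <linux/mm.h>
#include <linux/mempolicy.h>
#include <linux/nodemask.h>

/*
 * Sketch only: swap out pages mapped by 'mm' in [start, end) that sit
 * on a node in 'old_allowed' but not in 'new_allowed'.  They come back
 * on an allowed node, via the normal policy, the next time they are
 * touched.  move_pages() takes mmap_sem for reading itself and returns
 * the number of pages it could not move.
 */
static int sketch_move_off_lost_nodes(struct mm_struct *mm,
				      unsigned long start, unsigned long end,
				      nodemask_t *old_allowed,
				      nodemask_t *new_allowed)
{
	return move_pages(mm, start, end, old_allowed, new_allowed);
}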
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.14-rc4/mm/mempolicy.c
===================================================================
--- linux-2.6.14-rc4.orig/mm/mempolicy.c 2005-10-13 10:13:43.000000000 -0700
+++ linux-2.6.14-rc4/mm/mempolicy.c 2005-10-13 11:07:51.000000000 -0700
@@ -83,6 +83,7 @@
#include <linux/init.h>
#include <linux/compat.h>
#include <linux/mempolicy.h>
+#include <linux/swap.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -182,7 +183,8 @@ static struct mempolicy *mpol_new(int mo
/* Ensure all existing pages follow the policy. */
static int check_pte_range(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pte_t *orig_pte;
pte_t *pte;
@@ -199,8 +201,14 @@ static int check_pte_range(struct mm_str
if (!pfn_valid(pfn))
continue;
nid = pfn_to_nid(pfn);
- if (!node_isset(nid, *nodes))
- break;
+ if (!node_isset(nid, *nodes)) {
+ if (pagelist) {
+ struct page *page = pfn_to_page(pfn);
+
+ WARN_ON(isolate_lru_page(page, pagelist) == 0);
+ } else
+ break;
+ }
} while (pte++, addr += PAGE_SIZE, addr != end);
pte_unmap(orig_pte);
spin_unlock(&mm->page_table_lock);
@@ -208,7 +216,8 @@ static int check_pte_range(struct mm_str
}
static inline int check_pmd_range(struct mm_struct *mm, pud_t *pud,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pmd_t *pmd;
unsigned long next;
@@ -218,14 +227,15 @@ static inline int check_pmd_range(struct
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd))
continue;
- if (check_pte_range(mm, pmd, addr, next, nodes))
+ if (check_pte_range(mm, pmd, addr, next, nodes, pagelist))
return -EIO;
} while (pmd++, addr = next, addr != end);
return 0;
}
static inline int check_pud_range(struct mm_struct *mm, pgd_t *pgd,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pud_t *pud;
unsigned long next;
@@ -235,14 +245,15 @@ static inline int check_pud_range(struct
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- if (check_pmd_range(mm, pud, addr, next, nodes))
+ if (check_pmd_range(mm, pud, addr, next, nodes, pagelist))
return -EIO;
} while (pud++, addr = next, addr != end);
return 0;
}
static inline int check_pgd_range(struct mm_struct *mm,
- unsigned long addr, unsigned long end, nodemask_t *nodes)
+ unsigned long addr, unsigned long end,
+ nodemask_t *nodes, struct list_head *pagelist)
{
pgd_t *pgd;
unsigned long next;
@@ -252,7 +263,7 @@ static inline int check_pgd_range(struct
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- if (check_pud_range(mm, pgd, addr, next, nodes))
+ if (check_pud_range(mm, pgd, addr, next, nodes, pagelist))
return -EIO;
} while (pgd++, addr = next, addr != end);
return 0;
@@ -261,7 +272,7 @@ static inline int check_pgd_range(struct
/* Step 1: check the range */
static struct vm_area_struct *
check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
- nodemask_t *nodes, unsigned long flags)
+ nodemask_t *nodes, unsigned long flags, struct list_head *pagelist)
{
int err;
struct vm_area_struct *first, *vma, *prev;
@@ -275,14 +286,20 @@ check_range(struct mm_struct *mm, unsign
return ERR_PTR(-EFAULT);
if (prev && prev->vm_end < vma->vm_start)
return ERR_PTR(-EFAULT);
- if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
- unsigned long endvma = vma->vm_end;
+ if (!is_vm_hugetlb_page(vma) &&
+ ((flags & MPOL_MF_STRICT) ||
+ ((flags & MPOL_MF_MOVE) &&
+ ((vma->vm_flags & (VM_LOCKED|VM_IO|VM_RESERVED|VM_DENYWRITE|VM_SHM))==0)
+ ))) {
+ unsigned long endvma;
+
+ endvma = vma->vm_end;
if (endvma > end)
endvma = end;
if (vma->vm_start > start)
start = vma->vm_start;
err = check_pgd_range(vma->vm_mm,
- start, endvma, nodes);
+ start, endvma, nodes, pagelist);
if (err) {
first = ERR_PTR(err);
break;
@@ -293,6 +310,36 @@ check_range(struct mm_struct *mm, unsign
return first;
}
+/*
+ * Main entry point to page migration.
+ * For now move_pages simply swaps out the pages from nodes that are in
+ * the source set but not in the target set. In the future, we would
+ * want a function that moves pages between the two nodesets in such
+ * a way as to preserve the physical layout as much as possible.
+ *
+ * Returns the number of pages that could not be moved.
+ */
+int move_pages(struct mm_struct *mm, unsigned long start, unsigned long end,
+ nodemask_t *from_nodes, nodemask_t *to_nodes)
+{
+ LIST_HEAD(pagelist);
+ int count = 0;
+ nodemask_t nodes;
+
+ nodes_andnot(nodes, *from_nodes, *to_nodes);
+ nodes_complement(nodes, nodes);
+
+ down_read(&mm->mmap_sem);
+ check_range(mm, start, end, &nodes, MPOL_MF_MOVE, &pagelist);
+ if (!list_empty(&pagelist)) {
+ swapout_pages(&pagelist);
+ if (!list_empty(&pagelist))
+ count = putback_lru_pages(&pagelist);
+ }
+ up_read(&mm->mmap_sem);
+ return count;
+}
+
/* Apply policy to a single VMA */
static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
{
@@ -356,21 +403,28 @@ long do_mbind(unsigned long start, unsig
struct mempolicy *new;
unsigned long end;
int err;
+ LIST_HEAD(pagelist);
- if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
+ if ((flags & ~(unsigned long)(MPOL_MF_STRICT | MPOL_MF_MOVE))
+ || mode > MPOL_MAX)
return -EINVAL;
if (start & ~PAGE_MASK)
return -EINVAL;
+
if (mode == MPOL_DEFAULT)
flags &= ~MPOL_MF_STRICT;
+
len = (len + PAGE_SIZE - 1) & PAGE_MASK;
end = start + len;
+
if (end < start)
return -EINVAL;
if (end == start)
return 0;
+
if (contextualize_policy(mode, nmask))
return -EINVAL;
+
new = mpol_new(mode, nmask);
if (IS_ERR(new))
return PTR_ERR(new);
@@ -379,10 +433,19 @@ long do_mbind(unsigned long start, unsig
mode,nodes_addr(nodes)[0]);
down_write(&mm->mmap_sem);
- vma = check_range(mm, start, end, nmask, flags);
+ vma = check_range(mm, start, end, nmask, flags,
+ (flags & MPOL_MF_MOVE) ? &pagelist : NULL);
err = PTR_ERR(vma);
- if (!IS_ERR(vma))
+ if (!IS_ERR(vma)) {
err = mbind_range(vma, start, end, new);
+ if (!list_empty(&pagelist))
+ swapout_pages(&pagelist);
+ if (!err && !list_empty(&pagelist) && (flags & MPOL_MF_STRICT))
+ err = -EIO;
+ }
+ if (!list_empty(&pagelist))
+ putback_lru_pages(&pagelist);
+
up_write(&mm->mmap_sem);
mpol_free(new);
return err;
Index: linux-2.6.14-rc4/include/linux/mempolicy.h
===================================================================
--- linux-2.6.14-rc4.orig/include/linux/mempolicy.h 2005-10-13 10:13:43.000000000 -0700
+++ linux-2.6.14-rc4/include/linux/mempolicy.h 2005-10-13 10:14:10.000000000 -0700
@@ -22,6 +22,7 @@
/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
+#define MPOL_MF_MOVE (1<<1) /* Move pages to specified nodes */
#ifdef __KERNEL__
@@ -157,6 +158,9 @@ extern void numa_default_policy(void);
extern void numa_policy_init(void);
extern struct mempolicy default_policy;
+extern int move_pages(struct mm_struct *mm, unsigned long from, unsigned long to,
+ nodemask_t *from_nodes, nodemask_t *to_nodes);
+
#else
struct mempolicy {};