* [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch Ray Bryant
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Hi Dave,
Would you apply the following patch right after
AA-PM-01-steal_page_from_lru.patch?
This patch makes steal_page_from_lru() and putback_page_to_lru()
check PageLRU() with zone->lru_lock held. Currently only the process
migration code, which Ray is working on, uses this code.
Thanks,
Hirokazu Takahashi.
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
---
linux-2.6.12-rc3-taka/include/linux/mm_inline.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff -puN include/linux/mm_inline.h~taka-steal_page_from_lru-FIX include/linux/mm_inline.h
--- linux-2.6.12-rc3/include/linux/mm_inline.h~taka-steal_page_from_lru-FIX Mon May 23 02:26:57 2005
+++ linux-2.6.12-rc3-taka/include/linux/mm_inline.h Mon May 23 02:26:57 2005
@@ -80,9 +80,10 @@ static inline int
steal_page_from_lru(struct zone *zone, struct page *page,
struct list_head *dst)
{
- int ret;
+ int ret = 0;
spin_lock_irq(&zone->lru_lock);
- ret = __steal_page_from_lru(zone, page, dst);
+ if (PageLRU(page))
+ ret = __steal_page_from_lru(zone, page, dst);
spin_unlock_irq(&zone->lru_lock);
return ret;
}
@@ -102,7 +103,8 @@ static inline void
putback_page_to_lru(struct zone *zone, struct page *page)
{
spin_lock_irq(&zone->lru_lock);
- __putback_page_to_lru(zone, page);
+ if (!PageLRU(page))
+ __putback_page_to_lru(zone, page);
spin_unlock_irq(&zone->lru_lock);
}
_
_______________________________________________
Lhms-devel mailing list
Lhms-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lhms-devel
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant raybry@sgi.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
* [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch Ray Bryant
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Nathan Scott of SGI provided this patch for XFS that supports
the migrate_page method in the address_space operations vector.
It is basically the same as what is in ext2_migrate_page().
However, the routine "xfs_skip_migrate_page()" is added to
disallow migration of xfs metadata.
Signed-off-by: Ray Bryant <raybry@sgi.com>
xfs_aops.c | 10 ++++++++++
xfs_buf.c | 7 +++++++
2 files changed, 17 insertions(+)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_aops.c 2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c 2005-06-13 11:12:42.000000000 -0700
@@ -54,6 +54,7 @@
#include "xfs_iomap.h"
#include <linux/mpage.h>
#include <linux/writeback.h>
+#include <linux/mmigrate.h>
STATIC void xfs_count_page_state(struct page *, int *, int *, int *);
STATIC void xfs_convert_page(struct inode *, struct page *, xfs_iomap_t *,
@@ -1273,6 +1274,14 @@ linvfs_prepare_write(
return block_prepare_write(page, from, to, linvfs_get_block);
}
+STATIC int
+linvfs_migrate_page(
+ struct page *from,
+ struct page *to)
+{
+ return generic_migrate_page(from, to, migrate_page_buffer);
+}
+
struct address_space_operations linvfs_aops = {
.readpage = linvfs_readpage,
.readpages = linvfs_readpages,
@@ -1283,4 +1292,5 @@ struct address_space_operations linvfs_a
.commit_write = generic_commit_write,
.bmap = linvfs_bmap,
.direct_IO = linvfs_direct_IO,
+ .migrate_page = linvfs_migrate_page,
};
Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_buf.c 2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c 2005-06-13 11:12:42.000000000 -0700
@@ -1626,6 +1626,12 @@ xfs_setsize_buftarg(
}
STATIC int
+xfs_skip_migrate_page(struct page *from, struct page *to)
+{
+ return -EBUSY;
+}
+
+STATIC int
xfs_mapping_buftarg(
xfs_buftarg_t *btp,
struct block_device *bdev)
@@ -1635,6 +1641,7 @@ xfs_mapping_buftarg(
struct address_space *mapping;
static struct address_space_operations mapping_aops = {
.sync_page = block_sync_page,
+ .migrate_page = xfs_skip_migrate_page,
};
inode = new_inode(bdev->bd_inode->i_sb);
* [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch Ray Bryant
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch changes the interface to try_to_migrate_pages() so that the
caller can specify the nodes to which the pages are to be migrated. This
is done by adding a "node_map" argument (of type "int *") to
try_to_migrate_pages().
If this argument is NULL, then try_to_migrate_pages() behaves exactly
as before, and this is the interface the rest of the memory hotplug
patch should use. (Note: This patchset does not include the changes
to the rest of the memory hotplug patch that will be necessary to use
this new interface [if it is accepted]. Those changes will be provided
as a distinct patch.)
If the argument is non-NULL, then node_map points at an array of ints
of size MAX_NUMNODES. node_map[N] is either the id of an online node
or -1. If node_map[N] >= 0, then pages in the page list passed
to try_to_migrate_pages() that are found on node N are migrated to node
node_map[N]. If node_map[N] == -1, then pages found on node N are left
where they are.
This change depends on previous changes to migrate_onepage()
that support migrating a page to a specified node. These changes
are already part of the memory migration sub-patch of the memory
hotplug patch.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/mmigrate.h | 11 ++++++++++-
mm/mmigrate.c | 10 ++++++----
2 files changed, 16 insertions(+), 5 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mmigrate.h 2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h 2005-06-13 10:22:22.000000000 -0700
@@ -16,7 +16,16 @@ extern int migrate_page_buffer(struct pa
extern int page_migratable(struct page *, struct page *, int,
struct list_head *);
extern struct page * migrate_onepage(struct page *, int nodeid);
-extern int try_to_migrate_pages(struct list_head *);
+extern int try_to_migrate_pages(struct list_head *, int *);
+
+static inline struct page *node_migrate_onepage(struct page *page, int *node_map)
+{
+ if (node_map)
+ return migrate_onepage(page, node_map[page_to_nid(page)]);
+ else
+ return migrate_onepage(page, MIGRATE_NODE_ANY);
+
+}
#else
static inline int generic_migrate_page(struct page *page, struct page *newpage,
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-13 10:22:02.000000000 -0700
@@ -501,9 +501,11 @@ out_unlock:
/*
* This is the main entry point to migrate pages in a specific region.
* If a page is inactive, the page may be just released instead of
- * migration.
+ * migration. node_map is supplied in those cases (on NUMA systems)
+ * where the caller wishes to specify to which nodes the pages are
+ * migrated. If node_map is null, the target node is MIGRATE_NODE_ANY.
*/
-int try_to_migrate_pages(struct list_head *page_list)
+int try_to_migrate_pages(struct list_head *page_list, int *node_map)
{
struct page *page, *page2, *newpage;
LIST_HEAD(pass1_list);
@@ -541,7 +543,7 @@ int try_to_migrate_pages(struct list_hea
list_for_each_entry_safe(page, page2, &pass1_list, lru) {
list_del(&page->lru);
if (PageLocked(page) || PageWriteback(page) ||
- IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+ IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
if (page_count(page) == 1) {
/* the page is already unused */
putback_page_to_lru(page_zone(page), page);
@@ -559,7 +561,7 @@ int try_to_migrate_pages(struct list_hea
*/
list_for_each_entry_safe(page, page2, &pass2_list, lru) {
list_del(&page->lru);
- if (IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+ if (IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
if (page_count(page) == 1) {
/* the page is already unused */
putback_page_to_lru(page_zone(page), page);
* [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (2 preceding siblings ...)
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch Ray Bryant
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This is the main patch that creates the migrate_pages() system
call. Note that in this case, the system call number was more
or less arbitrarily assigned as 1279. This number still needs
to be officially allocated.
This patch sits on top of the page migration patches from
the Memory Hotplug project. This particular patchset is built
on top of:
http://www.sr71.net/patches/2.6.12/2.6.13-rc1-mhp1/page_migration/patch-2.6.13-rc1-mhp1-pm.gz
but it may apply on subsequent page migration patches as well.
This patch migrates all pages in the specified process (including
shared libraries).
See the patches:
sys_migrate_pages-migration-selection-rc4.patch
add-mempolicy-control-rc4.patch
for details on the default kernel migration policy (this determines
which VMAs are actually migrated) and how this policy can be overridden
using the mbind() system call.
Updates since last release of this patchset:
Suggestions from Dave Hansen and Hirokazu Takahashi
have been incorporated.
Signed-off-by: Ray Bryant <raybry@sgi.com>
arch/ia64/kernel/entry.S | 2
kernel/sys_ni.c | 1
mm/mmigrate.c | 184 ++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 185 insertions(+), 2 deletions(-)
Index: linux-2.6.13-rc1-mhp1-page-migration/arch/ia64/kernel/entry.S
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/arch/ia64/kernel/entry.S 2005-06-28 22:57:29.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/arch/ia64/kernel/entry.S 2005-06-30 11:17:05.000000000 -0700
@@ -1582,6 +1582,6 @@ sys_call_table:
data8 sys_set_zone_reclaim
data8 sys_ni_syscall
data8 sys_ni_syscall
- data8 sys_ni_syscall
+ data8 sys_migrate_pages // 1279
.org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls
Index: linux-2.6.13-rc1-mhp1-page-migration/mm/mmigrate.c
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/mm/mmigrate.c 2005-06-30 11:16:37.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/mm/mmigrate.c 2005-06-30 11:17:05.000000000 -0700
@@ -5,6 +5,9 @@
*
* Authors: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
* Hirokazu Takahashi <taka@valinux.co.jp>
+ *
+ * sys_migrate_pages() added by Ray Bryant <raybry@sgi.com>
+ * Copyright (C) 2005, Silicon Graphics, Inc.
*/
#include <linux/config.h>
@@ -21,6 +24,8 @@
#include <linux/rmap.h>
#include <linux/mmigrate.h>
#include <linux/delay.h>
+#include <linux/nodemask.h>
+#include <asm/bitops.h>
/*
* The concept of memory migration is to replace a target page with
@@ -436,7 +441,7 @@ migrate_onepage(struct page *page, int n
if (nodeid == MIGRATE_NODE_ANY)
newpage = page_cache_alloc(mapping);
else
- newpage = alloc_pages_node(nodeid, mapping->flags, 0);
+ newpage = alloc_pages_node(nodeid, (unsigned int)mapping->flags, 0);
if (newpage == NULL) {
unlock_page(page);
return ERR_PTR(-ENOMEM);
@@ -587,6 +592,183 @@ int try_to_migrate_pages(struct list_hea
return nr_busy;
}
+static int
+migrate_vma(struct task_struct *task, struct mm_struct *mm,
+ struct vm_area_struct *vma, int *node_map)
+{
+ struct page *page, *page2;
+ unsigned long vaddr;
+ int count = 0, nr_busy;
+ LIST_HEAD(pglist);
+
+ /* can't migrate mlock()'d pages */
+ if (vma->vm_flags & VM_LOCKED)
+ return 0;
+
+ /*
+ * gather all of the pages to be migrated from this vma into pglist
+ */
+ spin_lock(&mm->page_table_lock);
+ for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
+ page = follow_page(mm, vaddr, 0);
+ /*
+ * follow_page has been known to return pages with zero mapcount
+ * and NULL mapping. Skip those pages as well
+ */
+ if (!page || !page_mapcount(page))
+ continue;
+
+ if (node_map[page_to_nid(page)] >= 0) {
+ if (steal_page_from_lru(page_zone(page), page, &pglist))
+ count++;
+ else
+ BUG();
+ }
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ /* call the page migration code to move the pages */
+ if (!count)
+ return 0;
+
+ nr_busy = try_to_migrate_pages(&pglist, node_map);
+
+ if (nr_busy < 0)
+ return nr_busy;
+
+ if (nr_busy == 0)
+ return count;
+
+ /* return the unmigrated pages to the LRU lists */
+ list_for_each_entry_safe(page, page2, &pglist, lru) {
+ list_del(&page->lru);
+ putback_page_to_lru(page_zone(page), page);
+ }
+ return -EAGAIN;
+
+}
+
+static inline int nodes_invalid(int *nodes, __u32 count)
+{
+ int i;
+ for (i = 0; i < count; i++)
+ if (nodes[i] < 0 ||
+ nodes[i] > MAX_NUMNODES ||
+ !node_online(nodes[i]))
+ return 1;
+ return 0;
+}
+
+void lru_add_drain_per_cpu(void *info)
+{
+ lru_add_drain();
+}
+
+asmlinkage long
+sys_migrate_pages(pid_t pid, __u32 count, __u32 __user *old_nodes,
+ __u32 __user *new_nodes)
+{
+ int i, ret = 0, migrated = 0;
+ int *tmp_old_nodes = NULL;
+ int *tmp_new_nodes = NULL;
+ int *node_map = NULL;
+ struct task_struct *task;
+ struct mm_struct *mm = NULL;
+ size_t size = count * sizeof(tmp_old_nodes[0]);
+ struct vm_area_struct *vma;
+ nodemask_t old_node_mask, new_node_mask;
+
+ if ((count < 1) || (count > MAX_NUMNODES))
+ goto out_einval;
+
+ tmp_old_nodes = kmalloc(size, GFP_KERNEL);
+ tmp_new_nodes = kmalloc(size, GFP_KERNEL);
+ node_map = kmalloc(MAX_NUMNODES*sizeof(node_map[0]), GFP_KERNEL);
+
+ if (!tmp_old_nodes || !tmp_new_nodes || !node_map) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (copy_from_user(tmp_old_nodes, (void __user *)old_nodes, size) ||
+ copy_from_user(tmp_new_nodes, (void __user *)new_nodes, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (nodes_invalid(tmp_old_nodes, count) ||
+ nodes_invalid(tmp_new_nodes, count))
+ goto out_einval;
+
+ nodes_clear(old_node_mask);
+ nodes_clear(new_node_mask);
+ for (i = 0; i < count; i++) {
+ node_set(tmp_old_nodes[i], old_node_mask);
+ node_set(tmp_new_nodes[i], new_node_mask);
+
+ }
+
+ if (nodes_intersects(old_node_mask, new_node_mask))
+ goto out_einval;
+
+ read_lock(&tasklist_lock);
+ task = find_task_by_pid(pid);
+ if (task) {
+ task_lock(task);
+ mm = task->mm;
+ if (mm)
+ atomic_inc(&mm->mm_users);
+ task_unlock(task);
+ } else {
+ ret = -ESRCH;
+ read_unlock(&tasklist_lock);
+ goto out;
+ }
+ read_unlock(&tasklist_lock);
+ if (!mm)
+ goto out_einval;
+
+ /* set up the node_map array */
+ for (i = 0; i < MAX_NUMNODES; i++)
+ node_map[i] = -1;
+ for (i = 0; i < count; i++)
+ node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
+
+ /* prepare for lru list manipulation */
+ smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+ lru_add_drain();
+
+ /* actually do the migration */
+ down_read(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ ret = migrate_vma(task, mm, vma, node_map);
+ if (ret < 0)
+ goto out_up_mmap_sem;
+ migrated += ret;
+ }
+ up_read(&mm->mmap_sem);
+ ret = migrated;
+
+out:
+ if (mm)
+ mmput(mm);
+
+ kfree(tmp_old_nodes);
+ kfree(tmp_new_nodes);
+ kfree(node_map);
+
+ return ret;
+
+out_einval:
+ ret = -EINVAL;
+ goto out;
+
+out_up_mmap_sem:
+ up_read(&mm->mmap_sem);
+ goto out;
+
+}
+
EXPORT_SYMBOL(generic_migrate_page);
EXPORT_SYMBOL(migrate_page_common);
EXPORT_SYMBOL(migrate_page_buffer);
Index: linux-2.6.13-rc1-mhp1-page-migration/kernel/sys_ni.c
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/kernel/sys_ni.c 2005-06-28 22:57:29.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/kernel/sys_ni.c 2005-06-30 11:17:48.000000000 -0700
@@ -40,6 +40,7 @@ cond_syscall(sys_shutdown);
cond_syscall(sys_sendmsg);
cond_syscall(sys_recvmsg);
cond_syscall(sys_socketcall);
+cond_syscall(sys_migrate_pages);
cond_syscall(sys_futex);
cond_syscall(compat_sys_futex);
cond_syscall(sys_epoll_create);
* [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (3 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch Ray Bryant
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch adds code that translates the memory policy structures
as they are encountered so that they continue to represent where
memory should be allocated after the page migration has completed.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/mempolicy.h | 2
mm/mempolicy.c | 122 +++++++++++++++++++++++++++++++++++++++++++++-
mm/mmigrate.c | 14 ++++-
3 files changed, 135 insertions(+), 3 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h 2005-06-24 10:57:10.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h 2005-06-27 12:29:06.000000000 -0700
@@ -152,6 +152,8 @@ struct mempolicy *mpol_shared_policy_loo
extern void numa_default_policy(void);
extern void numa_policy_init(void);
+extern int migrate_process_policy(struct task_struct *, int *);
+extern int migrate_vma_policy(struct vm_area_struct *, int *);
#else
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-24 10:57:10.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-27 12:28:33.000000000 -0700
@@ -706,7 +706,6 @@ static unsigned offset_il_node(struct me
c++;
} while (c <= target);
BUG_ON(nid >= MAX_NUMNODES);
- BUG_ON(!test_bit(nid, pol->v.nodes));
return nid;
}
@@ -1136,3 +1135,124 @@ void numa_default_policy(void)
{
sys_set_mempolicy(MPOL_DEFAULT, NULL, 0);
}
+
+/*
+ * update a node mask according to a migration request
+ */
+static void migrate_node_mask(unsigned long *new_node_mask,
+ unsigned long *old_node_mask,
+ int *node_map)
+{
+ int i;
+
+ bitmap_zero(new_node_mask, MAX_NUMNODES);
+
+ i = find_first_bit(old_node_mask, MAX_NUMNODES);
+ while(i < MAX_NUMNODES) {
+ if (node_map[i] >= 0)
+ set_bit(node_map[i], new_node_mask);
+ else
+ set_bit(i, new_node_mask);
+ i = find_next_bit(old_node_mask, MAX_NUMNODES, i+1);
+ }
+}
+
+/*
+ * update a process or vma mempolicy according to a migration request
+ */
+static struct mempolicy *
+migrate_policy(struct mempolicy *old, int *node_map)
+{
+ struct mempolicy *new;
+ DECLARE_BITMAP(old_nodes, MAX_NUMNODES);
+ DECLARE_BITMAP(new_nodes, MAX_NUMNODES);
+ struct zone *z;
+ int i;
+
+ new = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+ if (!new)
+ return ERR_PTR(-ENOMEM);
+ atomic_set(&new->refcnt, 0);
+ switch(old->policy) {
+ case MPOL_DEFAULT:
+ BUG();
+ case MPOL_INTERLEAVE:
+ migrate_node_mask(new->v.nodes, old->v.nodes, node_map);
+ break;
+ case MPOL_PREFERRED:
+ if (old->v.preferred_node>=0 &&
+ (node_map[old->v.preferred_node] >= 0))
+ new->v.preferred_node = node_map[old->v.preferred_node];
+ else
+ new->v.preferred_node = old->v.preferred_node;
+ break;
+ case MPOL_BIND:
+ bitmap_zero(old_nodes, MAX_NUMNODES);
+ for (i = 0; (z = old->v.zonelist->zones[i]) != NULL; i++)
+ set_bit(z->zone_pgdat->node_id, old_nodes);
+ migrate_node_mask(new_nodes, old_nodes, node_map);
+ new->v.zonelist = bind_zonelist(new_nodes);
+ if (!new->v.zonelist) {
+ kmem_cache_free(policy_cache, new);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
+ new->policy = old->policy;
+ return new;
+}
+
+/*
+ * update a process mempolicy based on a migration request
+ */
+int migrate_process_policy(struct task_struct *task, int *node_map)
+{
+ struct mempolicy *new, *old = task->mempolicy;
+ int tmp;
+
+ if ((!old) || (old->policy == MPOL_DEFAULT))
+ return 0;
+
+ new = migrate_policy(task->mempolicy, node_map);
+ if (IS_ERR(new))
+ return (PTR_ERR(new));
+
+ mpol_get(new);
+ task->mempolicy = new;
+ mpol_free(old);
+
+ if (task->mempolicy->policy == MPOL_INTERLEAVE) {
+ /*
+ * If the task is still running and allocating storage, this
+ * is racy, but there is not much that can be done about it.
+ * In the worst case, this will allow an allocation of one
+ * page under the original policy (not the "new" one above).
+ * Since we update policies according to the migration,
+ * then migrate pages, that page should still get migrated
+ * correctly.
+ */
+ tmp = task->il_next;
+ if (node_map[tmp] >= 0)
+ task->il_next = node_map[tmp];
+ }
+
+ return 0;
+
+}
+
+/*
+ * update a vma mempolicy based on a migration request
+ */
+int migrate_vma_policy(struct vm_area_struct *vma, int *node_map)
+{
+
+ struct mempolicy *new;
+
+ if (!vma->vm_policy || vma->vm_policy->policy == MPOL_DEFAULT)
+ return 0;
+
+ new = migrate_policy(vma->vm_policy, node_map);
+ if (IS_ERR(new))
+ return (PTR_ERR(new));
+
+ return(policy_vma(vma, new));
+}
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:01:44.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-27 12:26:56.000000000 -0700
@@ -25,6 +25,7 @@
#include <linux/mmigrate.h>
#include <linux/delay.h>
#include <linux/nodemask.h>
+#include <linux/mempolicy.h>
#include <asm/bitops.h>
/*
@@ -598,13 +599,17 @@ migrate_vma(struct task_struct *task, st
{
struct page *page, *page2;
unsigned long vaddr;
- int count = 0, nr_busy;
+ int rc, count = 0, nr_busy;
LIST_HEAD(pglist);
/* can't migrate mlock()'d pages */
if (vma->vm_flags & VM_LOCKED)
return 0;
+ /* update the vma mempolicy, if needed */
+ rc = migrate_vma_policy(vma, node_map);
+ if (rc < 0)
+ return rc;
/*
* gather all of the pages to be migrated from this vma into pglist
*/
@@ -735,9 +740,14 @@ sys_migrate_pages(pid_t pid, __u32 count
node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
/* prepare for lru list manipulation */
- smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+ smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
lru_add_drain();
+ /* update the process mempolicy, if needed */
+ ret = migrate_process_policy(task, node_map);
+ if (ret < 0)
+ goto out;
+
/* actually do the migration */
down_read(&mm->mmap_sem);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
* [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (4 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch Ray Bryant
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This code fixes a problem with migrating mempolicies for shared
objects (System V shared memory, tmpfs, etc) that Andi Kleen pointed
out in his review of the -rc3 version of the page migration code.
As currently implemented, this only really matters for System V shared
memory, since AFAIK that is the only shared object that has its own
vma->vm_policy->policy code. As code is added for the other cases,
the code below will work with the other shared objects.
One can argue that since the shared object exists outside of the
application, that one shouldn't migrate it at all. The approach taken
here is that if a shared object is mapped into the address space of a
process that is being migrated, then the mapped pages of the shared object
should be migrated with the process. (Pages in the shared object that are
not mapped will not be migrated. This is not perfect, but so it goes.)
Signed-off-by: Ray Bryant <raybry@sgi.com>
mempolicy.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-27 12:28:33.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-27 12:29:19.000000000 -0700
@@ -1245,12 +1245,13 @@ int migrate_process_policy(struct task_s
int migrate_vma_policy(struct vm_area_struct *vma, int *node_map)
{
+ struct mempolicy *old = get_vma_policy(vma, vma->vm_start);
struct mempolicy *new;
- if (!vma->vm_policy || vma->vm_policy->policy == MPOL_DEFAULT)
+ if (old->policy == MPOL_DEFAULT)
return 0;
- new = migrate_policy(vma->vm_policy, node_map);
+ new = migrate_policy(old, node_map);
if (IS_ERR(new))
return (PTR_ERR(new));
* [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (5 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch Ray Bryant
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch allows a process to override the default kernel memory
migration policy (invoked via migrate_pages()) on a per-mapped-file
basis.
The default policy is to migrate all anonymous VMAs and all other
VMAs that have the VM_WRITE bit set. (See the patch:
sys_migrate_pages-migration-selection-rc4.patch
for details on how the default policy is implemented.)
This policy does not cause the program executable or any mapped
user data files that are mapped R/O to be migrated. These problems
can be detected and fixed in the user-level migration application,
but that user code needs an interface to do the "fix". This patch
supplies that interface via an extension to the mbind() system call.
The interface is as follows:
mbind(start, length, 0, 0, 0, MPOL_MF_DO_MMIGRATE)
mbind(start, length, 0, 0, 0, MPOL_MF_DO_NOT_MMIGRATE)
These calls override the default kernel policy in
favor of the policy specified. These calls cause the bit
AS_DO_MMIGRATE (or AS_DO_NOT_MMIGRATE) to be set in the
memory object pointed to by the VMA at the specified addresses
in the current process's address space. Setting such a "deep"
attribute is required so that the modification can be seen by
all address spaces that map the object.
The bits set by the above call are "sticky" in the sense that
they will remain set so long as the memory object exists. To
return the migration policy for that memory object to its
default setting one issues the following system call:
mbind(start, length, 0, 0, 0, MPOL_MF_MMIGRATE_DEFAULT)
The system call:
get_mempolicy(&policy, NULL, 0, (int *)start, (long) MPOL_F_MMIGRATE)
returns the policy migration bits from the memory object in the bottom
two bits of "policy".
Typical use by the user-level manual page migration code would
be to:
(1) Identify the file name whose migration policy needs to be modified.
(2) Open and mmap() the file into the current address space.
(3) Issue the appropriate mbind() call from the above list.
(4) (Assuming a successful return), munmap() and close the file.
Note well that this interface allows the memory migration process
to modify the migration policy on a file-by-file basis for all processes
that mmap() the specified file. This has two implications:
(1) All VMAs that map to the specified memory object will have
the same migration policy applied. There is no way to
specify a distinct migration policy for one of the VMAs that
map the file.
(2) The migration policy for anonymous memory cannot be changed,
since there is no memory object (where the migration policy
bits are stored) in that case.
To date, we have yet to identify any case where these restrictions
would need to be overcome in the manual page migration case.
Signed-off-by: Ray Bryant <raybry@sgi.com>
--
include/linux/mempolicy.h | 18 +++++++++
include/linux/pagemap.h | 4 ++
mm/mempolicy.c | 84 ++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 103 insertions(+), 3 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-13 12:20:12.000000000 -0700
@@ -76,6 +76,7 @@
#include <linux/init.h>
#include <linux/compat.h>
#include <linux/mempolicy.h>
+#include <linux/pagemap.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -354,6 +355,54 @@ static int mbind_range(struct vm_area_st
return err;
}
+static int mbind_migration_policy(struct mm_struct *mm, unsigned long start,
+ unsigned long end, unsigned flags)
+{
+ struct vm_area_struct *first, *vma;
+ struct address_space *as;
+ int err = 0;
+
+ /* only one of these bits may be set */
+ if (hweight_long(flags & (MPOL_MF_MMIGRATE_MASK)) > 1)
+ return -EINVAL;
+
+ down_read(&mm->mmap_sem);
+ first = find_vma(mm, start);
+ if (!first) {
+ err = -EFAULT;
+ goto out;
+ }
+ for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
+ if (!vma->vm_file)
+ continue;
+ as = vma->vm_file->f_mapping;
+ BUG_ON(!as);
+ switch (flags & MPOL_MF_MMIGRATE_MASK) {
+ case MPOL_MF_DO_MMIGRATE:
+ /* only one of these bits may be set */
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ set_bit(AS_DO_MMIGRATE, &as->flags);
+ break;
+ case MPOL_MF_DO_NOT_MMIGRATE:
+ /* only one of these bits may be set */
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ clear_bit(AS_DO_MMIGRATE, &as->flags);
+ set_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ break;
+ case MPOL_MF_MMIGRATE_DEFAULT:
+ clear_bit(AS_DO_MMIGRATE, &as->flags);
+ clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ break;
+ default:
+ BUG();
+ }
+ }
+out:
+ up_read(&mm->mmap_sem);
+ return err;
+}
+
/* Change policy for a memory range */
asmlinkage long sys_mbind(unsigned long start, unsigned long len,
unsigned long mode,
@@ -367,7 +416,7 @@ asmlinkage long sys_mbind(unsigned long
DECLARE_BITMAP(nodes, MAX_NUMNODES);
int err;
- if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
+ if ((flags & ~(unsigned long)(MPOL_MF_MASK)) || mode > MPOL_MAX)
return -EINVAL;
if (start & ~PAGE_MASK)
return -EINVAL;
@@ -380,6 +429,12 @@ asmlinkage long sys_mbind(unsigned long
if (end == start)
return 0;
+ if (flags & MPOL_MF_MMIGRATE_MASK)
+ return mbind_migration_policy(mm, start, end, flags);
+
+ if (mode == MPOL_DEFAULT)
+ flags &= ~MPOL_MF_STRICT;
+
err = get_nodes(nodes, nmask, maxnode, mode);
if (err)
return err;
@@ -492,17 +547,40 @@ asmlinkage long sys_get_mempolicy(int __
struct vm_area_struct *vma = NULL;
struct mempolicy *pol = current->mempolicy;
- if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
+ if (flags & ~(unsigned long)(MPOL_F_MASK))
return -EINVAL;
+ if ((flags & (MPOL_F_NODE | MPOL_F_ADDR)) &&
+ (flags & MPOL_F_MMIGRATE))
+ return -EINVAL;
if (nmask != NULL && maxnode < MAX_NUMNODES)
return -EINVAL;
- if (flags & MPOL_F_ADDR) {
+ if ((flags & MPOL_F_ADDR) || (flags & MPOL_F_MMIGRATE)) {
down_read(&mm->mmap_sem);
vma = find_vma_intersection(mm, addr, addr+1);
if (!vma) {
up_read(&mm->mmap_sem);
return -EFAULT;
}
+ if (flags & MPOL_F_MMIGRATE) {
+ struct address_space *as;
+ err = 0;
+ if (!vma->vm_file) {
+ err = -EINVAL;
+ goto out;
+ }
+ as = vma->vm_file->f_mapping;
+ BUG_ON(!as);
+ pval = 0;
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ pval |= MPOL_MF_DO_MMIGRATE;
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ pval |= MPOL_MF_DO_NOT_MMIGRATE;
+ if (policy && put_user(pval, policy)) {
+ err = -EFAULT;
+ goto out;
+ }
+ goto out;
+ }
if (vma->vm_ops && vma->vm_ops->get_policy)
pol = vma->vm_ops->get_policy(vma, addr);
else
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h 2005-06-13 11:48:53.000000000 -0700
@@ -19,9 +19,27 @@
/* Flags for get_mem_policy */
#define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
#define MPOL_F_ADDR (1<<1) /* look up vma using address */
+#define MPOL_F_MMIGRATE (1<<2) /* return migration policy flags */
+
+#define MPOL_F_MASK (MPOL_F_NODE | MPOL_F_ADDR | MPOL_F_MMIGRATE)
/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
+/* FUTURE USE (1<<1) RESERVE for MPOL_MF_MOVE */
+/* Flags to set the migration policy for a memory range
+ * By default the kernel will memory migrate all writable VMAs
+ * (this includes anonymous memory) and the program executable.
+ * For non-anonymous memory, the user can change the default
+ * actions using the following flags to mbind:
+ */
+#define MPOL_MF_DO_MMIGRATE (1<<2) /* migrate pages of this mem object */
+#define MPOL_MF_DO_NOT_MMIGRATE (1<<3) /* don't migrate any of these pages */
+#define MPOL_MF_MMIGRATE_DEFAULT (1<<4) /* reset back to kernel default */
+
+#define MPOL_MF_MASK (MPOL_MF_STRICT | MPOL_MF_DO_MMIGRATE | \
+ MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
+#define MPOL_MF_MMIGRATE_MASK (MPOL_MF_DO_MMIGRATE | \
+ MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
#ifdef __KERNEL__
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/pagemap.h 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h 2005-06-13 11:48:53.000000000 -0700
@@ -19,6 +19,10 @@
#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */
#define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */
+/* (manual) memory migration control flags. set via mbind() in mempolicy.c */
+#define AS_DO_MMIGRATE (__GFP_BITS_SHIFT + 2) /* migrate pages */
+#define AS_DO_NOT_MMIGRATE (__GFP_BITS_SHIFT + 3) /* don't migrate any pages */
+
static inline unsigned int __nocast mapping_gfp_mask(struct address_space * mapping)
{
return mapping->flags & __GFP_BITS_MASK;
--
* [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (6 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch Ray Bryant
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch allows a process to override the default kernel memory
migration policy (invoked via migrate_pages()) on a per-mapped-file
basis.
The default policy is to migrate all anonymous VMAs and all other
VMAs that have the VM_WRITE bit set. (See the patch:
sys_migrate_pages-migration-selection-rc4.patch
for details on how the default policy is implemented.)
This policy does not cause the program executable or any mapped
user data files that are mapped R/O to be migrated. These problems
can be detected and fixed in the user-level migration application,
but that user code needs an interface to do the "fix". This patch
supplies that interface via an extension to the mbind() system call.
The interface is as follows:
mbind(start, length, 0, 0, 0, MPOL_MF_DO_MMIGRATE)
mbind(start, length, 0, 0, 0, MPOL_MF_DO_NOT_MMIGRATE)
These calls override the default kernel policy in
favor of the policy specified. These calls cause the bit
AS_DO_MMIGRATE (or AS_DO_NOT_MMIGRATE) to be set in the
memory object pointed to by the VMA at the specified addresses
in the current process's address space. Setting such a "deep"
attribute is required so that the modification can be seen by
all address spaces that map the object.
The bits set by the above call are "sticky" in the sense that
they will remain set so long as the memory object exists. The
migration policy for that memory object is returned to its
default setting by the following system call:
mbind(start, length, 0, 0, 0, MPOL_MF_MMIGRATE_DEFAULT)
The system call:
get_mempolicy(&policy, NULL, 0, (int *)start, (long) MPOL_F_MMIGRATE)
returns the policy migration bits from the memory object in the bottom
two bits of "policy".
Typical use by the user-level manual page migration code would
be to:
(1) Identify the file name whose migration policy needs to be modified.
(2) Open and mmap() the file into the current address space.
(3) Issue the appropriate mbind() call from the above list.
(4) (Assuming a successful return), munmap() and close the file.
Signed-off-by: Ray Bryant <raybry@sgi.com>
mmigrate.c | 31 ++++++++++++++++++++++---------
1 files changed, 22 insertions(+), 9 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 07:40:32.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 07:44:12.000000000 -0700
@@ -601,25 +601,38 @@ migrate_vma(struct task_struct *task, st
unsigned long vaddr;
int rc, count = 0, nr_busy;
LIST_HEAD(pglist);
+ struct address_space *as = NULL;
- /* can't migrate mlock()'d pages */
- if (vma->vm_flags & VM_LOCKED)
+ /* can't migrate these kinds of VMAs */
+ if ((vma->vm_flags & VM_LOCKED) || (vma->vm_flags & VM_IO))
return 0;
+ /* we always migrate anonymous pages */
+ if (!vma->vm_file)
+ goto do_migrate;
+ as = vma->vm_file->f_mapping;
+ /* we have to have both AS_DO_MMIGRATE and AS_DO_NOT_MMIGRATE to
+ * give user space full ability to override the kernel's default
+ * migration decisions */
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ goto do_migrate;
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ return 0;
+ if (!(vma->vm_flags & VM_WRITE))
+ return 0;
+
+do_migrate:
/* update the vma mempolicy, if needed */
rc = migrate_vma_policy(vma, node_map);
if (rc < 0)
return rc;
- /*
- * gather all of the pages to be migrated from this vma into pglist
- */
+
+ /* gather all of the pages to be migrated from this vma into pglist */
spin_lock(&mm->page_table_lock);
for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
page = follow_page(mm, vaddr, 0);
- /*
- * follow_page has been known to return pages with zero mapcount
- * and NULL mapping. Skip those pages as well
- */
+ /* follow_page has been known to return pages with zero mapcount
+ * and NULL mapping. Skip those pages as well */
if (!page || !page_mapcount(page))
continue;
--
* [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (7 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch adds cpuset support to the migrate_pages() system call.
The idea of this patch is that in order to do a migration:
(1) The target task needs to be able to allocate pages on the
nodes that are being migrated to.
(2) However, the actual allocation of pages is not done by
the target task. Allocation is done by the task that is
running the migrate_pages() system call. Since it is
expected that the migration will be done by a batch manager
of some kind that is authorized to control the jobs running
in an enclosing cpuset, we make the requirement that the
current task ALSO must be able to allocate pages on the
nodes that are being migrated to.
Note well that if cpusets are not configured, the call to
cpuset_migration_allowed() gets optimized away.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/cpuset.h | 8 +++++++-
kernel/cpuset.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
mm/mmigrate.c | 15 +++++++++++----
3 files changed, 65 insertions(+), 6 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/cpuset.h 2005-06-24 10:56:43.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h 2005-06-24 11:01:59.000000000 -0700
@@ -4,7 +4,7 @@
* cpuset interface
*
* Copyright (C) 2003 BULL SA
- * Copyright (C) 2004 Silicon Graphics, Inc.
+ * Copyright (C) 2004-2005 Silicon Graphics, Inc.
*
*/
@@ -26,6 +26,7 @@ int cpuset_zonelist_valid_mems_allowed(s
int cpuset_zone_allowed(struct zone *z);
extern struct file_operations proc_cpuset_operations;
extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
+extern int cpuset_migration_allowed(nodemask_t, struct task_struct *);
#else /* !CONFIG_CPUSETS */
@@ -59,6 +60,11 @@ static inline char *cpuset_task_status_a
return buffer;
}
+static inline int cpuset_migration_allowed(nodemask_t mask, struct task_struct *task)
+{
+ return 1;
+}
+
#endif /* !CONFIG_CPUSETS */
#endif /* _LINUX_CPUSET_H */
Index: linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/kernel/cpuset.c 2005-06-24 10:56:43.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c 2005-06-24 11:01:59.000000000 -0700
@@ -4,7 +4,7 @@
* Processor and Memory placement constraints for sets of tasks.
*
* Copyright (C) 2003 BULL SA.
- * Copyright (C) 2004 Silicon Graphics, Inc.
+ * Copyright (C) 2004-2005 Silicon Graphics, Inc.
*
* Portions derived from Patrick Mochel's sysfs code.
* sysfs is Copyright (c) 2001-3 Patrick Mochel
@@ -1500,6 +1500,52 @@ int cpuset_zone_allowed(struct zone *z)
node_isset(z->zone_pgdat->node_id, current->mems_allowed);
}
+/**
+ * cpuset_mems_allowed - return mems_allowed mask from a task's cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->mems_allowed.
+ *
+ * Description: Returns the nodemask_t mems_allowed of the cpuset
+ * attached to the specified @tsk.
+ *
+ **/
+
+static const nodemask_t cpuset_mems_allowed(const struct task_struct *tsk)
+{
+ nodemask_t mask;
+
+ down(&cpuset_sem);
+ task_lock((struct task_struct *)tsk);
+ guarantee_online_mems(tsk->cpuset, &mask);
+ task_unlock((struct task_struct *)tsk);
+ up(&cpuset_sem);
+
+ return mask;
+}
+
+/**
+ * cpuset_migration_allowed(nodemask_t mask, struct task_struct *tsk)
+ * @mask: nodemask of nodes to be migrated to
+ * @tsk: pointer to task struct of task being migrated
+ *
+ * Description: Returns true if the migration should be allowed.
+ *
+ */
+int cpuset_migration_allowed(nodemask_t mask, struct task_struct *tsk)
+{
+ nodemask_t current_nodes_allowed, target_nodes_allowed;
+ current_nodes_allowed = cpuset_mems_allowed(current);
+
+ /* Obviously, the target task needs to be able to allocate on
+ * the new set of nodes. However, the migrated pages will
+ * actually be allocated by the current task, so the current
+ * task has to be able to allocate on those nodes as well */
+ target_nodes_allowed = cpuset_mems_allowed(tsk);
+ if (!nodes_subset(mask, current_nodes_allowed) ||
+ !nodes_subset(mask, target_nodes_allowed))
+ return 0;
+ return 1;
+}
+
/*
* proc_cpuset_show()
* - Print tasks cpuset path into seq_file.
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:01:59.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 11:02:20.000000000 -0700
@@ -26,6 +26,7 @@
#include <linux/delay.h>
#include <linux/nodemask.h>
#include <linux/mempolicy.h>
+#include <linux/cpuset.h>
#include <asm/bitops.h>
/*
@@ -690,7 +691,7 @@ sys_migrate_pages(pid_t pid, __u32 count
int *tmp_old_nodes = NULL;
int *tmp_new_nodes = NULL;
int *node_map = NULL;
- struct task_struct *task;
+ struct task_struct *task = NULL;
struct mm_struct *mm = NULL;
size_t size = count * sizeof(tmp_old_nodes[0]);
struct vm_area_struct *vma;
@@ -734,8 +735,10 @@ sys_migrate_pages(pid_t pid, __u32 count
if (task) {
task_lock(task);
mm = task->mm;
- if (mm)
+ if (mm) {
atomic_inc(&mm->mm_users);
+ get_task_struct(task);
+ }
task_unlock(task);
} else {
ret = -ESRCH;
@@ -746,7 +749,9 @@ sys_migrate_pages(pid_t pid, __u32 count
if (!mm)
goto out_einval;
- /* set up the node_map array */
+ if (!cpuset_migration_allowed(new_node_mask, task))
+ goto out_einval;
+
for (i = 0; i < MAX_NUMNODES; i++)
node_map[i] = -1;
for (i = 0; i < count; i++)
@@ -773,8 +778,10 @@ sys_migrate_pages(pid_t pid, __u32 count
ret = migrated;
out:
- if (mm)
+ if (mm) {
mmput(mm);
+ put_task_struct(task);
+ }
kfree(tmp_old_nodes);
kfree(tmp_new_nodes);
--
* [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (8 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Add permissions checking to migrate_pages() system call.
The basic idea is that you are allowed to migrate a process if
you could send that process an arbitrary signal, or if the
calling process has the CAP_SYS_ADMIN capability. The
permissions check is based
on that in check_kill_permission() in kernel/signal.c.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/capability.h | 2 ++
mm/mmigrate.c | 12 ++++++++++++
2 files changed, 14 insertions(+)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/capability.h 2005-06-24 11:02:20.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h 2005-06-24 11:02:30.000000000 -0700
@@ -233,6 +233,8 @@ typedef __u32 kernel_cap_t;
/* Allow enabling/disabling tagged queuing on SCSI controllers and sending
arbitrary SCSI commands */
/* Allow setting encryption key on loopback filesystem */
+/* Allow using the migrate_pages() system call to migrate a process's pages
+ from one set of NUMA nodes to another */
#define CAP_SYS_ADMIN 21
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:02:20.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 11:02:30.000000000 -0700
@@ -15,6 +15,8 @@
#include <linux/module.h>
#include <linux/swap.h>
#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/capability.h>
#include <linux/init.h>
#include <linux/highmem.h>
#include <linux/writeback.h>
@@ -734,6 +736,16 @@ sys_migrate_pages(pid_t pid, __u32 count
task = find_task_by_pid(pid);
if (task) {
task_lock(task);
+ /* does this task have permission to migrate that task?
+ * (ala check_kill_permission() ) */
+ if ((current->euid ^ task->suid) && (current->euid ^ task->uid)
+ && (current->uid ^ task->suid) && (current->uid ^ task->uid)
+ && !capable(CAP_SYS_ADMIN)) {
+ ret = -EPERM;
+ task_unlock(task);
+ read_unlock(&tasklist_lock);
+ goto out;
+ }
mm = task->mm;
if (mm) {
atomic_inc(&mm->mm_users);
--
* [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (9 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Manual page migration adds a nodemap arg to try_to_migrate_pages().
The nodemap specifies where pages found on a particular node are to
be migrated. If all you want to do is migrate the page off of
its current node, specify the nodemap argument as NULL.
Add the NULL to the try_to_migrate_pages() invocation.
This patch should be added to the Memory Hotplug series after patch
N1.1-pass-page_list-to-steal_page.patch (for 2.6.12-rc5-mhp1).
Signed-off-by: Ray Bryant <raybry@sgi.com>
--
page_alloc.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mhp1-memory-hotplug.orig/mm/page_alloc.c 2005-06-21 10:43:14.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c 2005-06-21 10:43:14.000000000 -0700
@@ -823,7 +823,7 @@ retry:
on_each_cpu(lru_drain_schedule, NULL, 1, 1);
rest = grab_capturing_pages(&page_list, start_pfn, nr_pages);
- remains = try_to_migrate_pages(&page_list);
+ remains = try_to_migrate_pages(&page_list, NULL);
if (rest || !list_empty(&page_list)) {
if (remains == -ENOSPC) {
/* A swap device should be added. */
--