* [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview
@ 2005-07-01 22:40 Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Summary
-------
This is the -rc4 version of the manual page migration facility
that I proposed in February and that was discussed on the
linux-mm mailing list. This overview is relatively short since
the overview is effectively unchanged from what I submitted on
April 6, 2005. For details, see the overview I sent out then at:
http://marc.theaimsgroup.com/?l=linux-mm&m=111276123522952&w=2
For details of the -rc2 version of this patchset, see:
http://marc.theaimsgroup.com/?l=linux-mm&m=111578651020174&w=2
And the -rc3 version is at:
http://marc.theaimsgroup.com/?l=linux-mm&m=111945947315561&w=2
This patch set differs from the previous patchset in the following:
(1) The previous patchset was based on 2.6.12-rc5-mhp1; this patchset
is based on patch-2.6.13-rc1-mhp1-pm.gz from www.sr71.net/patches/2.6.12
(part of the Memory Hotplug project patchset maintained by
Dave Hansen).
(2) Changes suggested by Dave Hansen, Hirokazu Takahashi, and
Andi Kleen have been incorporated into this patchset.
If this patch is acceptable to the Memory Hotplug Team, I'd like
to see it added to the page migration sequence of patches in
the memory hotplug patch.
This patch adds a parameter to try_to_migrate_pages().
The last patch of this series,
N1.2-add-nodemap-to-try_to_migrate_pages-call.patch,
should be inserted in the memory hotplug patchset after the
patch N1.1-pass-page_list-to-steal_page.patch to fix up
the call to try_to_migrate_pages() from capture_page_range()
in mm/page_alloc.c.
This is the last version of the manual page migration patch
that I will be providing. As of today, Christoph Lameter
(clameter@sgi.com), will be responsible for handling this
patchset from an SGI perspective.
Suggestions, flames, etc should be directed to Christoph
at clameter@sgi.com. I can also continue to be reached
at raybry@austin.rr.com for discussion of this patchset.
Description of the patches in this patchset
-------------------------------------------
Recall that all of these patches apply to 2.6.13-rc1 with the
page-migration patches applied first. The simplest way to do
this is to obtain the Memory Hotplug broken out patches from
http://sr71.net/patches/2.6.12/2.6.12-rc5-mhp1/broken-out-2.6.12-rc5-mhp1.tar.gz
and then add patches 1-10 of this patchset to the series file
after the patch "AA-PM-99-x86_64-IMMOVABLE.patch". (Patch 11
goes after N1.1-pass-page_list-to-steal_page.patch.) Then apply all
patches up through the 10th patch of this set and turn on the
CONFIG_MEMORY_MIGRATE option. This works on Altix, at least;
that is the only NUMA machine I have access to at the moment.
The 11th patch is only needed if you want to try to build the
entire mhp1 patchset after applying the manual page migration
patches.
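The series-file edit described above can be scripted. The sketch below uses a
toy series file and inserts one representative patch at each insertion point
named in the text (the remaining patch names from this set would be inserted
the same way):

```shell
# Toy series file standing in for the broken-out mhp1 series.
cat > series <<'EOF'
AA-PM-98-example.patch
AA-PM-99-x86_64-IMMOVABLE.patch
N1.1-pass-page_list-to-steal_page.patch
EOF

# Patches 1-10 go after AA-PM-99-x86_64-IMMOVABLE.patch (one shown here);
# patch 11 goes after N1.1-pass-page_list-to-steal_page.patch.
sed -i '/^AA-PM-99-x86_64-IMMOVABLE.patch$/a hirokazu-steal_page_from_lru.patch' series
sed -i '/^N1.1-pass-page_list-to-steal_page.patch$/a N1.2-add-nodemap-to-try_to_migrate_pages-call.patch' series

cat series
```

Then `quilt push` (or the mhp1 apply script) through the 10th patch and enable
CONFIG_MEMORY_MIGRATE as described above.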
Patch 1: hirokazu-steal_page_from_lru.patch
This patch (due to Hirokazu Takahashi) simplifies the interface
to steal_page_from_lru() and is not yet present in the 2.6.12-rc5-mhp1
patchset. Unchanged since -rc3.
Patch 2: xfs-migrate-page-rc4.patch
This patch, due to Nathan Scott at SGI, provides a migrate_page
method for XFS. EXT2 and EXT3 already have such methods.
Unchanged from -rc3.
Patch 3: add-node_map-arg-to-try_to_migrate_pages-rc4.patch
This patch adds an additional argument to try_to_migrate_pages().
The additional argument controls where pages found on specific
nodes in the page_list passed into try_to_migrate_pages() are
migrated to. Unchanged from -rc3.
Patch 4: add-sys_migrate_pages-rc4.patch
This is the patch that adds the migrate_pages() system call.
This patch provides a simple version of the system call that
migrates all pages associated with a particular process, so
it is really only useful for programs that are statically linked
(i.e., that do not map in any shared libraries).
The following changes have been made since -rc3:
Suggestions from Dave Hansen and Hirokazu Takahashi
have been incorporated.
Patch 5: sys_migrate_pages-mempolicy-migration-rc4.patch
This patch updates the memory policy data structures
as they are encountered in accordance with the migration
request. Unchanged from -rc3.
Patch 6: sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch
This patch is new in -rc4; it fixes a problem with the
mempolicy migration code for shared objects that
Andi Kleen pointed out in -rc3.
Patch 7: add-mempolicy-control-rc4.patch
This patch extends the mbind() and get_mempolicy() system
calls to support the interface to override the default
kernel policy. Unchanged from -rc3.
Patch 8: sys_migrate_pages-migration-selection-rc4.patch
This patch uses the migration policy bits set by the code
from the last patch to control which mapped files are
migrated (or not). Unchanged from -rc3.
Patch 9: sys_migrate_pages-cpuset-support-rc4.patch
This patch makes migrate_pages() cooperate better with
cpusets. The following change has been made since -rc3:
The cpuset support has been split out entirely
to kernel/cpuset.c with only a single callout
from sys_migrate_pages(). This makes the
sys_migrate_pages() code cleaner. This change
was inspired by the changes Dave Hansen
suggested.
Patch 10: sys_migrate_pages-permissions-check.patch
This patch adds a permission check to make sure the
invoking process has the necessary permissions to migrate
the target task. Unchanged from -rc3.
Patch 11: N1.2-add-nodemap-to-try_to_migrate_pages-call.patch
This patch fixes the call to try_to_migrate_pages()
from capture_page_range() in mm/page_alloc.c that
is introduced in the N1.0-memsection_migrate.patch
of the memory hotplug series. Unchanged from -rc3.
Unresolved issues
-----------------
(1) This version of migrate_pages() works reliably only when the
process to be migrated has been stopped (e.g., using SIGSTOP)
before the migrate_pages() system call is executed.
(The system doesn't crash or oops, but sometimes the process
being migrated will be "Killed by VM" when it starts up again.
There may be a few messages put into the log as well at that time.)
At the moment, I am proposing that processes need to be
suspended before being migrated. This really should not
be a performance concern, since the delay imposed by page
migration far exceeds any delay imposed by SIGSTOPing the
processes before migration and SIGCONTinuing them afterward.
The problem with this approach is that there is no good way
to enforce it: even if the process is stopped at the start of
execution of the migrate_pages() system call, there is no way
to ensure it does not get restarted during execution of the
call.
Christoph Lameter has some ideas about using PF_FREEZE to handle
this problem. I will leave further work in this area to
Christoph. I have provided some sample code to Christoph
to indicate one way to integrate his approach into the existing
page migration code.
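The suspend/migrate/resume sequence can be illustrated from a shell; the
migrate_pages() step itself appears only as a comment, since the tool that
would invoke it is not part of this patchset:

```shell
# Start a stand-in for a process belonging to the job to be migrated.
sleep 30 &
pid=$!

# Suspend it before migration, as recommended above.
kill -STOP "$pid"
sleep 0.2
state=$(awk '{print $3}' "/proc/$pid/stat")
echo "after SIGSTOP: $state"      # 'T' = stopped

# ... here the batch scheduler would call migrate_pages() on $pid ...

# Resume the job once migration is complete.
kill -CONT "$pid"
sleep 0.2
state=$(awk '{print $3}' "/proc/$pid/stat")
echo "after SIGCONT: $state"      # 'S' = sleeping again

kill "$pid" 2>/dev/null
```

A real scheduler would do this for every pid in the job, which is exactly why
nothing stops some other task from SIGCONTing a pid mid-call.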
(2) I'm still using system call #1279. On ia64 this is the
last slot in the system call table. A system call number
needs to be assigned to migrate_pages().
(3) As part of the discussion with Andi Kleen, we agreed to
provide some memory migration support under MPOL_MF_STRICT.
Currently, if one calls mbind() with the flag MPOL_MF_STRICT
set, and pages are found that don't follow the memory policy,
then the mbind() will return -EIO. Andi would like to be
able to cause those pages to be migrated to the correct nodes.
This feature is not yet part of this patchset and will
be added as a distinct set of patches. I'm planning on
providing some sample code to Christoph indicating how
this might be done in the future.
Background
----------
The purpose of this set of patches is to introduce the necessary kernel
infrastructure to support "manual page migration". That phrase is
intended to describe a facility whereby some user program (most likely
a batch scheduler) is given the responsibility of managing where jobs
run on a large NUMA system. If it turns out that a job needs to be
run on a different set of nodes from where it is running now, then that
user program would invoke this facility to move the job to the new set
of nodes.
We use the word "manual" here to indicate that the facility is invoked
in a way that the kernel is told where to move things; we distinguish
this approach from "automatic page migration" facilities which have been
proposed in the past. To us, "automatic page migration" implies using
hardware counters to determine where pages should reside and having the
O/S automatically move misplaced pages. The utility of such facilities
on IRIX, for example, has been mixed, and we are not currently proposing
such a facility for Linux.
The normal sequence of events would be as follows: A job is running
on, say, nodes 5-8, and a higher priority job arrives whose only
possible placement, for whatever reason, is nodes 5-8. The scheduler
would then suspend the processes of the existing job (by, for example,
sending them a SIGSTOP) and start the new job on those nodes. At some point in
the future, other nodes become available for use, and at this point the
batch scheduler would invoke the manual page migration facility to move
the processes of the suspended job from nodes 5-8 to the new set of nodes.
Note that not all of the pages of all of the processes will need to (or
should) be moved. For example, pages of shared libraries are likely to be
shared by many processes in the system; these pages should not be moved
merely because a few processes using these libraries have been migrated.
As discussed above, the kernel code handles this by migrating all
anonymous VMAs and all VMAs with the VM_WRITE bit set. VMAs that map
the code segments of a program don't have VM_WRITE set, so shared
library code segments will not be migrated (by default). Read-only mapped
files (e.g., files in /usr/lib for National Language Support) are also
not migrated by default.
The default migration decisions of the kernel migration code can be
overridden for mmap()'d files using the extensions provided for the
mbind() system call, as described in patches 7 and 8 above. This call
can be used, for example, to cause the program executable to be migrated.
Similarly, if the user has a (non-system) data file mapped R/O, the
mbind() system call can be used to override the kernel default and cause
the mapped file to be migrated as well.
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant raybry@sgi.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
* [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch Ray Bryant
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Hi Dave,
Would you apply the following patch right after
AA-PM-01-steal_page_from_lru.patch?
This patch makes steal_page_from_lru() and putback_page_to_lru()
check PageLRU() with zone->lru_lock held. Currently only the process
migration code, which Ray is working on, uses this code.
Thanks,
Hirokazu Takahashi.
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
---
linux-2.6.12-rc3-taka/include/linux/mm_inline.h | 8 +++++---
1 files changed, 5 insertions, 3 deletions
diff -puN include/linux/mm_inline.h~taka-steal_page_from_lru-FIX include/linux/mm_inline.h
--- linux-2.6.12-rc3/include/linux/mm_inline.h~taka-steal_page_from_lru-FIX Mon May 23 02:26:57 2005
+++ linux-2.6.12-rc3-taka/include/linux/mm_inline.h Mon May 23 02:26:57 2005
@@ -80,9 +80,10 @@ static inline int
steal_page_from_lru(struct zone *zone, struct page *page,
struct list_head *dst)
{
- int ret;
+ int ret = 0;
spin_lock_irq(&zone->lru_lock);
- ret = __steal_page_from_lru(zone, page, dst);
+ if (PageLRU(page))
+ ret = __steal_page_from_lru(zone, page, dst);
spin_unlock_irq(&zone->lru_lock);
return ret;
}
@@ -102,7 +103,8 @@ static inline void
putback_page_to_lru(struct zone *zone, struct page *page)
{
spin_lock_irq(&zone->lru_lock);
- __putback_page_to_lru(zone, page);
+ if (!PageLRU(page))
+ __putback_page_to_lru(zone, page);
spin_unlock_irq(&zone->lru_lock);
}
_
_______________________________________________
Lhms-devel mailing list
Lhms-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lhms-devel
* [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch Ray Bryant
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Nathan Scott of SGI provided this patch for XFS that supports
the migrate_page method in the address_space operations vector.
It is basically the same as what is in ext2_migrate_page().
However, the routine xfs_skip_migrate_page() is added to
disallow migration of XFS metadata.
Signed-off-by: Ray Bryant <raybry@sgi.com>
xfs_aops.c | 10 ++++++++++
xfs_buf.c | 7 +++++++
2 files changed, 17 insertions(+)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_aops.c 2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_aops.c 2005-06-13 11:12:42.000000000 -0700
@@ -54,6 +54,7 @@
#include "xfs_iomap.h"
#include <linux/mpage.h>
#include <linux/writeback.h>
+#include <linux/mmigrate.h>
STATIC void xfs_count_page_state(struct page *, int *, int *, int *);
STATIC void xfs_convert_page(struct inode *, struct page *, xfs_iomap_t *,
@@ -1273,6 +1274,14 @@ linvfs_prepare_write(
return block_prepare_write(page, from, to, linvfs_get_block);
}
+STATIC int
+linvfs_migrate_page(
+ struct page *from,
+ struct page *to)
+{
+ return generic_migrate_page(from, to, migrate_page_buffer);
+}
+
struct address_space_operations linvfs_aops = {
.readpage = linvfs_readpage,
.readpages = linvfs_readpages,
@@ -1283,4 +1292,5 @@ struct address_space_operations linvfs_a
.commit_write = generic_commit_write,
.bmap = linvfs_bmap,
.direct_IO = linvfs_direct_IO,
+ .migrate_page = linvfs_migrate_page,
};
Index: linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/fs/xfs/linux-2.6/xfs_buf.c 2005-06-13 11:12:36.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/fs/xfs/linux-2.6/xfs_buf.c 2005-06-13 11:12:42.000000000 -0700
@@ -1626,6 +1626,12 @@ xfs_setsize_buftarg(
}
STATIC int
+xfs_skip_migrate_page(struct page *from, struct page *to)
+{
+ return -EBUSY;
+}
+
+STATIC int
xfs_mapping_buftarg(
xfs_buftarg_t *btp,
struct block_device *bdev)
@@ -1635,6 +1641,7 @@ xfs_mapping_buftarg(
struct address_space *mapping;
static struct address_space_operations mapping_aops = {
.sync_page = block_sync_page,
+ .migrate_page = xfs_skip_migrate_page,
};
inode = new_inode(bdev->bd_inode->i_sb);
* [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch Ray Bryant
@ 2005-07-01 22:40 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch Ray Bryant
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:40 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch changes the interface to try_to_migrate_pages() so that one
can specify the nodes where the pages are to be migrated to. This is
done by adding a "node_map" argument to try_to_migrate_pages();
node_map is of type "int *".
If this argument is NULL, then try_to_migrate_pages() behaves exactly
as before and this is the interface the rest of the memory hotplug
patch should use. (Note: This patchset does not include the changes
for the rest of the memory hotplug patch that will be necessary to use
this new interface [if it is accepted]. Those changes will be provided
as a distinct patch.)
If the argument is non-NULL, node_map points at an array of ints
of size MAX_NUMNODES. node_map[N] is either the id of an online node
or -1. If node_map[N] >= 0, then pages in the page list passed
to try_to_migrate_pages() that are found on node N are migrated to
node node_map[N]. If node_map[N] == -1, then pages found on node N
are left where they are.
This change depends on previous changes to migrate_onepage()
that support migrating a page to a specified node. These changes
are already part of the memory migration sub-patch of the memory
hotplug patch.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/mmigrate.h | 11 ++++++++++-
mm/mmigrate.c | 10 ++++++----
2 files changed, 16 insertions(+), 5 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mmigrate.h 2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mmigrate.h 2005-06-13 10:22:22.000000000 -0700
@@ -16,7 +16,16 @@ extern int migrate_page_buffer(struct pa
extern int page_migratable(struct page *, struct page *, int,
struct list_head *);
extern struct page * migrate_onepage(struct page *, int nodeid);
-extern int try_to_migrate_pages(struct list_head *);
+extern int try_to_migrate_pages(struct list_head *, int *);
+
+static inline struct page *node_migrate_onepage(struct page *page, int *node_map)
+{
+ if (node_map)
+ return migrate_onepage(page, node_map[page_to_nid(page)]);
+ else
+ return migrate_onepage(page, MIGRATE_NODE_ANY);
+
+}
#else
static inline int generic_migrate_page(struct page *page, struct page *newpage,
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-10 14:47:25.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-13 10:22:02.000000000 -0700
@@ -501,9 +501,11 @@ out_unlock:
/*
* This is the main entry point to migrate pages in a specific region.
* If a page is inactive, the page may be just released instead of
- * migration.
+ * migration. node_map is supplied in those cases (on NUMA systems)
+ * where the caller wishes to specify to which nodes the pages are
+ * migrated. If node_map is null, the target node is MIGRATE_NODE_ANY.
*/
-int try_to_migrate_pages(struct list_head *page_list)
+int try_to_migrate_pages(struct list_head *page_list, int *node_map)
{
struct page *page, *page2, *newpage;
LIST_HEAD(pass1_list);
@@ -541,7 +543,7 @@ int try_to_migrate_pages(struct list_hea
list_for_each_entry_safe(page, page2, &pass1_list, lru) {
list_del(&page->lru);
if (PageLocked(page) || PageWriteback(page) ||
- IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+ IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
if (page_count(page) == 1) {
/* the page is already unused */
putback_page_to_lru(page_zone(page), page);
@@ -559,7 +561,7 @@ int try_to_migrate_pages(struct list_hea
*/
list_for_each_entry_safe(page, page2, &pass2_list, lru) {
list_del(&page->lru);
- if (IS_ERR(newpage = migrate_onepage(page, MIGRATE_NODE_ANY))) {
+ if (IS_ERR(newpage = node_migrate_onepage(page, node_map))) {
if (page_count(page) == 1) {
/* the page is already unused */
putback_page_to_lru(page_zone(page), page);
* [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (2 preceding siblings ...)
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch Ray Bryant
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This is the main patch that creates the migrate_pages() system
call. Note that in this case, the system call number was more
or less arbitrarily assigned as 1279. This number needs to be
allocated.
This patch sits on top of the page migration patches from
the Memory Hotplug project. This particular patchset is built
on top of:
http://www.sr71.net/patches/2.6.12/2.6.13-rc1-mhp1/page_migration/patch-2.6.13-rc1-mhp1-pm.gz
but it may apply on subsequent page migration patches as well.
This patch migrates all pages in the specified process (including
shared libraries).
See the patches:
sys_migrate_pages-migration-selection-rc4.patch
add-mempolicy-control-rc4.patch
for details on the default kernel migration policy (this determines
which VMAs are actually migrated) and how this policy can be overridden
using the mbind() system call.
Updates since last release of this patchset:
Suggestions from Dave Hansen and Hirokazu Takahashi
have been incorporated.
Signed-off-by: Ray Bryant <raybry@sgi.com>
arch/ia64/kernel/entry.S | 2
kernel/sys_ni.c | 1
mm/mmigrate.c | 184 ++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 185 insertions(+), 2 deletions(-)
Index: linux-2.6.13-rc1-mhp1-page-migration/arch/ia64/kernel/entry.S
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/arch/ia64/kernel/entry.S 2005-06-28 22:57:29.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/arch/ia64/kernel/entry.S 2005-06-30 11:17:05.000000000 -0700
@@ -1582,6 +1582,6 @@ sys_call_table:
data8 sys_set_zone_reclaim
data8 sys_ni_syscall
data8 sys_ni_syscall
- data8 sys_ni_syscall
+ data8 sys_migrate_pages // 1279
.org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls
Index: linux-2.6.13-rc1-mhp1-page-migration/mm/mmigrate.c
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/mm/mmigrate.c 2005-06-30 11:16:37.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/mm/mmigrate.c 2005-06-30 11:17:05.000000000 -0700
@@ -5,6 +5,9 @@
*
* Authors: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
* Hirokazu Takahashi <taka@valinux.co.jp>
+ *
+ * sys_migrate_pages() added by Ray Bryant <raybry@sgi.com>
+ * Copyright (C) 2005, Silicon Graphics, Inc.
*/
#include <linux/config.h>
@@ -21,6 +24,8 @@
#include <linux/rmap.h>
#include <linux/mmigrate.h>
#include <linux/delay.h>
+#include <linux/nodemask.h>
+#include <asm/bitops.h>
/*
* The concept of memory migration is to replace a target page with
@@ -436,7 +441,7 @@ migrate_onepage(struct page *page, int n
if (nodeid == MIGRATE_NODE_ANY)
newpage = page_cache_alloc(mapping);
else
- newpage = alloc_pages_node(nodeid, mapping->flags, 0);
+ newpage = alloc_pages_node(nodeid, (unsigned int)mapping->flags, 0);
if (newpage == NULL) {
unlock_page(page);
return ERR_PTR(-ENOMEM);
@@ -587,6 +592,183 @@ int try_to_migrate_pages(struct list_hea
return nr_busy;
}
+static int
+migrate_vma(struct task_struct *task, struct mm_struct *mm,
+ struct vm_area_struct *vma, int *node_map)
+{
+ struct page *page, *page2;
+ unsigned long vaddr;
+ int count = 0, nr_busy;
+ LIST_HEAD(pglist);
+
+ /* can't migrate mlock()'d pages */
+ if (vma->vm_flags & VM_LOCKED)
+ return 0;
+
+ /*
+ * gather all of the pages to be migrated from this vma into pglist
+ */
+ spin_lock(&mm->page_table_lock);
+ for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
+ page = follow_page(mm, vaddr, 0);
+ /*
+ * follow_page has been known to return pages with zero mapcount
+ * and NULL mapping. Skip those pages as well
+ */
+ if (!page || !page_mapcount(page))
+ continue;
+
+ if (node_map[page_to_nid(page)] >= 0) {
+ if (steal_page_from_lru(page_zone(page), page, &pglist))
+ count++;
+ else
+ BUG();
+ }
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ /* call the page migration code to move the pages */
+ if (!count)
+ return 0;
+
+ nr_busy = try_to_migrate_pages(&pglist, node_map);
+
+ if (nr_busy < 0)
+ return nr_busy;
+
+ if (nr_busy == 0)
+ return count;
+
+ /* return the unmigrated pages to the LRU lists */
+ list_for_each_entry_safe(page, page2, &pglist, lru) {
+ list_del(&page->lru);
+ putback_page_to_lru(page_zone(page), page);
+ }
+ return -EAGAIN;
+
+}
+
+static inline int nodes_invalid(int *nodes, __u32 count)
+{
+ int i;
+ for (i = 0; i < count; i++)
+ if (nodes[i] < 0 ||
nodes[i] >= MAX_NUMNODES ||
+ !node_online(nodes[i]))
+ return 1;
+ return 0;
+}
+
+void lru_add_drain_per_cpu(void *info)
+{
+ lru_add_drain();
+}
+
+asmlinkage long
+sys_migrate_pages(pid_t pid, __u32 count, __u32 __user *old_nodes,
+ __u32 __user *new_nodes)
+{
+ int i, ret = 0, migrated = 0;
+ int *tmp_old_nodes = NULL;
+ int *tmp_new_nodes = NULL;
+ int *node_map = NULL;
+ struct task_struct *task;
+ struct mm_struct *mm = NULL;
+ size_t size = count * sizeof(tmp_old_nodes[0]);
+ struct vm_area_struct *vma;
+ nodemask_t old_node_mask, new_node_mask;
+
+ if ((count < 1) || (count > MAX_NUMNODES))
+ goto out_einval;
+
+ tmp_old_nodes = kmalloc(size, GFP_KERNEL);
+ tmp_new_nodes = kmalloc(size, GFP_KERNEL);
+ node_map = kmalloc(MAX_NUMNODES*sizeof(node_map[0]), GFP_KERNEL);
+
+ if (!tmp_old_nodes || !tmp_new_nodes || !node_map) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (copy_from_user(tmp_old_nodes, (void __user *)old_nodes, size) ||
+ copy_from_user(tmp_new_nodes, (void __user *)new_nodes, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (nodes_invalid(tmp_old_nodes, count) ||
+ nodes_invalid(tmp_new_nodes, count))
+ goto out_einval;
+
+ nodes_clear(old_node_mask);
+ nodes_clear(new_node_mask);
+ for (i = 0; i < count; i++) {
+ node_set(tmp_old_nodes[i], old_node_mask);
+ node_set(tmp_new_nodes[i], new_node_mask);
+
+ }
+
+ if (nodes_intersects(old_node_mask, new_node_mask))
+ goto out_einval;
+
+ read_lock(&tasklist_lock);
+ task = find_task_by_pid(pid);
+ if (task) {
+ task_lock(task);
+ mm = task->mm;
+ if (mm)
+ atomic_inc(&mm->mm_users);
+ task_unlock(task);
+ } else {
+ ret = -ESRCH;
+ read_unlock(&tasklist_lock);
+ goto out;
+ }
+ read_unlock(&tasklist_lock);
+ if (!mm)
+ goto out_einval;
+
+ /* set up the node_map array */
+ for (i = 0; i < MAX_NUMNODES; i++)
+ node_map[i] = -1;
+ for (i = 0; i < count; i++)
+ node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
+
+ /* prepare for lru list manipulation */
+ smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+ lru_add_drain();
+
+ /* actually do the migration */
+ down_read(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ ret = migrate_vma(task, mm, vma, node_map);
+ if (ret < 0)
+ goto out_up_mmap_sem;
+ migrated += ret;
+ }
+ up_read(&mm->mmap_sem);
+ ret = migrated;
+
+out:
+ if (mm)
+ mmput(mm);
+
+ kfree(tmp_old_nodes);
+ kfree(tmp_new_nodes);
+ kfree(node_map);
+
+ return ret;
+
+out_einval:
+ ret = -EINVAL;
+ goto out;
+
+out_up_mmap_sem:
+ up_read(&mm->mmap_sem);
+ goto out;
+
+}
+
EXPORT_SYMBOL(generic_migrate_page);
EXPORT_SYMBOL(migrate_page_common);
EXPORT_SYMBOL(migrate_page_buffer);
Index: linux-2.6.13-rc1-mhp1-page-migration/kernel/sys_ni.c
===================================================================
--- linux-2.6.13-rc1-mhp1-page-migration.orig/kernel/sys_ni.c 2005-06-28 22:57:29.000000000 -0700
+++ linux-2.6.13-rc1-mhp1-page-migration/kernel/sys_ni.c 2005-06-30 11:17:48.000000000 -0700
@@ -40,6 +40,7 @@ cond_syscall(sys_shutdown);
cond_syscall(sys_sendmsg);
cond_syscall(sys_recvmsg);
cond_syscall(sys_socketcall);
+cond_syscall(sys_migrate_pages);
cond_syscall(sys_futex);
cond_syscall(compat_sys_futex);
cond_syscall(sys_epoll_create);
* [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (3 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch Ray Bryant
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch adds code that translates the memory policy structures
as they are encountered so that they continue to represent where
memory should be allocated after the page migration has completed.
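To illustrate the translation, here is a userspace sketch of the remapping
rule applied to an interleave node mask. A single-word bitmap stands in for
the kernel's nodemask, and the names remap_node_mask and MAX_NODES are
illustrative, not taken from the patch:

```c
#include <assert.h>

#define MAX_NODES 64	/* stand-in for MAX_NUMNODES; one bitmap word */

/* Remap a node mask under a migration request: each node i that is
 * being migrated (node_map[i] >= 0) moves to node_map[i]; nodes not
 * named in the request stay where they are.  This mirrors the logic
 * of migrate_node_mask() in this patch. */
static unsigned long remap_node_mask(unsigned long old_mask,
				     const int *node_map)
{
	unsigned long new_mask = 0;
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		if (!(old_mask & (1UL << i)))
			continue;
		if (node_map[i] >= 0)
			new_mask |= 1UL << node_map[i];
		else
			new_mask |= 1UL << i;
	}
	return new_mask;
}
```

So migrating nodes {0,1} to {2,3} turns an interleave mask over {0,1,4}
into one over {2,3,4}: node 4 is untouched because it is not in the request.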
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/mempolicy.h | 2
mm/mempolicy.c | 122 +++++++++++++++++++++++++++++++++++++++++++++-
mm/mmigrate.c | 14 ++++-
3 files changed, 135 insertions(+), 3 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h 2005-06-24 10:57:10.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h 2005-06-27 12:29:06.000000000 -0700
@@ -152,6 +152,8 @@ struct mempolicy *mpol_shared_policy_loo
extern void numa_default_policy(void);
extern void numa_policy_init(void);
+extern int migrate_process_policy(struct task_struct *, int *);
+extern int migrate_vma_policy(struct vm_area_struct *, int *);
#else
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-24 10:57:10.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-27 12:28:33.000000000 -0700
@@ -706,7 +706,6 @@ static unsigned offset_il_node(struct me
c++;
} while (c <= target);
BUG_ON(nid >= MAX_NUMNODES);
- BUG_ON(!test_bit(nid, pol->v.nodes));
return nid;
}
@@ -1136,3 +1135,124 @@ void numa_default_policy(void)
{
sys_set_mempolicy(MPOL_DEFAULT, NULL, 0);
}
+
+/*
+ * update a node mask according to a migration request
+ */
+static void migrate_node_mask(unsigned long *new_node_mask,
+ unsigned long *old_node_mask,
+ int *node_map)
+{
+ int i;
+
+ bitmap_zero(new_node_mask, MAX_NUMNODES);
+
+ i = find_first_bit(old_node_mask, MAX_NUMNODES);
+ while(i < MAX_NUMNODES) {
+ if (node_map[i] >= 0)
+ set_bit(node_map[i], new_node_mask);
+ else
+ set_bit(i, new_node_mask);
+ i = find_next_bit(old_node_mask, MAX_NUMNODES, i+1);
+ }
+}
+
+/*
+ * update a process or vma mempolicy according to a migration request
+ */
+static struct mempolicy *
+migrate_policy(struct mempolicy *old, int *node_map)
+{
+ struct mempolicy *new;
+ DECLARE_BITMAP(old_nodes, MAX_NUMNODES);
+ DECLARE_BITMAP(new_nodes, MAX_NUMNODES);
+ struct zone *z;
+ int i;
+
+ new = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+ if (!new)
+ return ERR_PTR(-ENOMEM);
+ atomic_set(&new->refcnt, 0);
+ switch(old->policy) {
+ case MPOL_DEFAULT:
+ BUG();
+ case MPOL_INTERLEAVE:
+ migrate_node_mask(new->v.nodes, old->v.nodes, node_map);
+ break;
+ case MPOL_PREFERRED:
+ if (old->v.preferred_node>=0 &&
+ (node_map[old->v.preferred_node] >= 0))
+ new->v.preferred_node = node_map[old->v.preferred_node];
+ else
+ new->v.preferred_node = old->v.preferred_node;
+ break;
+ case MPOL_BIND:
+ bitmap_zero(old_nodes, MAX_NUMNODES);
+ for (i = 0; (z = old->v.zonelist->zones[i]) != NULL; i++)
+ set_bit(z->zone_pgdat->node_id, old_nodes);
+ migrate_node_mask(new_nodes, old_nodes, node_map);
+ new->v.zonelist = bind_zonelist(new_nodes);
+ if (!new->v.zonelist) {
+ kmem_cache_free(policy_cache, new);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
+ new->policy = old->policy;
+ return new;
+}
+
+/*
+ * update a process mempolicy based on a migration request
+ */
+int migrate_process_policy(struct task_struct *task, int *node_map)
+{
+ struct mempolicy *new, *old = task->mempolicy;
+ int tmp;
+
+ if ((!old) || (old->policy == MPOL_DEFAULT))
+ return 0;
+
+ new = migrate_policy(task->mempolicy, node_map);
+ if (IS_ERR(new))
+ return (PTR_ERR(new));
+
+ mpol_get(new);
+ task->mempolicy = new;
+ mpol_free(old);
+
+ if (task->mempolicy->policy == MPOL_INTERLEAVE) {
+ /*
+ * If the task is still running and allocating storage, this
+ * is racy, but there is not much that can be done about it.
+ * In the worst case, this will allow an allocation of one
+ * page under the original policy (not the "new" one above).
+ * Since we update policies according to the migration,
+ * then migrate pages, that page should still get migrated
+ * correctly.
+ */
+ tmp = task->il_next;
+ if (node_map[tmp] >= 0)
+ task->il_next = node_map[tmp];
+ }
+
+ return 0;
+
+}
+
+/*
+ * update a vma mempolicy based on a migration request
+ */
+int migrate_vma_policy(struct vm_area_struct *vma, int *node_map)
+{
+
+ struct mempolicy *new;
+
+ if (!vma->vm_policy || vma->vm_policy->policy == MPOL_DEFAULT)
+ return 0;
+
+ new = migrate_policy(vma->vm_policy, node_map);
+ if (IS_ERR(new))
+ return (PTR_ERR(new));
+
+ return(policy_vma(vma, new));
+}
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:01:44.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-27 12:26:56.000000000 -0700
@@ -25,6 +25,7 @@
#include <linux/mmigrate.h>
#include <linux/delay.h>
#include <linux/nodemask.h>
+#include <linux/mempolicy.h>
#include <asm/bitops.h>
/*
@@ -598,13 +599,17 @@ migrate_vma(struct task_struct *task, st
{
struct page *page, *page2;
unsigned long vaddr;
- int count = 0, nr_busy;
+ int rc, count = 0, nr_busy;
LIST_HEAD(pglist);
/* can't migrate mlock()'d pages */
if (vma->vm_flags & VM_LOCKED)
return 0;
+ /* update the vma mempolicy, if needed */
+ rc = migrate_vma_policy(vma, node_map);
+ if (rc < 0)
+ return rc;
/*
* gather all of the pages to be migrated from this vma into pglist
*/
@@ -735,9 +740,14 @@ sys_migrate_pages(pid_t pid, __u32 count
node_map[tmp_old_nodes[i]] = tmp_new_nodes[i];
/* prepare for lru list manipulation */
- smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
+ smp_call_function(&lru_add_drain_per_cpu, NULL, 0, 1);
lru_add_drain();
+ /* update the process mempolicy, if needed */
+ ret = migrate_process_policy(task, node_map);
+ if (ret < 0)
+ goto out;
+
/* actually do the migration */
down_read(&mm->mmap_sem);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
--
* [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (4 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch Ray Bryant
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This code fixes a problem with migrating mempolicies for shared
objects (System V shared memory, tmpfs, etc) that Andi Kleen pointed
out in his review of the -rc3 version of the page migration code.
As currently implemented, this only really matters for System V shared
memory, since AFAIK that is the only shared object that has its own
vma->vm_policy->policy code. As support is added for the other cases,
the code below will work with those shared objects as well.
One can argue that since the shared object exists outside of the
application, that one shouldn't migrate it at all. The approach taken
here is that if a shared object is mapped into the address space of a
process that is being migrated, then the mapped pages of the shared object
should be migrated with the process. (Pages in the shared object that are
not mapped will not be migrated. This is not perfect, but so it goes.)
Signed-off-by: Ray Bryant <raybry@sgi.com>
mempolicy.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-27 12:28:33.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-27 12:29:19.000000000 -0700
@@ -1245,12 +1245,13 @@ int migrate_process_policy(struct task_s
int migrate_vma_policy(struct vm_area_struct *vma, int *node_map)
{
+ struct mempolicy *old = get_vma_policy(vma, vma->vm_start);
struct mempolicy *new;
- if (!vma->vm_policy || vma->vm_policy->policy == MPOL_DEFAULT)
+ if (old->policy == MPOL_DEFAULT)
return 0;
- new = migrate_policy(vma->vm_policy, node_map);
+ new = migrate_policy(old, node_map);
if (IS_ERR(new))
return (PTR_ERR(new));
--
* [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (5 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch Ray Bryant
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch allows a process to override the default kernel memory
migration policy (invoked via migrate_pages()) on a
per-mapped-file basis.
The default policy is to migrate all anonymous VMAs and all other
VMAs that have the VM_WRITE bit set. (See the patch:
sys_migrate_pages-migration-selection-rc4.patch
for details on how the default policy is implemented.)
This policy does not cause the program executable or any mapped
user data files that are mapped R/O to be migrated. Such omissions
can be detected and fixed in the user-level migration application,
but that user code needs an interface to apply the fix. This patch
supplies that interface via an extension to the mbind() system call.
The interface is as follows:
mbind(start, length, 0, 0, 0, MPOL_MF_DO_MMIGRATE)
mbind(start, length, 0, 0, 0, MPOL_MF_DO_NOT_MMIGRATE)
These calls override the default kernel policy in
favor of the policy specified. They cause the bit
AS_DO_MMIGRATE (or AS_DO_NOT_MMIGRATE) to be set in the
memory object pointed to by the VMA at the specified addresses
in the current process's address space. Setting such a "deep"
attribute is required so that the modification can be seen by
all address spaces that map the object.
The bits set by the above calls are "sticky" in the sense that
they will remain set so long as the memory object exists. To
return the migration policy for that memory object to its
default setting, one issues the following system call:
mbind(start, length, 0, 0, 0, MPOL_MF_MMIGRATE_DEFAULT)
The system call:
get_mempolicy(&policy, NULL, 0, (int *)start, (long) MPOL_F_MMIGRATE)
returns the policy migration bits from the memory object in the bottom
two bits of "policy".
Typical use by the user-level manual page migration code would
be to:
(1) Identify the file name whose migration policy needs to be modified.
(2) Open and mmap() the file into the current address space.
(3) Issue the appropriate mbind() call from the above list.
(4) Assuming a successful return, munmap() and close the file.
Note well that this interface allows the memory migration process
to modify the migration policy on a file-by-file basis for all processes
that mmap() the specified file. This has two implications:
(1) All VMAs that map to the specified memory object will have
the same migration policy applied. There is no way to
specify a distinct migration policy for one of the VMAs that
map the file.
(2) The migration policy for anonymous memory cannot be changed,
since there is no memory object (where the migration policy
bits are stored) in that case.
To date, we have yet to identify any case where these restrictions
would need to be overcome in the manual page migration case.
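The effect of the three mbind() requests on the per-mapping bits can be
modelled in a few lines of userspace C. The constants and names below are
stand-ins for the AS_* bits this patch adds, not the kernel values:

```c
#include <assert.h>

/* Illustrative stand-ins for the AS_DO_MMIGRATE / AS_DO_NOT_MMIGRATE
 * bits kept in address_space->flags; the bit positions are arbitrary. */
#define DO_MMIGRATE	(1u << 0)
#define DO_NOT_MMIGRATE	(1u << 1)

enum policy_req { REQ_MIGRATE, REQ_NO_MIGRATE, REQ_DEFAULT };

/* Apply one mbind() migration-policy request to a mapping's flag word,
 * keeping the two bits mutually exclusive, as mbind_migration_policy()
 * does in this patch. */
static unsigned apply_migration_policy(unsigned flags, enum policy_req req)
{
	switch (req) {
	case REQ_MIGRATE:
		flags &= ~DO_NOT_MMIGRATE;	/* only one bit may be set */
		flags |= DO_MMIGRATE;
		break;
	case REQ_NO_MIGRATE:
		flags &= ~DO_MMIGRATE;
		flags |= DO_NOT_MMIGRATE;
		break;
	case REQ_DEFAULT:
		flags &= ~(DO_MMIGRATE | DO_NOT_MMIGRATE);
		break;
	}
	return flags;
}
```

The "sticky" behavior falls out naturally: the bits persist until another
request rewrites them, and MPOL_MF_MMIGRATE_DEFAULT clears both.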
Signed-off-by: Ray Bryant <raybry@sgi.com>
--
include/linux/mempolicy.h | 18 +++++++++
include/linux/pagemap.h | 4 ++
mm/mempolicy.c | 84 ++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 103 insertions(+), 3 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mempolicy.c 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mempolicy.c 2005-06-13 12:20:12.000000000 -0700
@@ -76,6 +76,7 @@
#include <linux/init.h>
#include <linux/compat.h>
#include <linux/mempolicy.h>
+#include <linux/pagemap.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -354,6 +355,54 @@ static int mbind_range(struct vm_area_st
return err;
}
+static int mbind_migration_policy(struct mm_struct *mm, unsigned long start,
+ unsigned long end, unsigned flags)
+{
+ struct vm_area_struct *first, *vma;
+ struct address_space *as;
+ int err = 0;
+
+ /* only one of these bits may be set */
+ if (hweight_long(flags & (MPOL_MF_MMIGRATE_MASK)) > 1)
+ return -EINVAL;
+
+ down_read(&mm->mmap_sem);
+ first = find_vma(mm, start);
+ if (!first) {
+ err = -EFAULT;
+ goto out;
+ }
+ for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
+ if (!vma->vm_file)
+ continue;
+ as = vma->vm_file->f_mapping;
+ BUG_ON(!as);
+ switch (flags & MPOL_MF_MMIGRATE_MASK) {
+ case MPOL_MF_DO_MMIGRATE:
+ /* only one of these bits may be set */
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ set_bit(AS_DO_MMIGRATE, &as->flags);
+ break;
+ case MPOL_MF_DO_NOT_MMIGRATE:
+ /* only one of these bits may be set */
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ clear_bit(AS_DO_MMIGRATE, &as->flags);
+ set_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ break;
+ case MPOL_MF_MMIGRATE_DEFAULT:
+ clear_bit(AS_DO_MMIGRATE, &as->flags);
+ clear_bit(AS_DO_NOT_MMIGRATE, &as->flags);
+ break;
+ default:
+ BUG();
+ }
+ }
+out:
+ up_read(&mm->mmap_sem);
+ return err;
+}
+
/* Change policy for a memory range */
asmlinkage long sys_mbind(unsigned long start, unsigned long len,
unsigned long mode,
@@ -367,7 +416,7 @@ asmlinkage long sys_mbind(unsigned long
DECLARE_BITMAP(nodes, MAX_NUMNODES);
int err;
- if ((flags & ~(unsigned long)(MPOL_MF_STRICT)) || mode > MPOL_MAX)
+ if ((flags & ~(unsigned long)(MPOL_MF_MASK)) || mode > MPOL_MAX)
return -EINVAL;
if (start & ~PAGE_MASK)
return -EINVAL;
@@ -380,6 +429,12 @@ asmlinkage long sys_mbind(unsigned long
if (end == start)
return 0;
+ if (flags & MPOL_MF_MMIGRATE_MASK)
+ return mbind_migration_policy(mm, start, end, flags);
+
+ if (mode == MPOL_DEFAULT)
+ flags &= ~MPOL_MF_STRICT;
+
err = get_nodes(nodes, nmask, maxnode, mode);
if (err)
return err;
@@ -492,17 +547,40 @@ asmlinkage long sys_get_mempolicy(int __
struct vm_area_struct *vma = NULL;
struct mempolicy *pol = current->mempolicy;
- if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
+ if (flags & ~(unsigned long)(MPOL_F_MASK))
return -EINVAL;
+ if ((flags & (MPOL_F_NODE | MPOL_F_ADDR)) &&
+ (flags & MPOL_F_MMIGRATE))
+ return -EINVAL;
if (nmask != NULL && maxnode < MAX_NUMNODES)
return -EINVAL;
- if (flags & MPOL_F_ADDR) {
+ if ((flags & MPOL_F_ADDR) || (flags & MPOL_F_MMIGRATE)) {
down_read(&mm->mmap_sem);
vma = find_vma_intersection(mm, addr, addr+1);
if (!vma) {
up_read(&mm->mmap_sem);
return -EFAULT;
}
+ if (flags & MPOL_F_MMIGRATE) {
+ struct address_space *as;
+ err = 0;
+ if (!vma->vm_file) {
+ err = -EINVAL;
+ goto out;
+ }
+ as = vma->vm_file->f_mapping;
+ BUG_ON(!as);
+ pval = 0;
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ pval |= MPOL_MF_DO_MMIGRATE;
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ pval |= MPOL_MF_DO_NOT_MMIGRATE;
+ if (policy && put_user(pval, policy)) {
+ err = -EFAULT;
+ goto out;
+ }
+ goto out;
+ }
if (vma->vm_ops && vma->vm_ops->get_policy)
pol = vma->vm_ops->get_policy(vma, addr);
else
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/mempolicy.h 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/mempolicy.h 2005-06-13 11:48:53.000000000 -0700
@@ -19,9 +19,27 @@
/* Flags for get_mem_policy */
#define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
#define MPOL_F_ADDR (1<<1) /* look up vma using address */
+#define MPOL_F_MMIGRATE (1<<2) /* return migration policy flags */
+
+#define MPOL_F_MASK (MPOL_F_NODE | MPOL_F_ADDR | MPOL_F_MMIGRATE)
/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
/* FUTURE USE (1<<1) RESERVED for MPOL_MF_MOVE */
+/* Flags to set the migration policy for a memory range
+ * By default the kernel migrates all writable VMAs
+ * (anonymous memory included) but not the program executable.
+ * For non-anonymous memory, the user can change the default
+ * actions using the following flags to mbind:
+ */
+#define MPOL_MF_DO_MMIGRATE (1<<2) /* migrate pages of this mem object */
+#define MPOL_MF_DO_NOT_MMIGRATE (1<<3) /* don't migrate any of these pages */
+#define MPOL_MF_MMIGRATE_DEFAULT (1<<4) /* reset back to kernel default */
+
+#define MPOL_MF_MASK (MPOL_MF_STRICT | MPOL_MF_DO_MMIGRATE | \
+ MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
+#define MPOL_MF_MMIGRATE_MASK (MPOL_MF_DO_MMIGRATE | \
+ MPOL_MF_DO_NOT_MMIGRATE | MPOL_MF_MMIGRATE_DEFAULT)
#ifdef __KERNEL__
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/pagemap.h 2005-06-13 11:47:46.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/pagemap.h 2005-06-13 11:48:53.000000000 -0700
@@ -19,6 +19,10 @@
#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */
#define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */
+/* (manual) memory migration control flags. set via mbind() in mempolicy.c */
+#define AS_DO_MMIGRATE (__GFP_BITS_SHIFT + 2) /* migrate pages */
+#define AS_DO_NOT_MMIGRATE (__GFP_BITS_SHIFT + 3) /* don't migrate any pages */
+
static inline unsigned int __nocast mapping_gfp_mask(struct address_space * mapping)
{
return mapping->flags & __GFP_BITS_MASK;
--
* [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (6 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch Ray Bryant
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch allows a process to override the default kernel memory
migration policy (invoked via migrate_pages()) on a
per-mapped-file basis.
The default policy is to migrate all anonymous VMAs and all other
VMAs that have the VM_WRITE bit set. (See the patch:
sys_migrate_pages-migration-selection-rc4.patch
for details on how the default policy is implemented.)
This policy does not cause the program executable or any mapped
user data files that are mapped R/O to be migrated. Such omissions
can be detected and fixed in the user-level migration application,
but that user code needs an interface to apply the fix. This patch
supplies that interface via an extension to the mbind() system call.
The interface is as follows:
mbind(start, length, 0, 0, 0, MPOL_MF_DO_MMIGRATE)
mbind(start, length, 0, 0, 0, MPOL_MF_DO_NOT_MMIGRATE)
These calls override the default kernel policy in
favor of the policy specified. They cause the bit
AS_DO_MMIGRATE (or AS_DO_NOT_MMIGRATE) to be set in the
memory object pointed to by the VMA at the specified addresses
in the current process's address space. Setting such a "deep"
attribute is required so that the modification can be seen by
all address spaces that map the object.
The bits set by the above calls are "sticky" in the sense that
they will remain set so long as the memory object exists. To
return the migration policy for that memory object to its
default setting, one issues the following system call:
mbind(start, length, 0, 0, 0, MPOL_MF_MMIGRATE_DEFAULT)
The system call:
get_mempolicy(&policy, NULL, 0, (int *)start, (long) MPOL_F_MMIGRATE)
returns the policy migration bits from the memory object in the bottom
two bits of "policy".
Typical use by the user-level manual page migration code would
be to:
(1) Identify the file name whose migration policy needs to be modified.
(2) Open and mmap() the file into the current address space.
(3) Issue the appropriate mbind() call from the above list.
(4) Assuming a successful return, munmap() and close the file.
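The combined selection rule (the kernel default plus the mbind() overrides)
can be sketched as a single pure function. The struct and field names below
are illustrative substitutes for the kernel's vm_flags and address_space
bits, not real kernel types:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified VMA description for the sketch. */
struct vma_desc {
	bool locked;		/* VM_LOCKED */
	bool io;		/* VM_IO */
	bool writable;		/* VM_WRITE */
	bool file_backed;	/* vma->vm_file != NULL */
	bool as_do_migrate;	/* AS_DO_MMIGRATE set on the mapping */
	bool as_do_not_migrate;	/* AS_DO_NOT_MMIGRATE set on the mapping */
};

/* Decide whether migrate_vma() would migrate this VMA, following the
 * order of tests in this patch: unmigratable VMAs first, then anonymous
 * memory, then the user overrides, then the VM_WRITE default. */
static bool should_migrate(const struct vma_desc *v)
{
	if (v->locked || v->io)
		return false;		/* can't migrate these VMAs */
	if (!v->file_backed)
		return true;		/* anonymous pages always migrate */
	if (v->as_do_migrate)
		return true;		/* user override: force migration */
	if (v->as_do_not_migrate)
		return false;		/* user override: never migrate */
	return v->writable;		/* default: migrate VM_WRITE VMAs only */
}
```

Note that AS_DO_MMIGRATE is tested first, so it wins if both override bits
were somehow set, matching the order of the tests in migrate_vma().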
Signed-off-by: Ray Bryant <raybry@sgi.com>
mmigrate.c | 31 ++++++++++++++++++++++---------
1 files changed, 22 insertions(+), 9 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 07:40:32.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 07:44:12.000000000 -0700
@@ -601,25 +601,38 @@ migrate_vma(struct task_struct *task, st
unsigned long vaddr;
int rc, count = 0, nr_busy;
LIST_HEAD(pglist);
+ struct address_space *as = NULL;
- /* can't migrate mlock()'d pages */
- if (vma->vm_flags & VM_LOCKED)
+ /* can't migrate these kinds of VMAs */
+ if ((vma->vm_flags & VM_LOCKED) || (vma->vm_flags & VM_IO))
return 0;
+ /* we always migrate anonymous pages */
+ if (!vma->vm_file)
+ goto do_migrate;
+ as = vma->vm_file->f_mapping;
+	/* we have to have both AS_DO_MMIGRATE and AS_DO_NOT_MMIGRATE to
+ * give user space full ability to override the kernel's default
+ * migration decisions */
+ if (test_bit(AS_DO_MMIGRATE, &as->flags))
+ goto do_migrate;
+ if (test_bit(AS_DO_NOT_MMIGRATE, &as->flags))
+ return 0;
+ if (!(vma->vm_flags & VM_WRITE))
+ return 0;
+
+do_migrate:
/* update the vma mempolicy, if needed */
rc = migrate_vma_policy(vma, node_map);
if (rc < 0)
return rc;
- /*
- * gather all of the pages to be migrated from this vma into pglist
- */
+
+ /* gather all of the pages to be migrated from this vma into pglist */
spin_lock(&mm->page_table_lock);
for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
page = follow_page(mm, vaddr, 0);
- /*
- * follow_page has been known to return pages with zero mapcount
- * and NULL mapping. Skip those pages as well
- */
+ /* follow_page has been known to return pages with zero mapcount
+ * and NULL mapping. Skip those pages as well */
if (!page || !page_mapcount(page))
continue;
--
* [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
` (7 preceding siblings ...)
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch Ray Bryant
@ 2005-07-01 22:41 ` Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant
10 siblings, 0 replies; 12+ messages in thread
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Andi Kleen, Dave Hansen, Marcelo Tosatti
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
This patch adds cpuset support to the migrate_pages() system call.
The idea of this patch is that in order to do a migration:
(1) The target task needs to be able to allocate pages on the
nodes that are being migrated to.
(2) However, the actual allocation of pages is not done by
the target task. Allocation is done by the task that is
running the migrate_pages() system call. Since it is
expected that the migration will be done by a batch manager
of some kind that is authorized to control the jobs running
in an enclosing cpuset, we require that the current
task ALSO be able to allocate pages on the nodes
that are being migrated to.
Note well that if cpusets are not configured, the call to
cpuset_migration_allowed() is optimized away.
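The permission rule reduces to two subset tests. A one-word userspace
sketch, with mask_subset standing in for the kernel's nodes_subset() and
the function names chosen here for illustration:

```c
#include <assert.h>

/* One-word stand-in for nodemask_t: bit i set means node i is in
 * the mask.  nodes_subset(a, b) in the kernel asks whether every
 * node in a is also in b. */
static int mask_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/* The rule this patch enforces: the target nodes must be allowed
 * both to the task being migrated and to the task that issued
 * migrate_pages(), since the latter performs the allocations. */
static int migration_allowed(unsigned long target_nodes,
			     unsigned long current_allowed,
			     unsigned long task_allowed)
{
	return mask_subset(target_nodes, current_allowed) &&
	       mask_subset(target_nodes, task_allowed);
}
```

A request fails as soon as any target node falls outside either cpuset's
mems_allowed, which is exactly what cpuset_migration_allowed() checks.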
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/cpuset.h | 8 +++++++-
kernel/cpuset.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
mm/mmigrate.c | 15 +++++++++++----
3 files changed, 65 insertions(+), 6 deletions(-)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/cpuset.h 2005-06-24 10:56:43.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/cpuset.h 2005-06-24 11:01:59.000000000 -0700
@@ -4,7 +4,7 @@
* cpuset interface
*
* Copyright (C) 2003 BULL SA
- * Copyright (C) 2004 Silicon Graphics, Inc.
+ * Copyright (C) 2004-2005 Silicon Graphics, Inc.
*
*/
@@ -26,6 +26,7 @@ int cpuset_zonelist_valid_mems_allowed(s
int cpuset_zone_allowed(struct zone *z);
extern struct file_operations proc_cpuset_operations;
extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
+extern int cpuset_migration_allowed(nodemask_t, struct task_struct *);
#else /* !CONFIG_CPUSETS */
@@ -59,6 +60,11 @@ static inline char *cpuset_task_status_a
return buffer;
}
+static inline int cpuset_migration_allowed(nodemask_t nodes, struct task_struct *task)
+{
+ return 1;
+}
+
#endif /* !CONFIG_CPUSETS */
#endif /* _LINUX_CPUSET_H */
Index: linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/kernel/cpuset.c 2005-06-24 10:56:43.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/kernel/cpuset.c 2005-06-24 11:01:59.000000000 -0700
@@ -4,7 +4,7 @@
* Processor and Memory placement constraints for sets of tasks.
*
* Copyright (C) 2003 BULL SA.
- * Copyright (C) 2004 Silicon Graphics, Inc.
+ * Copyright (C) 2004-2005 Silicon Graphics, Inc.
*
* Portions derived from Patrick Mochel's sysfs code.
* sysfs is Copyright (c) 2001-3 Patrick Mochel
@@ -1500,6 +1500,52 @@ int cpuset_zone_allowed(struct zone *z)
node_isset(z->zone_pgdat->node_id, current->mems_allowed);
}
+/**
+ * cpuset_mems_allowed - return mems_allowed mask from a task's cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->mems_allowed.
+ *
+ * Description: Returns the nodemask_t mems_allowed of the cpuset
+ * attached to the specified @tsk.
+ *
+ **/
+
+static nodemask_t cpuset_mems_allowed(const struct task_struct *tsk)
+{
+ nodemask_t mask;
+
+ down(&cpuset_sem);
+ task_lock((struct task_struct *)tsk);
+ guarantee_online_mems(tsk->cpuset, &mask);
+ task_unlock((struct task_struct *)tsk);
+ up(&cpuset_sem);
+
+ return mask;
+}
+
+/**
+ * cpuset_migration_allowed - check that a migration target is permitted
+ * @mask: nodemask of nodes being migrated to
+ * @tsk: pointer to task struct of task being migrated
+ *
+ * Description: Returns true if the migration should be allowed.
+ *
+ */
+int cpuset_migration_allowed(nodemask_t mask, struct task_struct *tsk)
+{
+ nodemask_t current_nodes_allowed, target_nodes_allowed;
+ current_nodes_allowed = cpuset_mems_allowed(current);
+
+ /* Obviously, the target task needs to be able to allocate on
+ * the new set of nodes. However, the migrated pages will
+ * actually be allocated by the current task, so the current
+ * task has to be able to allocate on those nodes as well */
+ target_nodes_allowed = cpuset_mems_allowed(tsk);
+ if (!nodes_subset(mask, current_nodes_allowed) ||
+ !nodes_subset(mask, target_nodes_allowed))
+ return 0;
+ return 1;
+}
+
/*
* proc_cpuset_show()
* - Print tasks cpuset path into seq_file.
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:01:59.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 11:02:20.000000000 -0700
@@ -26,6 +26,7 @@
#include <linux/delay.h>
#include <linux/nodemask.h>
#include <linux/mempolicy.h>
+#include <linux/cpuset.h>
#include <asm/bitops.h>
/*
@@ -690,7 +691,7 @@ sys_migrate_pages(pid_t pid, __u32 count
int *tmp_old_nodes = NULL;
int *tmp_new_nodes = NULL;
int *node_map = NULL;
- struct task_struct *task;
+ struct task_struct *task = NULL;
struct mm_struct *mm = NULL;
size_t size = count * sizeof(tmp_old_nodes[0]);
struct vm_area_struct *vma;
@@ -734,8 +735,10 @@ sys_migrate_pages(pid_t pid, __u32 count
if (task) {
task_lock(task);
mm = task->mm;
- if (mm)
+ if (mm) {
atomic_inc(&mm->mm_users);
+ get_task_struct(task);
+ }
task_unlock(task);
} else {
ret = -ESRCH;
@@ -746,7 +749,9 @@ sys_migrate_pages(pid_t pid, __u32 count
if (!mm)
goto out_einval;
- /* set up the node_map array */
+ if (!cpuset_migration_allowed(new_node_mask, task))
+ goto out_einval;
+
for (i = 0; i < MAX_NUMNODES; i++)
node_map[i] = -1;
for (i = 0; i < count; i++)
@@ -773,8 +778,10 @@ sys_migrate_pages(pid_t pid, __u32 count
ret = migrated;
out:
- if (mm)
+ if (mm) {
mmput(mm);
+ put_task_struct(task);
+ }
kfree(tmp_old_nodes);
kfree(tmp_new_nodes);
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant raybry@sgi.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Marcelo Tosatti, Andi Kleen, Dave Hansen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Add permissions checking to the migrate_pages() system call.
The basic idea is that the caller is allowed to migrate a
target process if it could send that process an arbitrary
signal, or if the caller has the CAP_SYS_ADMIN capability.
The permissions check is modeled on check_kill_permission()
in kernel/signal.c.
Signed-off-by: Ray Bryant <raybry@sgi.com>
include/linux/capability.h | 2 ++
mm/mmigrate.c | 12 ++++++++++++
2 files changed, 14 insertions(+)
Index: linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/include/linux/capability.h 2005-06-24 11:02:20.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/include/linux/capability.h 2005-06-24 11:02:30.000000000 -0700
@@ -233,6 +233,8 @@ typedef __u32 kernel_cap_t;
/* Allow enabling/disabling tagged queuing on SCSI controllers and sending
arbitrary SCSI commands */
/* Allow setting encryption key on loopback filesystem */
+/* Allow using the migrate_pages() system call to migrate a process's pages
+ from one set of NUMA nodes to another */
#define CAP_SYS_ADMIN 21
Index: linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c
===================================================================
--- linux-2.6.12-rc5-mhp1-page-migration-export.orig/mm/mmigrate.c 2005-06-24 11:02:20.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-page-migration-export/mm/mmigrate.c 2005-06-24 11:02:30.000000000 -0700
@@ -15,6 +15,8 @@
#include <linux/module.h>
#include <linux/swap.h>
#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/capability.h>
#include <linux/init.h>
#include <linux/highmem.h>
#include <linux/writeback.h>
@@ -734,6 +736,16 @@ sys_migrate_pages(pid_t pid, __u32 count
task = find_task_by_pid(pid);
if (task) {
task_lock(task);
+ /* does the calling task have permission to migrate this task?
+ * (a la check_kill_permission()) */
+ if ((current->euid ^ task->suid) && (current->euid ^ task->uid)
+ && (current->uid ^ task->suid) && (current->uid ^ task->uid)
+ && !capable(CAP_SYS_ADMIN)) {
+ ret = -EPERM;
+ task_unlock(task);
+ read_unlock(&tasklist_lock);
+ goto out;
+ }
mm = task->mm;
if (mm) {
atomic_inc(&mm->mm_users);
--
* [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch
From: Ray Bryant @ 2005-07-01 22:41 UTC (permalink / raw)
To: Hirokazu Takahashi, Dave Hansen, Marcelo Tosatti, Andi Kleen
Cc: Christoph Hellwig, linux-mm, Nathan Scott, Ray Bryant,
lhms-devel, Ray Bryant, Paul Jackson, clameter
Manual page migration adds a nodemap arg to try_to_migrate_pages().
The nodemap specifies where pages found on a particular node are to
be migrated. If all you want to do is migrate pages off the
current node, pass NULL as the nodemap argument.
Add the NULL to the try_to_migrate_pages() invocation.
This patch should be added to the Memory Hotplug series after patch
N1.1-pass-page_list-to-steal_page.patch (for 2.6.12-rc5-mhp1).
Signed-off-by: Ray Bryant <raybry@sgi.com>
--
page_alloc.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mhp1-memory-hotplug.orig/mm/page_alloc.c 2005-06-21 10:43:14.000000000 -0700
+++ linux-2.6.12-rc5-mhp1-memory-hotplug/mm/page_alloc.c 2005-06-21 10:43:14.000000000 -0700
@@ -823,7 +823,7 @@ retry:
on_each_cpu(lru_drain_schedule, NULL, 1, 1);
rest = grab_capturing_pages(&page_list, start_pfn, nr_pages);
- remains = try_to_migrate_pages(&page_list);
+ remains = try_to_migrate_pages(&page_list, NULL);
if (rest || !list_empty(&page_list)) {
if (remains == -ENOSPC) {
/* A swap device should be added. */
--
Thread overview: 12+ messages
2005-07-01 22:40 [PATCH 2.6.13-rc1 0/11] mm: manual page migration-rc4 -- overview Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 1/11] mm: hirokazu-steal_page_from_lru.patch Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 2/11] mm: manual page migration-rc4 -- xfs-migrate-page-rc4.patch Ray Bryant
2005-07-01 22:40 ` [PATCH 2.6.13-rc1 3/11] mm: manual page migration-rc4 -- add-node_map-arg-to-try_to_migrate_pages-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 4/11] mm: manual page migration-rc4 -- add-sys_migrate_pages-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 5/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 6/11] mm: manual page migration-rc4 -- sys_migrate_pages-mempolicy-migration-shared-policy-fixup-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 7/11] mm: manual page migration-rc4 -- add-mempolicy-control-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 8/11] mm: manual page migration-rc4 -- sys_migrate_pages-migration-selection-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 9/11] mm: manual page migration-rc4 -- sys_migrate_pages-cpuset-support-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 10/11] mm: manual page migration-rc4 -- sys_migrate_pages-permissions-check-rc4.patch Ray Bryant
2005-07-01 22:41 ` [PATCH 2.6.13-rc1 11/11] mm: manual page migration-rc4 -- N1.2-add-nodemap-to-try_to_migrate_pages-call.patch Ray Bryant