Status and the future of page migration

* Status and the future of page migration
@ 2006-05-12  0:06 Christoph Lameter
  2006-05-12  0:56 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2006-05-12  0:06 UTC (permalink / raw)
  To: linux-mm
  Cc: ak, pj, kravetz, marcelo.tosatti, kamezawa.hiroyu, taka,
	lee.schermerhorn, haveblue

The current page migration in Linus' tree uses swap entries to track unmapped
anonymous pages and has the side effect of removing all references to file
backed pages. If multiple migrations run concurrently then we are typically
limited by contention around the tree_lock for swap space. We see migration
rates of around 600-900 MB/sec for a single migration and around 250 MB/sec
for 4 concurrent migrations.

The code in Andrew's tree uses migration entries, restores ptes
to file backed pages and preserves the write enable bit. This means
that a process can be repeatedly migrated without losing
the file backed pages that were not referenced in the intermediate
period. We also avoid useless COW faults. The contention around
the swap tree_lock has been removed and so we see increased
migration rates for a single process of around 800 MB/sec to 1 GB/sec that
then degrade only slightly for 4 concurrent processes.

I would like to keep the features of page migration as they are right now
in Andrew's tree until the patches have made it into Linus' tree.

Some additional patches for page migration are at
ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc3-mm1/.
These are in testing and need work. Feedback on these would be useful.

1. Restructure migrate_pages() so that the current goto mess is avoided. This
   extracts two functions from migrate_pages() that deal with taking the
   page lock for the source page and for the destination page, respectively.

2. Dispose of migrated pages immediately. Move the recycling of migrated
   pages into migrate_pages(). Callers only have to deal with pages that
   are still candidates for migration, i.e. pages for which migration could
   still be retried. This simplifies handling but prevents potentially
   necessary post processing of migrated pages.
   Should we do this at all?

3. Use arrays to pass the list of pages to migrate_pages().
   Doing so will make a 1-1 association possible between the pages to be
   migrated. If we have this 1-1 association then we can accurately allocate
   pages for MPOL_INTERLEAVE during migration. Specifying
   MPOL_INTERLEAVE|MPOL_MF_MOVE to mbind() could move all pages so that they
   follow the best interleave pattern accurately.

4. A new system call for the migration of lists of pages (incomplete
   implementation!)

   sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
   		int *nodes, unsigned int flags);

   This function would migrate individual pages of a process to specific nodes.
   For example, user space tools exist that can provide off-node access
   statistics that show from which node a page is most frequently accessed.
   Additional code could then use this new system call to migrate the lists
   of pages to the more advantageous location. Automatic page migration
   could be implemented in user space. Many of us remain unconvinced that
   automatic page migration can provide a consistent benefit.
   This API would allow the implementation of various automatic migration
   methods without changes to the kernel.

5. vma migration hooks
   Add a new function call "migrate" to the vm_operations structure. The
   vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
   to implement their own migration schemes. Currently there is no user of
   such functionality. The uncached allocator for IA64 could potentially use
   such vma migration hooks.
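
As a rough illustration of the user space side of item 4, the following
sketch turns per-page access statistics into the addresses[] and nodes[]
arrays that a call like the proposed sys_move_pages() would take. The
struct page_stats layout, MAX_NODES and both helper functions are
assumptions made up for this example; no existing tool or kernel interface
is being described.

```c
#include <stddef.h>

#define MAX_NODES 4  /* assumption: small fixed node count for the sketch */

/* Hypothetical per-page sample: how often each node touched the page. */
struct page_stats {
    unsigned long address;          /* page-aligned user address */
    unsigned long hits[MAX_NODES];  /* accesses observed from each node */
    int current_node;               /* node the page currently lives on */
};

/* Return the node that accessed the page most often (lowest node wins ties). */
static int busiest_node(const struct page_stats *ps)
{
    int best = 0;

    for (int n = 1; n < MAX_NODES; n++)
        if (ps->hits[n] > ps->hits[best])
            best = n;
    return best;
}

/*
 * Fill addresses[] and nodes[] with the pages whose busiest node differs
 * from the node they are on now. Returns the number of entries written,
 * which is the value that would be passed as nr_pages.
 */
static int build_move_list(const struct page_stats *stats, int nr,
                           unsigned long *addresses, int *nodes)
{
    int out = 0;

    for (int i = 0; i < nr; i++) {
        int target = busiest_node(&stats[i]);

        if (target != stats[i].current_node) {
            addresses[out] = stats[i].address;
            nodes[out] = target;
            out++;
        }
    }
    return out;
}
```

Pages that are already on their most frequently accessing node are left out
of the list, so the policy lives entirely in user space and only the
resulting arrays would be handed to the kernel.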

Potential future work:

- Implement the migration of mlocked pages. This would mean ignoring
  VM_LOCKED in try_to_unmap(). Currently VM_LOCKED can be used to prevent the
  migration of pages. If we allow the migration of mlocked pages then we
  would need to introduce some alternate means of being able to declare a
  page not migratable (VM_DONTMIGRATE?).
  Not sure if this should be done at all.

- Migration of pages outside of a process context.
  Currently page migration requires that a read lock on mmap_sem is held to
  prevent the anonymous vmas from vanishing while we migrate pages.
  If page migration were used to remove all pages from a zone (as needed
  by the memory hotplug project) then we would first need to find a way
  to ensure that the anon_vmas do not vanish under us. We could, for example,
  take a read lock on one of the mm_structs that may be discovered via the
  reverse maps.

Did I miss anything?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Status and the future of page migration
  2006-05-12  0:06 Status and the future of page migration Christoph Lameter
@ 2006-05-12  0:56 ` KAMEZAWA Hiroyuki
  2006-05-12  1:06   ` Christoph Lameter
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-05-12  0:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Thu, 11 May 2006 17:06:31 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> Some additional patches for page migration are at
> ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc3-mm1/.
> These are in testing and need work. Feedback on these would be useful.
> 
Thank you for the clarification and, of course, for your work.

> 1. Restructure migrate_pages() so that the current goto mess is avoided. This
>    extracts two functions from migrate pages that deal with either taking the
>    page lock for the source or destination page.
> 
> 2. Dispose of migrated pages immediately. Moves the recycling of migrated
>    pages into migrate_pages(). Callers only have to deal with pages that
>    are still candidates for still could be repeated. This simplifies handling
>    but prevents potential necessary post processing of migrated pages.
>    Should we do this at all?
> 
I don't think this is necessary now.
Some code may want to use the migrated pages, I think. For example, migrating
pages to create hugepage-sized contiguous regions. But this will not come in
the near future ;)
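
To make the trade-off in item 2 concrete, here is a small user space model
of "dispose immediately": migrated pages are recycled inside the migration
function and only the pages that still need a retry are handed back. The
struct disp_page type and model_migrate_and_dispose() are hypothetical names
for this sketch and do not correspond to kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page on the migration list. */
struct disp_page {
    int id;
    int busy;   /* 1: migration failed this time, may be retried */
    int freed;  /* 1: old page was recycled by the migration code */
};

/*
 * Migrate every page that is not busy and recycle the old page right away.
 * Busy pages are compacted to the front of pages[]. Returns how many pages
 * remain for the caller to retry.
 */
static size_t model_migrate_and_dispose(struct disp_page **pages, size_t nr)
{
    size_t remaining = 0;

    for (size_t i = 0; i < nr; i++) {
        if (pages[i]->busy)
            pages[remaining++] = pages[i];  /* keep for a later retry */
        else
            pages[i]->freed = 1;            /* caller never sees it again */
    }
    return remaining;
}
```

The caller no longer gets a list of successfully migrated pages, which is
exactly the post processing opportunity that would be lost.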

> 3. Uses arrays to pass list of pages to migrate_pages().
>    Doing so will make a 1-1 association possible between the pages to be
>    migrated. If we have this 1-1 association then we can accurately allocate
>    pages for MPOL_INTERLEAVE during migration. Specifying
>    MPOL_INTERLEAVE|MPOL_MF_MOVE to mbind() could move all pages so that they
>    follow the best interleave pattern accurately.
> 
I like this. 
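
A minimal sketch of why the array form helps with interleaving: once page i
in the source array corresponds to slot i in a target array, each slot can
be given its interleave node up front. The round-robin rule, the
first_index parameter and the function name are assumptions for
illustration only; the kernel's MPOL_INTERLEAVE placement may well be
computed differently.

```c
#include <stddef.h>

/*
 * Give each page slot a target node so that consecutive pages rotate
 * through interleave_nodes[]. first_index is the (hypothetical) position
 * of the first page within the interleave pattern. nr_nodes must be > 0.
 */
static void assign_interleave_targets(size_t nr_pages,
                                      const int *interleave_nodes,
                                      size_t nr_nodes,
                                      size_t first_index,
                                      int *target_node)
{
    for (size_t i = 0; i < nr_pages; i++)
        target_node[i] = interleave_nodes[(first_index + i) % nr_nodes];
}
```

With an unordered list of pages there is no stable index i to base the
rotation on, which is the point of the 1-1 association.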

> 4. A new system call for the migration of lists of pages (incomplete
>    implementation!)
> 
>    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
>    		int *nodes, unsigned int flags);
> 
>    This function would migrate individual pages of a process to specific nodes.
>    F.e. user space tools exist that can provide off node access statistics
>    that show from what node a pages is most frequently accessed.
>    Additional code could then use this new system call to migrate the lists
>    of pages to the more advantageous location. Automatic page migration
>    could be implemented in user space. Many of us remain unconvinced that
>    automatic page migration can provide a consistent benefit.
>    This API would allow the implementation of various automatic migration
>    methods without changes to the kernel.
> 
Maybe an interface that exposes the information needed for this should be
implemented first. Can a user process get precise enough information now?


> 5. vma migration hooks
>    Adds a new function call "migrate" to the vm_operations structure. The
>    vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
>    to implement their own migration schemes. Currently there is no user of
>    such functionality. The uncached allocator for IA64 could potentially use
>    such vma migration hooks.
> 
The uncached allocator doesn't use struct address_space?

> Potential future work:
> 
> - Implement the migration of mlocked pages. This would mean to ignore
>   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
>   migration of pages. If we allow the migration of mlocked pages then we
>   would need to introduce some alternate means of being able to declare a
>   page not migratable (VM_DONTMIGRATE?).
>   Not sure if this should be done at all.
> 
I think VM_LOCKED just means the address has a physical page. So I think
migration is okay. But I don't think VM_DONTMIGRATE is necessary.

> - Migration of pages outside of a process context.
>   Currently page migration requires that a read lock on mmap_sem is held to
>   prevent the anonymous vmas from vanishing while we migrate pages.
>   If page migration would be used to remove all pages from a zone (like needed
>   by the memory hotplug project) then we would need to first find a way
>   to insure that the anon_vmas do not vanish under us.
>   We could f.e. take a read_lock on the one of the mm_structs that may be
>   discovered via the reverse maps.
> 
I think taking anon_vma->lock during migration is one way. But this will make
try_to_unmap() dirtier...

Thanks,
-Kame

* Re: Status and the future of page migration
  2006-05-12  0:56 ` KAMEZAWA Hiroyuki
@ 2006-05-12  1:06   ` Christoph Lameter
  2006-05-12  1:35     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2006-05-12  1:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> > 4. A new system call for the migration of lists of pages (incomplete
> >    implementation!)
> > 
> >    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
> >    		int *nodes, unsigned int flags);
> > 
> >    This function would migrate individual pages of a process to specific nodes.
> >    F.e. user space tools exist that can provide off node access statistics
> >    that show from what node a pages is most frequently accessed.
> >    Additional code could then use this new system call to migrate the lists
> >    of pages to the more advantageous location. Automatic page migration
> >    could be implemented in user space. Many of us remain unconvinced that
> >    automatic page migration can provide a consistent benefit.
> >    This API would allow the implementation of various automatic migration
> >    methods without changes to the kernel.
> > 
> Maybe implementing the interface to show necessary information to do this is
> necessary before doing this. A user process can get enough precise information now ?

What precise information would be needed? We could return the current node
information in a status array. Right, I forgot to include the status array
that returns success or failure of the call. The status array would
allow finding out the failure reason for each page.
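
The status array idea can be modeled in user space as follows. This is only
a sketch under stated assumptions: struct status_page, MODEL_NR_NODES and
model_move_pages() are invented for the example, and the choice of errno
values for each failure is a guess at what such an interface might report,
not a description of an implemented system call.

```c
#include <errno.h>

#define MODEL_NR_NODES 4  /* assumption: nodes 0..3 exist in the model */

/* Hypothetical stand-in for the state behind one user address. */
struct status_page {
    int present;  /* 0: nothing is mapped at this address */
    int pinned;   /* 1: something prevents the page from moving */
    int node;     /* node currently backing the page */
};

/*
 * For each page, either move it to nodes[i] and store the new node in
 * status[i], or store a negative errno describing why it was not moved.
 * Returns the number of pages that were not moved.
 */
static int model_move_pages(struct status_page *pages, int nr_pages,
                            const int *nodes, int *status)
{
    int failed = 0;

    for (int i = 0; i < nr_pages; i++) {
        if (!pages[i].present) {
            status[i] = -ENOENT;
        } else if (nodes[i] < 0 || nodes[i] >= MODEL_NR_NODES) {
            status[i] = -ENODEV;
        } else if (pages[i].pinned) {
            status[i] = -EBUSY;
        } else {
            pages[i].node = nodes[i];
            status[i] = nodes[i];
        }
        if (status[i] < 0)
            failed++;
    }
    return failed;
}
```

A single return code could only say that something went wrong; the per-page
array lets the caller decide which pages are worth retrying.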

> > 5. vma migration hooks
> >    Adds a new function call "migrate" to the vm_operations structure. The
> >    vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
> >    to implement their own migration schemes. Currently there is no user of
> >    such functionality. The uncached allocator for IA64 could potentially use
> >    such vma migration hooks.
> > 
> uncached allocator doesn't use struct address_space ?

Right.

> > - Implement the migration of mlocked pages. This would mean to ignore
> >   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
> >   migration of pages. If we allow the migration of mlocked pages then we
> >   would need to introduce some alternate means of being able to declare a
> >   page not migratable (VM_DONTMIGRATE?).
> >   Not sure if this should be done at all.
> > 
> I think VM_LOCKED just means the address has the physical page. So I think
> migration is Okay. But I don't think VM_DONTMIGRATE is necessary..

You are right but there may be system components (such as device drivers) 
that require the page not to be moved. Without page migration VM_LOCKED 
implies that the physical address stays the same. Kernel code may assume 
that VM_LOCKED -> don't migrate.

* Re: Status and the future of page migration
  2006-05-12  1:06   ` Christoph Lameter
@ 2006-05-12  1:35     ` KAMEZAWA Hiroyuki
  2006-05-12  1:43       ` Christoph Lameter
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-05-12  1:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Thu, 11 May 2006 18:06:20 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:
> 
> > > 4. A new system call for the migration of lists of pages (incomplete
> > >    implementation!)
> > > 
> > >    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
> > >    		int *nodes, unsigned int flags);
> > > 
> > >    This function would migrate individual pages of a process to specific nodes.
> > >    F.e. user space tools exist that can provide off node access statistics
> > >    that show from what node a pages is most frequently accessed.
> > >    Additional code could then use this new system call to migrate the lists
> > >    of pages to the more advantageous location. Automatic page migration
> > >    could be implemented in user space. Many of us remain unconvinced that
> > >    automatic page migration can provide a consistent benefit.
> > >    This API would allow the implementation of various automatic migration
> > >    methods without changes to the kernel.
> > > 
> > Maybe implementing the interface to show necessary information to do this is
> > necessary before doing this. A user process can get enough precise information now ?
> 
> What precise information would be needed? We could return the current node 
> information in a status array. Right I forgot to include the status array 
> that returns success / or failure of the call. The status array would 
> allow to find out the failure reason for each page.
> 
I'm sorry, I missed "F.e. user space..."
BTW, can we get off-node access statistics for each vma now?



> > > - Implement the migration of mlocked pages. This would mean to ignore
> > >   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
> > >   migration of pages. If we allow the migration of mlocked pages then we
> > >   would need to introduce some alternate means of being able to declare a
> > >   page not migratable (VM_DONTMIGRATE?).
> > >   Not sure if this should be done at all.
> > > 
> > I think VM_LOCKED just means the address has the physical page. So I think
> > migration is Okay. But I don't think VM_DONTMIGRATE is necessary..
> 
> You are right but there may be system components (such as device drivers) 
> that require the page not to be moved. Without page migration VM_LOCKED 
> implies that the physical address stays the same. Kernel code may assume 
> that VM_LOCKED -> dont migrate.
> 
Hmm.. I think such pages should have an extra refcnt to prevent migration.


-Kame

* Re: Status and the future of page migration
  2006-05-12  1:35     ` KAMEZAWA Hiroyuki
@ 2006-05-12  1:43       ` Christoph Lameter
  2006-05-12  2:08         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2006-05-12  1:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> > What precise information would be needed? We could return the current node 
> > information in a status array. Right I forgot to include the status array 
> > that returns success / or failure of the call. The status array would 
> > allow to find out the failure reason for each page.
> > 
> I'm sorry I missed "F.e. user space..."
> BTW, we can get statistics of off-node-access for each vma now ?

You can do that by programming the PMU (IA64) to notify you on each long 
latency memory access.

> > You are right but there may be system components (such as device drivers) 
> > that require the page not to be moved. Without page migration VM_LOCKED 
> > implies that the physical address stays the same. Kernel code may assume 
> > that VM_LOCKED -> dont migrate.
> > 
> Hmm.. I think such pages should have extra refcnt to prevent migration.

refcnts are for temporary use. An extra refcnt will make page migration 
retry until it gives up. It should not try to migrate an unmovable page.
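
The difference can be shown with a toy model of a retry loop. Everything
here is hypothetical (struct retry_page, the dontmove marker, MAX_PASSES and
both functions); it only illustrates that a page held by an extra reference
costs one attempt per pass until the loop gives up, while an explicitly
marked page costs nothing.

```c
#define MAX_PASSES 10  /* assumption: arbitrary retry limit for the model */

/* Hypothetical stand-in for a page on the migration list. */
struct retry_page {
    int extra_refs;  /* references beyond what migration expects */
    int dontmove;    /* explicit "never migrate this page" marker */
    int migrated;    /* set once migration succeeded */
    int attempts;    /* how often migration was tried */
};

/* One migration attempt: fails while someone holds an extra reference. */
static int try_migrate(struct retry_page *p)
{
    p->attempts++;
    if (p->extra_refs > 0)
        return -1;  /* looks temporarily busy, so the caller retries */
    p->migrated = 1;
    return 0;
}

/* Retry failed pages for up to MAX_PASSES passes, skipping marked pages. */
static void migrate_with_retry(struct retry_page *pages, int nr)
{
    for (int pass = 0; pass < MAX_PASSES; pass++) {
        int retry = 0;

        for (int i = 0; i < nr; i++) {
            struct retry_page *p = &pages[i];

            if (p->migrated || p->dontmove)
                continue;
            if (try_migrate(p) != 0)
                retry++;
        }
        if (!retry)
            break;
    }
}
```

In this model a permanently held reference is indistinguishable from a
short-lived one, so the loop burns all of its passes on a page that can
never move.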

* Re: Status and the future of page migration
  2006-05-12  1:43       ` Christoph Lameter
@ 2006-05-12  2:08         ` KAMEZAWA Hiroyuki
  2006-05-12  3:21           ` Christoph Lameter
  0 siblings, 1 reply; 7+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-05-12  2:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Thu, 11 May 2006 18:43:13 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > > You are right but there may be system components (such as device drivers) 
> > > that require the page not to be moved. Without page migration VM_LOCKED 
> > > implies that the physical address stays the same. Kernel code may assume 
> > > that VM_LOCKED -> dont migrate.
> > > 
> > Hmm.. I think such pages should have extra refcnt to prevent migration.
> 
> refcnts are for temporary use. An extra refcnt will make page migration 
> retry until it gives up. It should not try to migrate an unmovable page.
> 
Hmm... it seems the kernel drivers assume the pages will not be moved if
VM_LOCKED is set. I'm not sure which is better: replacing VM_LOCKED with
VM_DONTMOVE in all drivers, or adding VM_KEEPONMEMORY for the mlock() code
and just modifying the kernel core.

-Kame

* Re: Status and the future of page migration
  2006-05-12  2:08         ` KAMEZAWA Hiroyuki
@ 2006-05-12  3:21           ` Christoph Lameter
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-05-12  3:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> Hmm...it seems the kernel drivers assumes the pages will not moved if VM_LOCKED.
> I'm not sure which is better to replace all driver's VM_LOCKED to VM_DONTMOVE or
> to add VM_KEEPONMEMORY for mlock() codes and just modify the kernel core.

We could add a MCL_DONTMOVE to mlockall() because we also need some way
for user space to pin pages, and then add a VM_DONTMOVE to the vm
flags. Then do a global search through the kernel source and replace
VM_LOCKED in the drivers with VM_DONTMOVE.
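
A sketch of how the two flags could be kept apart, written as a user space
model. All MODEL_* names and bit values are invented for this example;
neither MCL_DONTMOVE nor VM_DONTMOVE exists at this point, and the real
flag values would be chosen differently.

```c
/* Hypothetical vma flag bits for the model. */
#define MODEL_VM_LOCKED    0x1u  /* pages stay resident in memory */
#define MODEL_VM_DONTMOVE  0x2u  /* pages must keep their physical address */

/* Hypothetical mlockall() flag bits for the model. */
#define MODEL_MCL_CURRENT  0x1
#define MODEL_MCL_DONTMOVE 0x8

/* Translate the mlockall()-style request into vma flags. */
static unsigned int model_mlock_vm_flags(int mcl_flags)
{
    unsigned int vm_flags = MODEL_VM_LOCKED;

    if (mcl_flags & MODEL_MCL_DONTMOVE)
        vm_flags |= MODEL_VM_DONTMOVE;
    return vm_flags;
}

/* Migration only looks at the "don't move" bit, not at the lock bit. */
static int model_may_migrate(unsigned int vm_flags)
{
    return (vm_flags & MODEL_VM_DONTMOVE) == 0;
}
```

With this split, a plain mlock keeps pages in memory but still allows them
to move between nodes, while drivers and user space that depend on a fixed
physical address ask for it explicitly.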
