linux-mm.kvack.org archive mirror
From: Robin Holt <holt@sgi.com>
To: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Robin Holt <holt@sgi.com>, ": Paul Jackson" <pj@sgi.com>,
	haveblue@us.ibm.com, raybry@sgi.com, taka@valinux.co.jp,
	hugh@veritas.com, akpm@osdl.org, marcello@cyclades.com,
	raybry@austin.rr.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2.6.11-rc2-mm2 7/7] mm: manual page migration -- sys_page_migrate
Date: Tue, 15 Feb 2005 16:51:52 -0600
Message-ID: <20050215225152.GA26753@lnx-holt.americas.sgi.com>
In-Reply-To: <16914.28795.316835.291470@wombat.chubb.wattle.id.au>

On Wed, Feb 16, 2005 at 08:58:19AM +1100, Peter Chubb wrote:
> >>>>> "Robin" == Robin Holt <holt@sgi.com> writes:
> 
> Robin> On Tue, Feb 15, 2005 at 08:35:29AM -0800, Paul Jackson wrote:
> >> What about the suggestion I had that you sort of skipped over,
> >> which amounted to changing the system call from a node array to
> >> just one node:
> >> 
> >> sys_page_migrate(pid, va_start, va_end, count, old_nodes,
> >> new_nodes);
> >> 
> >> to:
> >> 
> >> sys_page_migrate(pid, va_start, va_end, old_node, new_node);
> >> 
> >> Doesn't that let you do all you need to?  Is it insane too?
> 
> Robin> Migration could be done in most cases and would only fall apart
> Robin> when there are overlapping node lists and no nodes available as
> Robin> temp space and we are not moving large chunks of data.
> 
> A possibly stupid suggestion: 
> 
> Can page migration be done lazily, instead of all at once?  Move the
> process, mark its pages as candidates for migration, and when 
> the page faults, decide whether to copy across or not...
> 
> That way you only copy the pages the process is using, and only copy
> each page once.  It makes copy for replication easier in some future
> incarnation, too, because the same basic infrastructure can be used.

I would agree that lazy might be possible, but then we need to keep track
of the desired destination and cannot rely upon first touch, as that
will likely result in scrambling the memory of the application.

I have been very lax in describing how a typical MPI application works.
This method has been in place for years and is commonly accepted practice.

In the MPI model, a set of large mappings is done by the first process.
It then forks x number of worker threads, which touch their chunk of
memory and rendezvous with the other workers.  Once all workers have
rendezvoused, they are allowed to start their processing.  A typical
worker thread will reference its own memory set 85-97% of the time and
reference other memory sets in a read-only fashion the rest of the
time.

It is important to performance that the worker thread's memory remains
as close to its cpu as possible.  Any time the memory is on a different
node, the performance of that thread degrades (memory is further away),
the performance of the other thread, the one local to that node, is
hindered (its memory controller is busier), and the reads done by the
neighbor threads of both of the aforementioned worker threads are
hindered as there is more NUMA activity.  Given all that, there is a
common concept in MPI called a barrier, where worker threads that
complete a work set awaken the threads waiting at the barrier
associated with that work set.  As a result of this wait, by slowing
down a single thread you can have a cascade effect which slows down
the entire application significantly as barriers are missed.
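
The barrier part of that argument can be put into numbers with a
deliberately simplified model: each work set takes as long as its
slowest worker, because everyone else waits at the barrier.  The sketch
below assumes identical work sets and ignores the extra NUMA contention
described above; the function names are hypothetical.

```c
#include <stddef.h>

/* Time for one work set: every worker waits at the barrier for the
 * slowest one, so the set takes the maximum of the per-worker times. */
double work_set_time(const double *worker_time, size_t nworkers)
{
	double slowest = 0.0;
	size_t i;

	for (i = 0; i < nworkers; i++)
		if (worker_time[i] > slowest)
			slowest = worker_time[i];
	return slowest;
}

/* Total run time over nsets barrier-separated work sets, assuming
 * each worker takes the same time in every set. */
double run_time(const double *worker_time, size_t nworkers, size_t nsets)
{
	return (double)nsets * work_set_time(worker_time, nworkers);
}
```

With four workers at 1.0 time units each, 100 work sets take 100.0
units.  Slow just one worker to 1.5 and the run takes 150.0 units: the
whole application is 50% slower even though three of the four threads
are unaffected.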

Because of all this, memory placement needs to be thought of as
relative to the worker threads and kept relatively consistent
before and after the migration.
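
One way to picture "relatively consistent" is an old-node to new-node
table, in the spirit of the old_nodes/new_nodes arrays discussed
earlier in the thread.  The following is only an illustrative user-space
sketch, not the kernel implementation from the patch set; map_node and
remap_pages are hypothetical names, and page_node[] simply records which
node each page sits on.

```c
#include <stddef.h>

/* Destination node for a page currently on 'node'.  Pages on nodes
 * that are not listed in old_nodes[] stay where they are. */
int map_node(int node, size_t count,
	     const int *old_nodes, const int *new_nodes)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (old_nodes[i] == node)
			return new_nodes[i];
	return node;
}

/* Rewrite page_node[] in a single pass.  Each page's destination is
 * computed from the node it was on before the migration, so pages
 * that started on different nodes end up on different nodes. */
void remap_pages(int *page_node, size_t npages, size_t count,
		 const int *old_nodes, const int *new_nodes)
{
	size_t i;

	for (i = 0; i < npages; i++)
		page_node[i] = map_node(page_node[i], count,
					old_nodes, new_nodes);
}
```

With old_nodes = {0, 1, 2} and new_nodes = {1, 2, 3}, pages on nodes
{0, 0, 1, 2, 5} end up on {1, 1, 2, 3, 5}: memory that belonged to
neighboring workers is still on neighboring nodes.  Applying the same
three moves one old/new pair at a time, in that order, would instead
sweep the pages from nodes 0, 1 and 2 together onto node 3, which is
the kind of trouble overlapping node lists can cause.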

Another issue with making it a lazy migrate is that the real impetus
for this is to free up memory on a node, so a job can be stopped on
one node and migrated to a different node, thereby freeing up the
original node for a second job.  With the original job still taking
up a section of the machine, that second job would not fit and would
perform too poorly.

Sorry for the long rambling explanation.  I guess I will try to
break this into smaller chunks on the upcoming discussion on the
linux-mm list.

Thanks,
Robin

