* manual page migration, revisited...
From: Ray Bryant @ 2004-11-05 22:50 UTC
To: Marcelo Tosatti, Hirokazu Takahashi; +Cc: linux-mm
Marcelo and Takahashi-san (and anyone else who would like to comment),
This is a little off topic, but this is as good a thread as any on which to start
this discussion. Feel free to peel this off as a separate discussion thread
asap if you like.
We have a requirement (for a potential customer) to do the following kind of
thing:
(1) Suspend and swap out a running process so that the node where the process
is running can be reassigned to a higher priority job.
(2) Resume and swap back in those suspended jobs, restoring the original
memory layout on the original nodes, or
(3) Resume and swap back in those suspended jobs on a new set of nodes, with
as similar a topological layout as possible. (It's also possible we may
want to just move the jobs directly from one set of nodes to another
without swapping them out first.)
This is all in the context of a batch scheduler being used to run jobs on
a large parallel machine.
As I understand it, there are various patches floating around (including the
migration code that you are working on, the memory hotplug removal code, etc)
that do parts of this, but I've had a little trouble piecing together the
status of those various patches and where to get them (e.g., where do I get
the latest migration cache code?).
There was also a thread on this list in early April 2004 about manual page
migration, I think, but I don't know where that went, if anywhere (that would
satisfy requirement 3).
So the question I am asking, I guess, is: where would you suggest we start on
an implementation of something like the above? Which existing bits and
pieces, if any, can I pick up from your migration cache work and/or the
memory hotplug work? Or which patches should I be looking at
for ideas?
I'm not asking you to >>do<< this work, of course; I'm just trying to get
started on the above without unnecessarily duplicating anyone's previous work in
this area. Any pointers or advice would be greatly appreciated.
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@sgi.com raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
* Re: manual page migration, revisited...
From: Nigel Cunningham @ 2004-11-05 23:02 UTC
To: Ray Bryant; +Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
Hi.
On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
> Marcelo and Takahashi-san (and anyone else who would like to comment),
>
> This is a little off topic, but this is as good a thread as any on which to start
> this discussion. Feel free to peel this off as a separate discussion thread
> asap if you like.
>
> We have a requirement (for a potential customer) to do the following kind of
> thing:
>
> (1) Suspend and swap out a running process so that the node where the process
> is running can be reassigned to a higher priority job.
>
> (2) Resume and swap back in those suspended jobs, restoring the original
> memory layout on the original nodes, or
>
> (3) Resume and swap back in those suspended jobs on a new set of nodes, with
> as similar a topological layout as possible. (It's also possible we may
> want to just move the jobs directly from one set of nodes to another
> without swapping them out first.)
You may not even need any kernel patches to accomplish this. Bernard
Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
haven't tried it myself, but it sounds like it might be at least part of
what you're after.
Regards,
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6
* Re: manual page migration, revisited...
From: Marcelo Tosatti @ 2004-11-06 17:48 UTC
To: Nigel Cunningham; +Cc: Ray Bryant, Hirokazu Takahashi, Linux Memory Management
On Sat, Nov 06, 2004 at 10:02:22AM +1100, Nigel Cunningham wrote:
> Hi.
>
> You may not even need any kernel patches to accomplish this. Bernard
> Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> haven't tried it myself, but it sounds like it might be at least part of
> what you're after.
Hi Ray, Nigel,
And isn't that what the swsusp code itself does? Stopping all processes,
saving their memory to disk, and resuming later on.
Wouldn't you just need an API to stop a specific process?
* Re: manual page migration, revisited...
From: Nigel Cunningham @ 2004-11-07 2:58 UTC
To: Marcelo Tosatti; +Cc: Ray Bryant, Hirokazu Takahashi, Linux Memory Management
Hi.
On Sun, 2004-11-07 at 04:48, Marcelo Tosatti wrote:
> On Sat, Nov 06, 2004 at 10:02:22AM +1100, Nigel Cunningham wrote:
> > You may not even need any kernel patches to accomplish this. Bernard
> > Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> > haven't tried it myself, but it sounds like it might be at least part of
> > what you're after.
>
> Hi Ray, Nigel,
>
> And isn't that what the swsusp code itself does? Stopping all processes,
> saving their memory to disk, and resuming later on.
Software suspend handles the whole machine; my understanding, perhaps
wrong, was that Ray only wants to move particular processes.
> Wouldn't you just need an API to stop a specific process?
(And save its state.)
Regards,
Nigel
* Re: manual page migration, revisited...
From: Ray Bryant @ 2004-11-07 4:57 UTC
To: ncunningham; +Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
>
>>Marcelo and Takahashi-san (and anyone else who would like to comment),
>>
>>This is a little off topic, but this is as good a thread as any on which to start
>>this discussion. Feel free to peel this off as a separate discussion thread
>>asap if you like.
>>
>>We have a requirement (for a potential customer) to do the following kind of
>>thing:
>>
>>(1) Suspend and swap out a running process so that the node where the process
>> is running can be reassigned to a higher priority job.
>>
>>(2) Resume and swap back in those suspended jobs, restoring the original
>> memory layout on the original nodes, or
>>
>>(3) Resume and swap back in those suspended jobs on a new set of nodes, with
>> as similar a topological layout as possible. (It's also possible we may
>> want to just move the jobs directly from one set of nodes to another
>> without swapping them out first.)
>
>
> You may not even need any kernel patches to accomplish this. Bernard
> Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> haven't tried it myself, but it sounds like it might be at least part of
> what you're after.
>
> Regards,
>
> Nigel
Nigel,
I think that having the resumed processes show up with a different PID than
they had before is a show-stopper. In a multiprocess parallel program, we have
no idea whether the program itself has saved away PIDs and is using them to
send signals or whatnot. So I don't think there is a user-space-only solution
that will solve this problem for us, but it is an interesting alternative to
the kernel-only solutions I've been contemplating. There is probably some
intermediate ground there which holds the real solution.
--
Best Regards,
Ray
* Re: manual page migration, revisited...
From: Ray Bryant @ 2004-11-07 5:08 UTC
To: Marcelo Tosatti
Cc: Nigel Cunningham, Hirokazu Takahashi, Linux Memory Management
Marcelo Tosatti wrote:
>>You may not even need any kernel patches to accomplish this. Bernard
>>Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
>>haven't tried it myself, but it sounds like it might be at least part of
>>what you're after.
>
>
> Hi Ray, Nigel,
>
> And isn't that what the swsusp code itself does? Stopping all processes,
> saving their memory to disk, and resuming later on.
>
> Wouldn't you just need an API to stop a specific process?
>
I think that sending the process a SIGSTOP is probably good enough to stop
it for our purposes. But beyond that, the reason we stopped the
process is so we can start up another process on that node. Now, we can
wait for memory pressure to grow to the point that kswapd will force out
the stopped process's pages, but why should the VM have to go to the
effort of figuring that out? Why not tell the VM somehow that we don't
want these pages in memory, and ask it to swap them out to make space for
the new program that is running?
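A minimal user-space sketch of the SIGSTOP step, assuming the job is a child of the controlling process: waitpid() with WUNTRACED only reports state changes of one's own children, so a batch scheduler stopping unrelated processes would need another way to confirm the stop (for example, polling /proc/<pid>/stat). The helpers spawn_idle_job(), stop_and_confirm() and resume_and_confirm() are hypothetical names. As the paragraph above points out, stopping a job does nothing by itself to push its pages out of memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a compute job: a child that idles until it is
 * killed. Returns the child's PID, or -1 if fork() failed. */
static pid_t spawn_idle_job(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();   /* sleep until a signal arrives; loop forever */
    }
    return pid;
}

/* Hypothetical helper: stop a child job with SIGSTOP and wait until the
 * kernel reports it as stopped. Returns 0 on success, -1 otherwise. */
static int stop_and_confirm(pid_t job)
{
    int status;

    if (kill(job, SIGSTOP) != 0)
        return -1;
    /* WUNTRACED: return when the child stops, not only when it exits. */
    if (waitpid(job, &status, WUNTRACED) != job)
        return -1;
    return (WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP) ? 0 : -1;
}

/* Hypothetical helper: resume a stopped child with SIGCONT and wait for
 * the "continued" notification. Returns 0 on success, -1 otherwise. */
static int resume_and_confirm(pid_t job)
{
    int status;

    if (kill(job, SIGCONT) != 0)
        return -1;
    /* WCONTINUED: return when a stopped child has been resumed. */
    if (waitpid(job, &status, WCONTINUED) != job)
        return -1;
    return WIFCONTINUED(status) ? 0 : -1;
}
```

A caller would spawn or identify the job, call stop_and_confirm() before handing the node to the higher-priority work, and resume_and_confirm() once the node is given back.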
Of course, one can argue that we don't know for sure that the new program
will use enough space to force the other process out, but we worry that in
that case the new program could still end up with non-local memory allocation,
and that is anathema to the HPC world, where we require the good performance
that local memory allocation provides. We want the new process that is
run on the node to perform as well as it would have if it had
started on an idle node.
--
Best Regards,
Ray
* Re: manual page migration, revisited...
From: Hirokazu Takahashi @ 2004-11-07 11:19 UTC
To: raybry; +Cc: marcelo.tosatti, ncunningham, linux-mm
Hi, Ray,
> Marcelo Tosatti wrote:
>
> >>You may not even need any kernel patches to accomplish this. Bernard
> >>Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> >>haven't tried it myself, but it sounds like it might be at least part of
> >>what you're after.
> >
> >
> > Hi Ray, Nigel,
> >
> > And isn't that what the swsusp code itself does? Stopping all processes,
> > saving their memory to disk, and resuming later on.
Looks interesting.
> > Wouldn't you just need an API to stop a specific process?
> >
>
> I think that sending the process a SIGSTOP is probably good enough to stop
> it for our purposes. But beyond that, the reason we stopped the
> process is so we can start up another process on that node. Now, we can
> wait for memory pressure to grow to the point that kswapd will force out
> the stopped process's pages, but why should the VM have to go to the
> effort of figuring that out? Why not tell the VM somehow that we don't
> want these pages in memory, and ask it to swap them out to make space for
> the new program that is running?
I agree that stopping the target processes is enough.
I think you want to introduce a whole-process swapout mechanism,
which Linux hasn't implemented.
I don't think it would be difficult to implement. The following steps
may work:
1. Stop the target processes with SIGSTOP.
2. Select the pages used by those processes.
3. Pass them to shrink_list() with the proper parameters.
shrink_list() may have to be called several times, both to handle
active pages and to wait for the completion of the writeback I/O
that the previous shrink_list() call started.
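shrink_list() is part of the kernel's page-reclaim code and cannot be called from user space, so steps 2 and 3 cannot be shown directly. As a rough present-day analog only, the sketch below assumes Linux 5.4 or later, which added MADV_PAGEOUT: madvise() with that flag asks the kernel to reclaim the pages of a range, and process_madvise() (Linux 5.10) can apply the same advice to another process. The sketch works on a region of the calling process and counts resident pages with mincore(); map_and_touch(), resident_pages() and page_out_region() are hypothetical helpers. Without swap space, anonymous pages have nowhere to go, so the resident count may not drop.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21   /* from <linux/mman.h>; needs Linux 5.4 or later */
#endif

/* Hypothetical helper: map npages anonymous pages and write to every one
 * so they are all faulted in. Returns MAP_FAILED on error. */
static void *map_and_touch(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf != MAP_FAILED)
        memset(buf, 0xab, len);
    return buf;
}

/* Hypothetical helper: count how many of the npages pages starting at the
 * page-aligned address addr are resident in RAM. Returns -1 on error. */
static long resident_pages(void *addr, size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, npages * page, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;   /* low bit set: page is resident */
    free(vec);
    return count;
}

/* Hypothetical helper: ask the kernel to reclaim (write out) the pages of
 * a region, in the spirit of steps 2-3. Returns 0 on success or -1 with
 * errno set; EINVAL usually means the kernel predates MADV_PAGEOUT. */
static int page_out_region(void *addr, size_t len)
{
    return madvise(addr, len, MADV_PAGEOUT);
}
```

Unlike the in-kernel approach, this is only advice: the kernel decides which pages it can actually write out.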
If you just want to migrate the pages to another node,
the migration code may help you. This is called process migration,
which the NUMA folks may also be interested in.
1. Select the target node the processes are going to move to,
and move them to that node's runqueue.
2. Select the pages used by the processes.
3. Start memory migration on those pages.
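A user-space sketch of these migration steps, assuming interfaces that the kernels of the time did not have: step 1 uses sched_setaffinity() to restrict a process to one CPU (which puts it on that CPU's runqueue), and steps 2-3 use the migrate_pages(2) system call, which first appeared in Linux 2.6.16 and moves a process's pages from one set of nodes to another. glibc has no wrapper for migrate_pages, so the sketch goes through syscall(). The helper names are hypothetical, and a real tool would pick the CPU from the target node instead of passing them separately.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for step 1: allow pid (0 = caller) to run only on
 * cpu, so the scheduler moves it to that CPU's runqueue.
 * Returns 0 on success, -1 with errno set. */
static int move_to_cpu(pid_t pid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(pid, sizeof(set), &set);
}

/* Hypothetical helper: first CPU the caller may currently run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper for steps 2-3: move the pages of pid (0 = caller)
 * that sit on node `from` to node `to`; both must be small node numbers
 * that fit in a single unsigned long mask. Returns the number of pages
 * that could not be moved, or -1 with errno set (ENOSYS on kernels built
 * without NUMA support). */
static long move_pages_between_nodes(pid_t pid, int from, int to)
{
#ifdef SYS_migrate_pages
    unsigned long old_nodes = 1UL << from;
    unsigned long new_nodes = 1UL << to;
    unsigned long maxnode = 8 * sizeof(unsigned long);

    return syscall(SYS_migrate_pages, pid, maxnode, &old_nodes, &new_nodes);
#else
    (void)pid; (void)from; (void)to;
    errno = ENOSYS;
    return -1;
#endif
}
```

On a single-node machine, moving from node 0 to node 0 is a no-op, which makes the calls safe to try; the interesting case is a real NUMA system with distinct source and target nodes.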
> Of course, one can argue that we don't know for sure that the new program
> will use enough space to force the other process out, but we worry that in
> that case the new program could still end up with non-local memory allocation,
> and that is anathema to the HPC world, where we require the good performance
> that local memory allocation provides. We want the new process that is
> run on the node to perform as well as it would have if it had
> started on an idle node.
Thanks,
Hirokazu Takahashi.
* Re: manual page migration, revisited...
From: Nigel Cunningham @ 2004-11-07 21:11 UTC
To: Ray Bryant; +Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
Hi.
On Sun, 2004-11-07 at 15:57, Ray Bryant wrote:
> I think that having the resumed processes show up with a different PID than
> they had before is a show-stopper. In a multiprocess parallel program, we have
> no idea whether the program itself has saved away PIDs and is using them to
> send signals or whatnot. So I don't think there is a user-space-only solution
> that will solve this problem for us, but it is an interesting alternative to
> the kernel-only solutions I've been contemplating. There is probably some
> intermediate ground there which holds the real solution.
I agree; it should be pretty trivial to write a patch that checks that a
given PID is not in use, allocates it, and makes the resumed program known
by that PID. I won't offer to do it, though; I've got enough work at the
moment :>
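The "is this PID in use?" half of that idea has a user-space counterpart, sketched here with a hypothetical helper pid_in_use(): kill(pid, 0) sends no signal but fails with ESRCH when no process holds the PID. The answer is only advisory, because another process can claim the PID between the check and its use; closing that race is why allocating the PID needs kernel support, as suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report whether some process currently holds pid.
 * kill() with signal 0 performs the existence and permission checks
 * without delivering anything. */
static bool pid_in_use(pid_t pid)
{
    if (pid <= 0)
        return false;   /* 0 and negative values address groups, not one PID */
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;   /* the process exists, but we may not signal it */
}
```

For example, pid_in_use(getpid()) is always true. Linux later gained kernel support for the allocation half: clone3() accepts a set_tid array (Linux 5.5) that checkpoint/restore tools use to recreate a process with a chosen PID.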
Nigel