From: Oren Laadan <orenl@cs.columbia.edu>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
mingo@elte.hu, linux-api@vger.kernel.org,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
hpa@zytor.com, tglx@linutronix.de
Subject: Re: [RFC v13][PATCH 00/14] Kernel based checkpoint/restart
Date: Thu, 12 Mar 2009 22:45:52 -0400
Message-ID: <49B9C8E0.5080500@cs.columbia.edu>
In-Reply-To: <20090213152836.0fbbfa7d.akpm@linux-foundation.org>
Hi,
Just got back from 3 weeks with practically no internet, and I see
that I missed a big party!
Trying to catch up with what's been said so far --
"An app really has to know whether it can reliably checkpoint+restart."
One suggestion (Dave) was to have an "uncheckpointable" flag at the container,
process, or resource level. Another suggestion (Serge, Alexey) was to let
the app try to checkpoint and get an error back if it cannot be done.
For what it's worth, I vote for the latter. Have the checkpoint code always
return an error if the checkpoint cannot be taken. If checkpoint succeeds
then the app/user is guaranteed that restart will succeed (if it is given
the right starting conditions, e.g. correct file system view).
To figure out what went wrong and when, the c/r code can indicate the _reason_
for the failure (e.g. output to the console, or other means) so that the
frustrated user/developer/app can report it. I also think it's cleaner, as
it keeps c/r considerations within the c/r subsystem rather than scattered
around different locations in the kernel.
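To illustrate the "try it and return an error with a reason" approach, here
is a minimal userspace sketch in C. It is not code from the patch set: the
resource kinds, the helper res_supported(), the function try_checkpoint(),
and the choice of -EOPNOTSUPP are all hypothetical names and assumptions
made for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical resource kinds; not taken from the patch set. */
enum res_kind { RES_REGULAR_FILE, RES_PIPE, RES_SOCKET, RES_EPOLL };

struct resource {
	enum res_kind kind;
	const char *name;	/* label used only in the failure message */
};

/* Assumption: this sketch only knows how to save regular files and pipes. */
static int res_supported(const struct resource *r)
{
	return r->kind == RES_REGULAR_FILE || r->kind == RES_PIPE;
}

/*
 * Walk all resources. On the first one that cannot be saved, store a
 * human-readable reason in 'reason' and return a negative errno value.
 * Return 0 only if every resource is supported, so that success implies
 * the image is complete.
 */
static int try_checkpoint(const struct resource *res, size_t n,
			  char *reason, size_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!res_supported(&res[i])) {
			snprintf(reason, len,
				 "cannot checkpoint '%s': unsupported resource type",
				 res[i].name);
			return -EOPNOTSUPP;
		}
	}
	if (len)
		reason[0] = '\0';
	return 0;
}
```

The point of the sketch is that the decision and the explanation both live
in one place (the c/r code), and the caller only sees success or an error
plus a reason it can report.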
Andrew Morton wrote:
> On Thu, 12 Feb 2009 10:11:22 -0800
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
>
>> ...
>>
>>> - In bullet-point form, what features are missing, and should be added?
>> * support for more architectures than i386
>> * file descriptors:
>> * sockets (network, AF_UNIX, etc...)
>> * devices files
>> * shmfs, hugetlbfs
>> * epoll
>> * unlinked files
>> * Filesystem state
>> * contents of files
>> * mount tree for individual processes
>> * flock
>> * threads and sessions
>> * CPU and NUMA affinity
>> * sys_remap_file_pages()
>>
>> This is a very minimal list that is surely incomplete and sure to grow.
>
> That's a worry.
>
>>> For extra marks:
>>>
>>> - Will any of this involve non-trivial serialisation of kernel
>>> objects? If so, that's getting into the
>>> unacceptably-expensive-to-maintain space, I suspect.
>> We have some structures that are certainly tied to the kernel-internal
>> ones. However, we are certainly *not* simply writing kernel structures
>> to userspace. We could do that with /dev/mem. We are carefully pulling
>> out the minimal bits of information from the kernel structures that we
>> *need* to recreate the function of the structure at restart. There is a
>> maintenance burden here but, so far, that burden is almost entirely in
>> checkpoint/*.c. We intend to test this functionality thoroughly to
>> ensure that we don't regress once we have integrated it.
>
> I guess my question can be approximately simplified to: "will it end up
> looking like openvz"? (I don't believe that we know of any other way
> of implementing this?)
>
> Because if it does then that's a concern, because my assessment when I
> looked at that code (a number of years ago) was that having code of
> that nature in mainline would be pretty costly to us, and rather
> unwelcome.
I originally implemented c/r for Linux as a kernel module, without
requiring any changes to the kernel. (Doing the namespaces as a kernel
module was much harder.) For more details, see:
https://www.ncl.cs.columbia.edu/research/migrate
The current set of patches is the beginning of a re-implementation
based on that work and other lessons learned, as well as feedback and
collaboration with other players.
I am confident that the vast majority of the code will end up as a
separate "subsystem", and that relatively few changes will be required
to the existing kernel.
> The broadest form of the question is "will we end up regretting having
> done this".
I bet that once this works for a critical mass of apps/users, we will
never regret having done this. (We may regret, and fix, having done
specific parts one way or another.)
>
> If we can arrange for the implementation to sit quietly over in a
> corner with a team of people maintaining it and not screwing up other
> people's work then I guess we'd be OK - if it breaks then the breakage
> is localised.
In my experience, very little of the c/r code affects other parts of
the kernel; it's mostly isolated. So I believe this will be the case.
>
> And it's not just a matter of "does the diffstat only affect a single
> subdirectory". We also should watch out for the imposition of new
> rules which kernel code must follow. "you can't do that, because we
> can't serialise it", or something.
>
> Similar to the way in which perfectly correct and normal kernel
> sometimes has to be changed because it unexpectedly upsets the -rt
> patch.
>
> Do you expect that any restrictions of this type will be imposed?
>
That's an excellent point. Again, judging from past experience,
it is possible (but not always pretty) to implement c/r as a kernel
module, without requiring _any_ kernel changes. I can't think of
any such restrictions, but we'll certainly have to keep our eyes
open.
Oren