From: Mike Waychison <mikew@google.com>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
jeremy@goop.org, arnd@arndb.de, linux-api@vger.kernel.org,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Linus Torvalds <torvalds@osdl.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC v11][PATCH 03/13] General infrastructure for checkpoint restart
Date: Tue, 16 Dec 2008 14:43:32 -0800
Message-ID: <49482F14.1040407@google.com>
In-Reply-To: <1229465641.17206.350.camel@nimitz>

Dave Hansen wrote:
> On Tue, 2008-12-16 at 13:54 -0800, Mike Waychison wrote:
>> Oren Laadan wrote:
>>> diff --git a/checkpoint/sys.c b/checkpoint/sys.c
>>> index 375129c..bd14ef9 100644
>>> --- a/checkpoint/sys.c
>>> +++ b/checkpoint/sys.c
>>> +/*
>>> + * During checkpoint and restart the code writes out/reads in data
>>> + * to/from the checkpoint image from/to a temporary buffer (ctx->hbuf).
>>> + * Because operations can be nested, use cr_hbuf_get() to reserve space
>>> + * in the buffer, then cr_hbuf_put() when you no longer need that space.
>>> + */
>> This seems a bit overkill for buffer management, no? The only large
>> header seems to be cr_hdr_head, and the blowup comes from utsinfo string
>> data (which could easily be moved out to be in its own CR_HDR_STRING
>> blocks).
>>
>> Wouldn't it be easier to use stack-local storage than balancing the
>> cr_hbuf_get/put routines?
>
> I've asked the same question, so I'll give you Oren's response that I
> remember:
>
> cr_hbuf_get/put() are more of an API that we can use later. For now,
> those buffers really are temporary. But, in a case where we want to do
> a really fast checkpoint (to reduce "downtime" during the checkpoint) we
> store the image entirely in kernel memory to be written out later.
>
Hmm, if I'm understanding you correctly, adding ref counts explicitly
(like you suggest below) would be used to let a lower layer defer
writes. Seems like this could be done just as easily with explicit
kmallocs and transferring ownership of the allocated memory to the
in-kernel representation handling layer below (which in turn queues the
data structures for writes).

Any such layer would probably need to hold references to objects
enqueued for write-out, so they will still need a full cleanup path in
case of success/error/abort (which means that any advantage of creating
a pool of allocations for O(1) cleanup disappears).

Reference counting these guys doesn't have a clear advantage to me.
They seem to have a pretty linear lifetime.

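
To make the ownership-transfer idea concrete, here is a rough userspace
sketch. It is only an illustration under stated assumptions: malloc/free
stand in for kmalloc/kfree, a FILE * stands in for the checkpoint image,
and cr_rec, cr_queue and the helper names are made up for this example
(none of them exist in the patch set). Each record has exactly one owner
at a time, and one drain routine covers success, error and abort.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one chunk of checkpoint data awaiting write-out. */
struct cr_rec {
	struct cr_rec *next;
	size_t size;
	char data[];		/* flexible array member holds the payload */
};

/* Hypothetical deferred-write queue owned by the lower layer. */
struct cr_queue {
	struct cr_rec *head;
	struct cr_rec **tail;	/* points at the last 'next' slot */
	size_t nr;
};

static void cr_queue_init(struct cr_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->nr = 0;
}

/* Explicit allocation; the caller owns the record until it is enqueued. */
static struct cr_rec *cr_rec_alloc(const void *src, size_t size)
{
	struct cr_rec *rec = malloc(sizeof(*rec) + size);

	if (!rec)
		return NULL;
	rec->next = NULL;
	rec->size = size;
	memcpy(rec->data, src, size);
	return rec;
}

/* Ownership moves to the queue; the caller must not touch rec afterwards. */
static void cr_queue_add(struct cr_queue *q, struct cr_rec *rec)
{
	*q->tail = rec;
	q->tail = &rec->next;
	q->nr++;
}

/*
 * Single cleanup path: walk the queue, optionally write each record,
 * then free it. Pass out == NULL for the error/abort case.
 * Returns the number of bytes written.
 */
static size_t cr_queue_drain(struct cr_queue *q, FILE *out)
{
	size_t written = 0;
	struct cr_rec *rec = q->head;

	while (rec) {
		struct cr_rec *next = rec->next;

		if (out)
			written += fwrite(rec->data, 1, rec->size, out);
		free(rec);
		rec = next;
	}
	cr_queue_init(q);
	return written;
}
```

The cleanup is still a walk over every queued object, which is the
point above: once a lower layer holds the data, the O(1) pool cleanup
is gone whether or not the objects carry a reference count.
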
> In that case, cr_hbuf_put() stops doing anything at all because we keep
> the memory around.
>
> cr_hbuf_get() becomes, "I need some memory to write some checkpointy
> things into".
>
> cr_hbuf_put() becomes, "I'm done with this for now, only keep it if
> someone else needs it."
>
> This might all be a lot clearer if we just kept some more explicit
> accounting around about who is using the objects. Something like:
>
> struct cr_buf {
> 	struct kref ref;
> 	int size;
> 	char buf[0];
> };
>
> /* replaces cr_hbuf_get() */
> struct cr_buf *alloc_cr_buf(int size, gfp_t flags)
> {
> 	struct cr_buf *buf;
>
> 	buf = kmalloc(sizeof(*buf) + size, flags);
> 	if (!buf)
> 		return NULL;
> 	kref_init(&buf->ref);	/* starts the count at 1 */
> 	buf->size = size;
> 	return buf;
> }
>
> int cr_kwrite(struct cr_buf *buf)
> {
> 	if (writing_checkpoint_now) {
> 		// or whatever this write call was...
> 		vfs_write(&buf->buf[0], buf->size);
> 	} else if (deferring_write) {
> 		/* hold a reference until the deferred write-out happens */
> 		kref_get(&buf->ref);
> 	}
> 	return 0;
> }
>
> -- Dave
>