Re: [RFC v12][PATCH 14/14] Restart multiple processes

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Oren Laadan <orenl@cs.columbia.edu>
To: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@osdl.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Mike Waychison <mikew@google.com>
Subject: Re: [RFC v12][PATCH 14/14] Restart multiple processes
Date: Tue, 06 Jan 2009 22:20:59 -0500	[thread overview]
Message-ID: <49641F9B.9020803@cs.columbia.edu> (raw)
In-Reply-To: <20090104201957.GA12725@us.ibm.com>



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl@cs.columbia.edu):
>> Restarting of multiple processes expects all restarting tasks to call
>> sys_restart(). Once inside the system call, each task will restart
>> itself at the same order that they were saved. The internals of the
>> syscall will take care of in-kernel synchronization bewteen tasks.

[...]

> Thanks, Oren.
> 
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> 
> with a few comments below.

Thanks for the comments.

[...]

>> +/* FIXME: this should be per container */
>> +DECLARE_WAIT_QUEUE_HEAD(cr_restart_waitq);
>> +
>> +static int do_restart_task(struct cr_ctx *ctx, pid_t pid)
> 
> Passing ctx in here, when it is always NULL, is just a bit
> confusing, and, since you goto out and cr_ctx_put(ctx) without
> initializing ctx, you make verifying that that's ok a step
> harder.
> 
> Do you intend to ever pass in non-NULL in later patches?

Yes; for instance I expect Andrey's create-tasks-in-kernel patch
(when ready) to eventually have each thread call do_restart_task().
I also liked the symmetry with do_restart_root()...

You are correct that it's confusing now, so I'll take it out.

> 
>> +{
>> +	struct task_struct *root_task;
>> +	int ret;
>> +
>> +	rcu_read_lock();
>> +	root_task = find_task_by_pid_ns(pid, current->nsproxy->pid_ns);
>> +	if (root_task)
>> +		get_task_struct(root_task);
>> +	rcu_read_unlock();
>> +
>> +	if (!root_task)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * wait for container init to initialize the restart context, then
>> +	 * grab a reference to that context, and if we're the last task to
>> +	 * do it, notify the container init.
>> +	 */
>> +	ret = wait_event_interruptible(cr_restart_waitq,
>> +				       root_task->checkpoint_ctx);
> 
> Would seem more sensible to do the above using the ctx which
> you grabbed under task_lock(root_task) next.  Your whole
> locking of root_task->checkpoint_ctx seems haphazard (see below).

At this point 'root_task->checkpoint_ctx' may still be NULL if the
container init did not yet initialize it. That's why we first wait
for it to become non-NULL.

The code solves a coordination problem between the container init and
the other tasks, given that:
a) the order in which they call sys_restart() is unknown
b) the container init creates a 'ctx' that is to be used by everyone

The container init simply (1) allocates a shared 'ctx' and makes it
visible via 'root_task->chcekpoint_ctx', then waits for all tasks to
grab it (thus, be ready to restart), then initiates the restart chain,
and finally waits for all tasks to finish.

The other tasks, because of (a) and (b), may enter the kernel before
the container init completes its step (1). So they must first wait
for that 'ctx' to be available

> 
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	task_lock(root_task);
>> +	ctx = root_task->checkpoint_ctx;
>> +	if (ctx)
>> +		cr_ctx_get(ctx);
>> +	task_unlock(root_task);

So the other tasks work in two steps:
1) wait for the container init ->checkpoint_ctx (we don't know when !)
2) grab a reference to that 'ctx'

Step (2) ensures that our reference to 'ctx' is valid even if the
container init exits (due to error, signal etc). The last user of
the 'ctx' will be the one to actually free it.

I use the lock to protect against time-of-check to time-of-use race
(e.g.the container init may exit due to an error or a signal); the
container init also grabs the lock before clearing its ->checkpoint_ctx.

>> +
>> +	if (!ctx) {
>> +		ret = -EAGAIN;
>> +		goto out;
>> +	}
>> +
>> +	if (atomic_dec_and_test(&ctx->tasks_count))
>> +		complete(&ctx->complete);
>> +
>> +	/* wait for our turn, do the restore, and tell next task in line */
>> +	ret = cr_wait_task(ctx);
>> +	if (ret < 0)
>> +		goto out;
>> +	ret = cr_read_task(ctx);
>> +	if (ret < 0)
>> +		goto out;
>> +	ret = cr_next_task(ctx);
>> +
>> + out:
>> +	cr_ctx_put(ctx);
>> +	put_task_struct(root_task);
>> +	return ret;
>> +}
>> +
>> +static int cr_wait_all_tasks_start(struct cr_ctx *ctx)
>> +{
>> +	int ret;
>> +
>> +	if (ctx->pids_nr == 1)
>> +		return 0;
>> +
>> +	init_completion(&ctx->complete);
>> +	current->checkpoint_ctx = ctx;
>> +
>> +	wake_up_all(&cr_restart_waitq);
>> +
>> +	ret = wait_for_completion_interruptible(&ctx->complete);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	task_lock(current);
>> +	current->checkpoint_ctx = NULL;
>> +	task_unlock(current);
> 
> Who can you be racing with here?  All other tasks should have
> already dereferenced current->checkpoint_ctx.
> 
> If you want to always lock root_task around setting and
> reading of (root_task|current)->checkpoint_ctx, that's
> ok, but I think you can sufficiently argue that your
> completion is completely protecint readers from the sole
> writer.

Only lock around clearing and reading; setting is safe.

The completion is interruptible so that you can signal or kill
the container init if something goes wrong, e.g. bad checkpoint
image causes it to wait for non-existing processes forever.

At a second glance, the clearing of '->checkpoint_ctx' must
happen regardless of whether an error occurred, so the test for
'ret < 0' should follow (and not precede) it.

[...]

Thanks,

Oren.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-01-07  3:21 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-29  9:16 [RFC v12][PATCH 00/14] Kernel based checkpoint/restart Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 01/14] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-01-14 18:04   ` Balbir Singh
2009-01-14 22:31     ` Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 02/14] Checkpoint/restart: initial documentation Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 03/14] Make file_pos_read/write() public Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 04/14] General infrastructure for checkpoint restart Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 05/14] x86 support for checkpoint/restart Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 06/14] Dump memory address space Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 07/14] Restore " Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 08/14] Infrastructure for shared objects Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 09/14] Dump open file descriptors Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 10/14] Restore open file descriprtors Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 11/14] External checkpoint of a task other than ourself Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 12/14] Track in-kernel when we expect checkpoint/restart to work Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 13/14] Checkpoint multiple processes Oren Laadan
2009-01-12 23:14   ` Nathan Lynch
2009-01-14 17:27     ` Oren Laadan
2008-12-29  9:16 ` [RFC v12][PATCH 14/14] Restart " Oren Laadan
2009-01-04 20:19   ` Serge E. Hallyn
2009-01-07  3:20     ` Oren Laadan [this message]
2009-01-06 20:05 ` [RFC v12][PATCH 00/14] Kernel based checkpoint/restart Dave Hansen
2009-01-06 20:23   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49641F9B.9020803@cs.columbia.edu \
    --to=orenl@cs.columbia.edu \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mikew@google.com \
    --cc=mingo@elte.hu \
    --cc=serue@us.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@osdl.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox