From: Oren Laadan <orenl@cs.columbia.edu>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@osdl.org>,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-api@vger.kernel.org, Serge Hallyn <serue@us.ibm.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Pavel Emelyanov <xemul@openvz.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
Dan Smith <danms@us.ibm.com>, Oren Laadan <orenl@cs.columbia.edu>
Subject: [RFC v16][PATCH 29/43] c/r: support for UTS namespace
Date: Wed, 27 May 2009 13:32:55 -0400
Message-ID: <1243445589-32388-30-git-send-email-orenl@cs.columbia.edu>
In-Reply-To: <1243445589-32388-1-git-send-email-orenl@cs.columbia.edu>
From: Dan Smith <danms@us.ibm.com>
This patch adds a checkpoint "phase" that saves information about the
namespaces of the checkpointed task(s). It tracks each task's namespace
objects so that a namespace shared by several tasks is written out once,
and later tasks that use the same namespace are recorded as references to
it in the checkpoint stream.
Changes:
- Take uts_sem around access to uts data
- Remove the kernel restore path
- Punt on nested namespaces
- Use __NEW_UTS_LEN in nodename and domainname buffers
- Add a note to Documentation/checkpoint/internals.txt to indicate where
in the save/restore process the UTS information is kept
- Store (and track) the objref of the namespace itself instead of the
nsproxy (based on comments from Dave on IRC)
- Remove explicit check for non-root nsproxy
- Store the nodename and domainname lengths and use ckpt_write_string()
to store the actual name strings
- Catch failure of ckpt_obj_add_ptr() in ckpt_write_namespaces()
- Remove "types" bitfield and use the "is this new" flag to determine
whether or not we should write out a new ns descriptor
- Replace kernel restore path
- Move the namespace information to be directly after the task
information record
- Update Documentation to reflect new location of namespace info
- Support checkpoint and restart of nested UTS namespaces
Signed-off-by: Dan Smith <danms@us.ibm.com>
Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
---
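Note: the "write once, reference afterwards" idea from the description can
be illustrated outside the kernel. The following is only a minimal userspace
sketch, not the objhash code touched by this patch: toy_objtable and
toy_lookup_add() are hypothetical names standing in for the role that
ckpt_obj_lookup_add() plays in do_checkpoint_ns(), and the fixed-size linear
table is an assumption made for brevity.

```c
/*
 * Sketch only: a toy object table, NOT the kernel's objhash.
 * toy_lookup_add() is a hypothetical stand-in for ckpt_obj_lookup_add().
 */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJS 16

struct toy_objtable {
	const void *ptr[TOY_MAX_OBJS];	/* objects seen so far */
	int count;			/* number of valid entries */
};

/*
 * Return a positive objref for ptr, or -1 if the table is full.
 * *first is set to 1 only when ptr has not been seen before; a caller
 * would write the full namespace record only in that case and emit
 * just the objref otherwise.
 */
static int toy_lookup_add(struct toy_objtable *t, const void *ptr, int *first)
{
	int i;

	for (i = 0; i < t->count; i++) {
		if (t->ptr[i] == ptr) {
			*first = 0;
			return i + 1;	/* objrefs start at 1 */
		}
	}
	if (t->count == TOY_MAX_OBJS)
		return -1;
	t->ptr[t->count++] = ptr;
	*first = 1;
	return t->count;
}
```

In this sketch, two tasks that share one namespace pointer get the same
objref, and only the first lookup reports the object as new.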
checkpoint/checkpoint.c | 2 -
checkpoint/objhash.c | 26 +++++++
checkpoint/process.c | 160 +++++++++++++++++++++++++++++++++++++++-
include/linux/checkpoint_hdr.h | 15 ++++
4 files changed, 200 insertions(+), 3 deletions(-)
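
Note: on restart, restore_uts_ns() and do_restore_uts_ns() in the
checkpoint/process.c hunk below reject inconsistent input before touching
the namespace. A minimal userspace sketch of those two checks follows,
under stated assumptions: TOY_NEWUTS is a hypothetical flag bit standing in
for CLONE_NEWUTS, TOY_NAME_MAX assumes a name buffer of __NEW_UTS_LEN + 1
bytes, and both helper functions are illustrative names, not kernel
functions.

```c
/* Sketch only: userspace restatement of the restore-side sanity checks. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NEWUTS	0x1	/* hypothetical stand-in for CLONE_NEWUTS */
#define TOY_NAME_MAX	65	/* assumed buffer size: __NEW_UTS_LEN + 1 */

/*
 * A record is consistent when exactly one of these holds:
 *  - the objref is already known (known_ns != NULL), or
 *  - the record is flagged as describing a new namespace.
 */
static int toy_check_uts_record(const void *known_ns, unsigned int flags)
{
	int seen = known_ns != NULL;
	int is_new = (flags & TOY_NEWUTS) != 0;

	return (seen != is_new) ? 0 : -EINVAL;
}

/* A stored name length must fit in the destination buffer. */
static int toy_check_name_len(unsigned int len)
{
	return len > TOY_NAME_MAX ? -EINVAL : 0;
}
```

The first helper expresses the same condition as the XOR test in
restore_uts_ns(); the second mirrors the bounds check on nodename_len and
domainname_len.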
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index e66f82b..904f19b 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -310,8 +310,6 @@ static int may_checkpoint_task(struct ckpt_ctx *ctx, struct task_struct *t)
rcu_read_lock();
nsproxy = task_nsproxy(t);
- if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns)
- ret = -EPERM;
if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns)
ret = -EPERM;
if (nsproxy->mnt_ns != ctx->root_nsproxy->mnt_ns)
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index 56553ae..8b7adc6 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -143,6 +143,22 @@ static int obj_ns_users(void *ptr)
return atomic_read(&((struct nsproxy *) ptr)->count);
}
+static int obj_uts_ns_grab(void *ptr)
+{
+ get_uts_ns((struct uts_namespace *) ptr);
+ return 0;
+}
+
+static void obj_uts_ns_drop(void *ptr)
+{
+ put_uts_ns((struct uts_namespace *) ptr);
+}
+
+static int obj_uts_ns_users(void *ptr)
+{
+ return atomic_read(&((struct uts_namespace *) ptr)->kref.refcount);
+}
+
static struct ckpt_obj_ops ckpt_obj_ops[] = {
/* ignored object */
{
@@ -200,6 +216,16 @@ static struct ckpt_obj_ops ckpt_obj_ops[] = {
.checkpoint = checkpoint_ns,
.restore = restore_ns,
},
+ /* uts_ns object */
+ {
+ .obj_name = "UTS_NS",
+ .obj_type = CKPT_OBJ_UTS_NS,
+ .ref_drop = obj_uts_ns_drop,
+ .ref_grab = obj_uts_ns_grab,
+ .ref_users = obj_uts_ns_users,
+ .checkpoint = checkpoint_bad,
+ .restore = restore_bad,
+ },
};
diff --git a/checkpoint/process.c b/checkpoint/process.c
index fbe0d16..a827987 100644
--- a/checkpoint/process.c
+++ b/checkpoint/process.c
@@ -16,8 +16,10 @@
#include <linux/posix-timers.h>
#include <linux/futex.h>
#include <linux/poll.h>
+#include <linux/utsname.h>
#include <linux/checkpoint.h>
#include <linux/checkpoint_hdr.h>
+#include <linux/syscalls.h>
/***********************************************************************
* Checkpoint
@@ -50,10 +52,69 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
return ckpt_write_string(ctx, t->comm, TASK_COMM_LEN);
}
+static int checkpoint_uts_ns(struct ckpt_ctx *ctx, struct uts_namespace *uts_ns)
+{
+ struct ckpt_hdr_utsns *h;
+ int domainname_len;
+ int nodename_len;
+ int ret;
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_UTS_NS);
+ if (!h)
+ return -ENOMEM;
+
+ nodename_len = sizeof(uts_ns->name.nodename);
+ domainname_len = sizeof(uts_ns->name.domainname);
+
+ h->nodename_len = nodename_len;
+ h->domainname_len = domainname_len;
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ if (ret < 0)
+ return ret;
+
+ down_read(&uts_sem);
+ ret = ckpt_write_string(ctx, uts_ns->name.nodename, nodename_len);
+ if (ret < 0)
+ goto up;
+ ret = ckpt_write_string(ctx, uts_ns->name.domainname, domainname_len);
+ up:
+ up_read(&uts_sem);
+ return ret;
+}
static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
{
- return 0;
+ struct ckpt_hdr_ns *h;
+ int ns_flags = 0;
+ int uts_objref;
+ int first, ret;
+
+ uts_objref = ckpt_obj_lookup_add(ctx, nsproxy->uts_ns,
+ CKPT_OBJ_UTS_NS, &first);
+ if (uts_objref <= 0)
+ return uts_objref;
+ if (first)
+ ns_flags |= CLONE_NEWUTS;
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_NS);
+ if (!h)
+ return -ENOMEM;
+
+ h->flags = ns_flags;
+ h->uts_objref = uts_objref;
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ if (ret < 0)
+ return ret;
+
+ if (ns_flags & CLONE_NEWUTS)
+ ret = checkpoint_uts_ns(ctx, nsproxy->uts_ns);
+
+ /* FIX: Write other namespaces here */
+ return ret;
}
int checkpoint_ns(struct ckpt_ctx *ctx, void *ptr)
@@ -300,10 +361,107 @@ static int restore_task_struct(struct ckpt_ctx *ctx)
return ret;
}
+static int do_restore_uts_ns(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_utsns *h;
+ struct uts_namespace *ns;
+ int ret;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_UTS_NS);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+ ret = -EINVAL;
+ if (h->nodename_len > sizeof(ns->name.nodename) ||
+ h->domainname_len > sizeof(ns->name.domainname))
+ goto out;
+
+ ns = current->nsproxy->uts_ns;
+
+ /* no need to take uts_sem because we are the sole users */
+
+ memset(ns->name.nodename, 0, sizeof(ns->name.nodename));
+ ret = _ckpt_read_string(ctx, ns->name.nodename, h->nodename_len);
+ if (ret < 0)
+ goto out;
+ memset(ns->name.domainname, 0, sizeof(ns->name.domainname));
+ ret = _ckpt_read_string(ctx, ns->name.domainname, h->domainname_len);
+ out:
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+static int restore_uts_ns(struct ckpt_ctx *ctx, int ns_objref, int flags)
+{
+ struct uts_namespace *uts_ns;
+ int ret = 0;
+
+ uts_ns = ckpt_obj_fetch(ctx, ns_objref, CKPT_OBJ_UTS_NS);
+ if (PTR_ERR(uts_ns) == -EINVAL)
+ uts_ns = NULL;
+ else if (IS_ERR(uts_ns))
+ return PTR_ERR(uts_ns);
+
+ /* sanity: CLONE_NEWUTS if-and-only-if uts_ns is NULL (first timer) */
+ if (!!uts_ns ^ !(flags & CLONE_NEWUTS))
+ return -EINVAL;
+
+ if (!uts_ns) {
+ ret = do_restore_uts_ns(ctx);
+ if (ret < 0)
+ return ret;
+ ret = ckpt_obj_insert(ctx, current->nsproxy->uts_ns,
+ ns_objref, CKPT_OBJ_UTS_NS);
+ } else {
+ struct uts_namespace *old_uts_ns;
+
+ /* safe because nsproxy->count must be 1 ... */
+ BUG_ON(atomic_read(&current->nsproxy->count) != 1);
+
+ old_uts_ns = current->nsproxy->uts_ns;
+ current->nsproxy->uts_ns = uts_ns;
+ get_uts_ns(uts_ns);
+ put_uts_ns(old_uts_ns);
+ }
+
+ return ret;
+}
+
static struct nsproxy *do_restore_ns(struct ckpt_ctx *ctx)
{
+ struct ckpt_hdr_ns *h;
struct nsproxy *nsproxy;
+ int ret;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_NS);
+ if (IS_ERR(h))
+ return (struct nsproxy *) h;
+
+ ret = -EINVAL;
+ if (h->uts_objref <= 0)
+ goto out;
+ if (h->flags & ~CLONE_NEWUTS)
+ goto out;
+ /* each unseen-before namespace will be un-shared now */
+ ret = sys_unshare(h->flags);
+ if (ret)
+ goto out;
+
+ /*
+ * For each unseen-before namespace 'xxx', it is now safe to
+ * modify the nsproxy->xxx_ns without locking because unshare()
+ * gave a brand new nsproxy and nsproxy->xxx_ns, and we're the
+ * sole users at this point.
+ */
+ ret = restore_uts_ns(ctx, h->uts_objref, h->flags);
+ ckpt_debug("uts ns: %d\n", ret);
+
+ /* FIX: add more namespaces here */
+ out:
+ ckpt_hdr_put(ctx, h);
+ if (ret < 0)
+ return ERR_PTR(ret);
nsproxy = task_nsproxy(current);
get_nsproxy(nsproxy);
return nsproxy;
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index da1ae79..1603279 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -53,6 +53,8 @@ enum {
CKPT_HDR_RESTART_BLOCK,
CKPT_HDR_THREAD,
CKPT_HDR_CPU,
+ CKPT_HDR_NS,
+ CKPT_HDR_UTS_NS,
/* 201-299: reserved for arch-dependent */
@@ -92,6 +94,7 @@ enum obj_type {
CKPT_OBJ_FILE,
CKPT_OBJ_MM,
CKPT_OBJ_NS,
+ CKPT_OBJ_UTS_NS,
CKPT_OBJ_MAX
};
@@ -160,6 +163,12 @@ struct ckpt_hdr_task_ns {
__s32 ns_objref;
} __attribute__((aligned(8)));
+struct ckpt_hdr_ns {
+ struct ckpt_hdr h;
+ __u32 flags;
+ __s32 uts_objref;
+} __attribute__((aligned(8)));
+
/* task's shared resources */
struct ckpt_hdr_task_objs {
struct ckpt_hdr h;
@@ -235,6 +244,12 @@ struct ckpt_hdr_file_pipe_state {
__s32 pipe_len;
} __attribute__((aligned(8)));
+struct ckpt_hdr_utsns {
+ struct ckpt_hdr h;
+ __u32 nodename_len;
+ __u32 domainname_len;
+} __attribute__((aligned(8)));
+
/* memory layout */
struct ckpt_hdr_mm {
struct ckpt_hdr h;
--
1.6.0.4