From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"menage@google.com" <menage@google.com>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>
Subject: [PATCH 7/9] cgroup: Support CSS ID
Date: Tue, 16 Dec 2008 18:19:09 +0900 [thread overview]
Message-ID: <20081216181909.2d500446.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20081216180936.d6b65abf.kamezawa.hiroyu@jp.fujitsu.com>
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Patch for Per-CSS ID and private hierarchy code.
This patch tries to assign a ID to each css. Attach unique ID to each
css and provides following functions.
- css_lookup(subsys, id)
returns struct cgroup of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next cgroup under "root" by scanning bitmap (not by tree-walk)
When cgrou_subsys->use_id is set, id field and bitmap for css is maintained.
kernel/cgroup.c just parepare
- css_id of root css for subsys
- alloc/free id functions.
So, each subsys should allocate ID in attach() callback if necessary.
There is several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.
- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)
Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.
- When subsys->use_id is set, # of css in the system is limited to
65534.
Changelog: (v5) -> (v6)
- max depth is removed.
- changed arguments to "scan" from ID to pointer of root css.
Changelog: (v4) -> (v5)
- Totally re-designed as per-css ID.
Changelog:(v3) -> (v4)
- updated comments.
- renamed hierarchy_code[] to stack[]
- merged prepare_id routines.
Changelog (v2) -> (v3)
- removed cgroup_id_getref().
- added cgroup_id_tryget().
Changelog (v1) -> (v2):
- Design change: show only ID(integer) to outside of cgroup.c
- moved cgroup ID definition from include/ to kernel/cgroup.c
- struct cgroup_id is freed by RCU.
- changed interface from pointer to "int"
- kill_sb() is handled.
- ID 0 as unused ID.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/cgroup.h | 49 +++++++++
include/linux/idr.h | 1
kernel/cgroup.c | 251 ++++++++++++++++++++++++++++++++++++++++++++++++-
lib/idr.c | 46 ++++++++
4 files changed, 345 insertions(+), 2 deletions(-)
Index: mmotm-2.6.28-Dec15/include/linux/cgroup.h
===================================================================
--- mmotm-2.6.28-Dec15.orig/include/linux/cgroup.h
+++ mmotm-2.6.28-Dec15/include/linux/cgroup.h
@@ -15,13 +15,14 @@
#include <linux/cgroupstats.h>
#include <linux/prio_heap.h>
#include <linux/rwsem.h>
-
+#include <linux/idr.h>
#ifdef CONFIG_CGROUPS
struct cgroupfs_root;
struct cgroup_subsys;
struct inode;
struct cgroup;
+struct css_id;
extern int cgroup_init_early(void);
extern int cgroup_init(void);
@@ -59,6 +60,8 @@ struct cgroup_subsys_state {
atomic_t refcnt;
unsigned long flags;
+ /* ID for this css, if possible */
+ struct css_id *id;
};
/* bits in struct cgroup_subsys_state flags field */
@@ -360,6 +363,11 @@ struct cgroup_subsys {
int active;
int disabled;
int early_init;
+ /*
+ * set 1 if subsys uses ID. ID is not available before cgroup_init()
+ * (not available in early_init time.
+ */
+ int use_id;
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
@@ -373,6 +381,9 @@ struct cgroup_subsys {
/* Protected by this->hierarchy_mutex and cgroup_lock() */
struct cgroupfs_root *root;
struct list_head sibling;
+ /* used when use_id == 1 */
+ struct idr idr;
+ spinlock_t id_lock;
};
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
@@ -426,6 +437,42 @@ void cgroup_iter_end(struct cgroup *cgrp
int cgroup_scan_tasks(struct cgroup_scanner *scan);
int cgroup_attach_task(struct cgroup *, struct task_struct *);
+/*
+ * CSS ID is a ID for all css struct under subsys. Only works when
+ * cgroup_subsys->use_id != 0. It can be used for look up and scanning
+ * Cgroup ID is assined at cgroup allocation (create) and removed
+ * when refcnt to ID goes down to 0. Refcnt is inremented when subsys want to
+ * avoid reuse of ID for persistent objects. In usual, refcnt to ID will be 0
+ * when cgroup is removed.
+ * This look-up and scan function should be called under rcu_read_lock().
+ * cgroup_lock() is not necessary.
+ *
+ * Note: At using ID, max depth of the hierarchy is determined by
+ * cgroup_subsys->max_id_depth.
+ */
+
+/* called at create() */
+int css_id_alloc(struct cgroup_subsys *ss, struct cgroup_subsys_state *parent,
+ struct cgroup_subsys_state *css);
+/* called at destroy(), or somewhere we can free ID */
+void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
+
+/* Find a cgroup which the "ID" is attached. */
+struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id);
+/*
+ * Get next cgroup under tree. Returning a cgroup which has equal or greater
+ * ID than "id" in argument.
+ */
+struct cgroup_subsys_state *css_get_next(struct cgroup_subsys *ss, int id,
+ struct cgroup_subsys_state *root, int *foundid);
+
+/* get id and depth of css */
+unsigned short css_id(struct cgroup_subsys_state *css);
+unsigned short css_depth(struct cgroup_subsys_state *css);
+/* returns non-zero if root is ancestor of cg */
+int css_is_ancestor(struct cgroup_subsys_state *cg,
+ struct cgroup_subsys_state *root);
+
#else /* !CONFIG_CGROUPS */
static inline int cgroup_init_early(void) { return 0; }
Index: mmotm-2.6.28-Dec15/kernel/cgroup.c
===================================================================
--- mmotm-2.6.28-Dec15.orig/kernel/cgroup.c
+++ mmotm-2.6.28-Dec15/kernel/cgroup.c
@@ -46,7 +46,7 @@
#include <linux/cgroupstats.h>
#include <linux/hash.h>
#include <linux/namei.h>
-
+#include <linux/idr.h>
#include <asm/atomic.h>
static DEFINE_MUTEX(cgroup_mutex);
@@ -185,6 +185,8 @@ struct cg_cgroup_link {
static struct css_set init_css_set;
static struct cg_cgroup_link init_css_set_link;
+static int cgroup_subsys_init_idr(struct cgroup_subsys *ss);
+
/* css_set_lock protects the list of css_set objects, and the
* chain of tasks off each css_set. Nests outside task->alloc_lock
* due to cgroup_iter_start() */
@@ -2684,6 +2686,8 @@ int __init cgroup_init(void)
struct cgroup_subsys *ss = subsys[i];
if (!ss->early_init)
cgroup_init_subsys(ss);
+ if (ss->use_id)
+ cgroup_subsys_init_idr(ss);
}
/* Add init_css_set to the hash table */
@@ -3215,3 +3219,248 @@ static int __init cgroup_disable(char *s
return 1;
}
__setup("cgroup_disable=", cgroup_disable);
+
+/*
+ * CSS ID
+ */
+struct css_id {
+ /*
+ * The cgroup to whiech this ID points. If cgroup is removed,
+ * this will point to NULL.
+ */
+ struct cgroup_subsys_state *css;
+ /*
+ * ID of this css.
+ */
+ unsigned short id;
+ /*
+ * Depth in hierarchy which this ID belongs to.
+ */
+ unsigned short depth;
+ /*
+ * ID is freed by RCU. (and lookup routine is RCU safe.)
+ */
+ struct rcu_head rcu_head;
+ /*
+ * Hierarchy of CSS ID belongs to.
+ */
+ unsigned short stack[0]; /* Length of this field is defined by depth */
+};
+
+/*
+ * To get ID other than 0, this should be called when !cgroup_is_removed().
+ */
+unsigned short css_id(struct cgroup_subsys_state *css)
+{
+ struct css_id *cssid = rcu_dereference(css->id);
+
+ if (cssid)
+ return cssid->id;
+ return 0;
+}
+
+unsigned short css_depth(struct cgroup_subsys_state *css)
+{
+ struct css_id *cssid = rcu_dereference(css->id);
+
+ if (cssid)
+ return cssid->depth;
+ return 0;
+}
+
+int css_is_ancestor(struct cgroup_subsys_state *css,
+ struct cgroup_subsys_state *root)
+{
+ struct css_id *id = css->id;
+ struct css_id *ans = root->id;
+
+ if (!id || !ans || (id->depth < ans->depth))
+ return 0;
+ return id->stack[ans->depth] == ans->id;
+}
+
+static void __free_css_id_cb(struct rcu_head *head)
+{
+ struct css_id *id;
+
+ id = container_of(head, struct css_id, rcu_head);
+ kfree(id);
+}
+
+void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
+{
+ struct css_id *id = css->id;
+
+ BUG_ON(!ss->use_id);
+
+ rcu_assign_pointer(id->css, NULL);
+ rcu_assign_pointer(css->id, NULL);
+ spin_lock(&ss->id_lock);
+ idr_remove(&ss->idr, id->id);
+ spin_unlock(&ss->id_lock);
+ call_rcu(&id->rcu_head, __free_css_id_cb);
+}
+
+
+static int __get_and_prepare_newid(struct cgroup_subsys *ss,
+ int depth, struct css_id **ret)
+{
+ struct css_id *newid;
+ int myid, error, size;
+
+ BUG_ON(!ss->use_id);
+
+ size = sizeof(struct css_id) + sizeof(unsigned short) * (depth + 1);
+ newid = kzalloc(size, GFP_KERNEL);
+ if (!newid)
+ return -ENOMEM;
+ /* get id */
+ if (unlikely(!idr_pre_get(&ss->idr, GFP_KERNEL))) {
+ error = -ENOMEM;
+ goto err_out;
+ }
+ spin_lock(&ss->id_lock);
+ /* Don't use 0 */
+ error = idr_get_new_above(&ss->idr, newid, 1, &myid);
+ spin_unlock(&ss->id_lock);
+
+ /* Returns error when there are no free spaces for new ID.*/
+ if (error) {
+ error = -ENOSPC;
+ goto err_out;
+ }
+
+ newid->id = myid;
+ newid->depth = depth;
+ *ret = newid;
+ return 0;
+err_out:
+ kfree(newid);
+ return error;
+
+}
+
+
+static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
+{
+ struct css_id *newid;
+ struct cgroup_subsys_state *rootcss;
+ int err = -ENOMEM;
+
+ spin_lock_init(&ss->id_lock);
+ idr_init(&ss->idr);
+
+ rootcss = init_css_set.subsys[ss->subsys_id];
+ err = __get_and_prepare_newid(ss, 0, &newid);
+ if (err)
+ return err;
+
+ newid->stack[0] = newid->id;
+ newid->css = rootcss;
+ rootcss->id = newid;
+ return 0;
+}
+
+int css_id_alloc(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *parent,
+ struct cgroup_subsys_state *css)
+{
+ int i, depth = 0;
+ struct css_id *cssid, *parent_id = NULL;
+ int error;
+
+ if (parent) {
+ parent_id = parent->id;
+ depth = parent_id->depth + 1;
+ }
+
+ error = __get_and_prepare_newid(ss, depth, &cssid);
+ if (error)
+ return error;
+
+ for (i = 0; i < depth; i++)
+ cssid->stack[i] = parent_id->stack[i];
+ cssid->stack[depth] = cssid->id;
+
+ rcu_assign_pointer(cssid->css, css);
+ rcu_assign_pointer(css->id, cssid);
+
+ return 0;
+}
+
+/**
+ * css_lookup - lookup css by id
+ * @id: the id of cgroup to be looked up
+ *
+ * Returns pointer to css if there is valid css with id, NULL if not.
+ * Should be called under rcu_read_lock()
+ */
+
+struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
+{
+ struct cgroup_subsys_state *css = NULL;
+ struct css_id *cssid = NULL;
+
+ BUG_ON(!ss->use_id);
+ rcu_read_lock();
+ cssid = idr_find(&ss->idr, id);
+
+ if (unlikely(!cssid))
+ goto out;
+
+ css = rcu_dereference(cssid->css);
+out:
+ rcu_read_unlock();
+ return css;
+}
+
+/**
+ * css_get_next - lookup next cgroup under specified hierarchy.
+ * @ss: pointer to subsystem
+ * @id: current position of iteration.
+ * @root: pointer to css. search tree under this.
+ * @foundid: position of found object.
+ *
+ * Search next css under the specified hierarchy of rootid. Calling under
+ * rcu_read_lock() is necessary. Returns NULL if it reaches the end.
+ */
+struct cgroup_subsys_state *
+css_get_next(struct cgroup_subsys *ss, int id,
+ struct cgroup_subsys_state *root, int *foundid)
+{
+ struct cgroup_subsys_state *ret = NULL;
+ struct css_id *tmp;
+ int tmpid;
+ int rootid = css_id(root);
+ int depth = css_depth(root);
+
+ if (!rootid)
+ return NULL;
+
+ BUG_ON(!ss->use_id);
+ rcu_read_lock();
+ tmpid = id;
+ while (1) {
+ /* scan next entry from bitmap(tree) */
+ spin_lock(&ss->id_lock);
+ tmp = idr_get_next(&ss->idr, &tmpid);
+ spin_unlock(&ss->id_lock);
+
+ if (!tmp) {
+ ret = NULL;
+ break;
+ }
+ if (tmp->depth >= depth && tmp->stack[depth] == rootid) {
+ ret = rcu_dereference(tmp->css);
+ /* Sanity check and check hierarchy */
+ if (ret && !css_is_removed(ret))
+ break;
+ }
+ tmpid = tmpid + 1;
+ }
+
+ rcu_read_unlock();
+ *foundid = tmpid;
+ return ret;
+}
+
Index: mmotm-2.6.28-Dec15/include/linux/idr.h
===================================================================
--- mmotm-2.6.28-Dec15.orig/include/linux/idr.h
+++ mmotm-2.6.28-Dec15/include/linux/idr.h
@@ -106,6 +106,7 @@ int idr_get_new(struct idr *idp, void *p
int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id);
int idr_for_each(struct idr *idp,
int (*fn)(int id, void *p, void *data), void *data);
+void *idr_get_next(struct idr *idp, int *nextid);
void *idr_replace(struct idr *idp, void *ptr, int id);
void idr_remove(struct idr *idp, int id);
void idr_remove_all(struct idr *idp);
Index: mmotm-2.6.28-Dec15/lib/idr.c
===================================================================
--- mmotm-2.6.28-Dec15.orig/lib/idr.c
+++ mmotm-2.6.28-Dec15/lib/idr.c
@@ -579,6 +579,52 @@ int idr_for_each(struct idr *idp,
EXPORT_SYMBOL(idr_for_each);
/**
+ * idr_get_next - lookup next object of id to given id.
+ * @idp: idr handle
+ * @id: pointer to lookup key
+ *
+ * Returns pointer to registered object with id, which is next number to
+ * given id.
+ */
+
+void *idr_get_next(struct idr *idp, int *nextidp)
+{
+ struct idr_layer *p, *pa[MAX_LEVEL];
+ struct idr_layer **paa = &pa[0];
+ int id = *nextidp;
+ int n, max;
+
+ /* find first ent */
+ n = idp->layers * IDR_BITS;
+ max = 1 << n;
+ p = rcu_dereference(idp->top);
+ if (!p)
+ return NULL;
+
+ while (id < max) {
+ while (n > 0 && p) {
+ n -= IDR_BITS;
+ *paa++ = p;
+ p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
+ }
+
+ if (p) {
+ *nextidp = id;
+ return p;
+ }
+
+ id += 1 << n;
+ while (n < fls(id)) {
+ n += IDR_BITS;
+ p = *--paa;
+ }
+ }
+ return NULL;
+}
+
+
+
+/**
* idr_replace - replace pointer for given id
* @idp: idr handle
* @ptr: pointer you want associated with the id
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-12-16 9:18 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-16 9:09 [PATCH] memcg updates (2008/12/16) KAMEZAWA Hiroyuki
2008-12-16 9:10 ` [PATCH 1/9] fix error path of mem_cgroup_create and refnct handling KAMEZAWA Hiroyuki
2008-12-17 2:26 ` Daisuke Nishimura
2008-12-16 9:12 ` [PATCH 2/9] cgroup hierarchy mutex KAMEZAWA Hiroyuki
2008-12-16 9:13 ` [PATCH 3/9] use hierarchy mutex in memcg KAMEZAWA Hiroyuki
2008-12-16 9:14 ` [PATCH 4/9] cgroup add css_tryget() KAMEZAWA Hiroyuki
2008-12-16 9:15 ` [PATCH 5/9] Add css_is_remvoed KAMEZAWA Hiroyuki
2008-12-16 9:17 ` [PATCH 6/9] memcg: use css_tryget() KAMEZAWA Hiroyuki
2008-12-18 1:50 ` Daisuke Nishimura
2008-12-18 2:03 ` KAMEZAWA Hiroyuki
2008-12-16 9:19 ` KAMEZAWA Hiroyuki [this message]
2008-12-16 10:24 ` [PATCH 7/9] cgroup: Support CSS ID Paul Menage
2008-12-16 11:09 ` KAMEZAWA Hiroyuki
2008-12-16 9:21 ` [PATCH 8/9] memcg: hierarchical memory reclaim by round-robin KAMEZAWA Hiroyuki
2008-12-16 9:22 ` [PATCH 9/9] memcg : fix OOM killer under hierarchy KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081216181909.2d500446.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=lizf@cn.fujitsu.com \
--cc=menage@google.com \
--cc=nishimura@mxp.nes.nec.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox