* [PATCH 0/3] CGroups: Hierarchy locking/refcount changes
@ 2008-12-16 11:30 menage
2008-12-16 11:30 ` [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex menage
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: menage @ 2008-12-16 11:30 UTC (permalink / raw)
To: akpm, kamezawa.hiroyu, lizf; +Cc: linux-kernel, linux-mm
These patches introduce new locking/refcount support for cgroups to
reduce the need for subsystems to call cgroup_lock(). This will
ultimately allow the atomicity of cgroup_rmdir() (which was removed
recently) to be restored.
These three patches give:
1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
use to prevent changes to its own cgroup tree
2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
memory controller
3/3 - introduce a css_tryget() function similar to the one recently
proposed by Kamezawa, but avoiding spurious refcount failures in
the event of a race between a css_tryget() and an unsuccessful
cgroup_rmdir()
Future patches will likely involve:
- using hierarchy mutex in place of cgroup_lock() in more subsystems
where appropriate
- restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()
Signed-off-by: Paul Menage <menage@google.com>
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex
2008-12-16 11:30 [PATCH 0/3] CGroups: Hierarchy locking/refcount changes menage
@ 2008-12-16 11:30 ` menage
2008-12-17 5:16 ` KAMEZAWA Hiroyuki
2008-12-16 11:30 ` [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller menage
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: menage @ 2008-12-16 11:30 UTC (permalink / raw)
To: akpm, kamezawa.hiroyu, lizf; +Cc: linux-kernel, linux-mm
[-- Attachment #1: cgroup_hierarchy_lock.patch --]
[-- Type: text/plain, Size: 5598 bytes --]
This patch adds a hierarchy_mutex to the cgroup_subsys object that
protects changes to the hierarchy observed by that subsystem. It is
taken by the cgroup subsystem (in addition to cgroup_mutex) for the
following operations:
- linking a cgroup into that subsystem's cgroup tree
- unlinking a cgroup from that subsystem's cgroup tree
- moving the subsystem to/from a hierarchy (including across the
bind() callback)
Thus if the subsystem holds its own hierarchy_mutex, it can safely
traverse its own hierarchy.
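
As a rough illustration of the rule above, here is a minimal userspace sketch, not kernel code. It assumes a pthread mutex standing in for hierarchy_mutex, and the names model_node, model_subsys, model_link(), model_unlink() and model_count() are hypothetical. Every link and unlink takes the mutex, so a walker that holds the same mutex sees a stable tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup as seen by one subsystem. */
struct model_node {
	struct model_node *parent;
	struct model_node *first_child;
	struct model_node *next_sibling;
};

/* Hypothetical stand-in for struct cgroup_subsys. */
struct model_subsys {
	pthread_mutex_t hierarchy_mutex;
};

/* Link a child into the tree; the core always takes the mutex here. */
static void model_link(struct model_subsys *ss, struct model_node *parent,
		       struct model_node *child)
{
	pthread_mutex_lock(&ss->hierarchy_mutex);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
	pthread_mutex_unlock(&ss->hierarchy_mutex);
}

/* Unlink a child from its parent's list, again under the mutex. */
static void model_unlink(struct model_subsys *ss, struct model_node *child)
{
	struct model_node **pp;

	pthread_mutex_lock(&ss->hierarchy_mutex);
	if (child->parent) {
		for (pp = &child->parent->first_child; *pp;
		     pp = &(*pp)->next_sibling) {
			if (*pp == child) {
				*pp = child->next_sibling;
				break;
			}
		}
	}
	child->parent = NULL;
	child->next_sibling = NULL;
	pthread_mutex_unlock(&ss->hierarchy_mutex);
}

/* Count a node and all its descendants; caller holds the mutex. */
static int model_count_locked(const struct model_node *node)
{
	const struct model_node *c;
	int total = 1;

	for (c = node->first_child; c; c = c->next_sibling)
		total += model_count_locked(c);
	return total;
}

/* The "subsystem" side: hold only its own mutex while traversing. */
static int model_count(struct model_subsys *ss, struct model_node *root)
{
	int total;

	pthread_mutex_lock(&ss->hierarchy_mutex);
	total = model_count_locked(root);
	pthread_mutex_unlock(&ss->hierarchy_mutex);
	return total;
}
```

The sketch leaves out cgroup_mutex and the per-hierarchy loop over several subsystems; it only shows why holding the one mutex is enough for a read-only walk.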
Signed-off-by: Paul Menage <menage@google.com>
---
Documentation/cgroups/cgroups.txt | 2 +-
include/linux/cgroup.h | 17 ++++++++++++++++-
kernel/cgroup.c | 37 +++++++++++++++++++++++++++++++++++--
3 files changed, 52 insertions(+), 4 deletions(-)
Index: hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/include/linux/cgroup.h
+++ hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
@@ -337,8 +337,23 @@ struct cgroup_subsys {
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
+ /*
+ * Protects sibling/children links of cgroups in this
+ * hierarchy, plus protects which hierarchy (or none) the
+ * subsystem is a part of (i.e. root/sibling). To avoid
+ * potential deadlocks, the following operations should not be
+ * undertaken while holding any hierarchy_mutex:
+ *
+ * - allocating memory
+ * - initiating hotplug events
+ */
+ struct mutex hierarchy_mutex;
+
+ /*
+ * Link to parent, and list entry in parent's children.
+ * Protected by this->hierarchy_mutex and cgroup_lock()
+ */
struct cgroupfs_root *root;
-
struct list_head sibling;
};
Index: hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/kernel/cgroup.c
+++ hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
@@ -714,23 +714,26 @@ static int rebind_subsystems(struct cgro
BUG_ON(cgrp->subsys[i]);
BUG_ON(!dummytop->subsys[i]);
BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
+ mutex_lock(&ss->hierarchy_mutex);
cgrp->subsys[i] = dummytop->subsys[i];
cgrp->subsys[i]->cgroup = cgrp;
list_move(&ss->sibling, &root->subsys_list);
ss->root = root;
if (ss->bind)
ss->bind(ss, cgrp);
-
+ mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & removed_bits) {
/* We're removing this subsystem */
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
+ mutex_lock(&ss->hierarchy_mutex);
if (ss->bind)
ss->bind(ss, dummytop);
dummytop->subsys[i]->cgroup = dummytop;
cgrp->subsys[i] = NULL;
subsys[i]->root = &rootnode;
list_move(&ss->sibling, &rootnode.subsys_list);
+ mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & final_bits) {
/* Subsystem state should already exist */
BUG_ON(!cgrp->subsys[i]);
@@ -2326,6 +2329,29 @@ static void init_cgroup_css(struct cgrou
cgrp->subsys[ss->subsys_id] = css;
}
+static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
+{
+ /* We need to take each hierarchy_mutex in a consistent order */
+ int i;
+
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss->root == root)
+ mutex_lock_nested(&ss->hierarchy_mutex, i);
+ }
+}
+
+static void cgroup_unlock_hierarchy(struct cgroupfs_root *root)
+{
+ int i;
+
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss->root == root)
+ mutex_unlock(&ss->hierarchy_mutex);
+ }
+}
+
/*
* cgroup_create - create a cgroup
* @parent: cgroup that will be parent of the new cgroup
@@ -2374,7 +2400,9 @@ static long cgroup_create(struct cgroup
init_cgroup_css(css, ss, cgrp);
}
+ cgroup_lock_hierarchy(root);
list_add(&cgrp->sibling, &cgrp->parent->children);
+ cgroup_unlock_hierarchy(root);
root->number_of_cgroups++;
err = cgroup_create_dir(cgrp, dentry, mode);
@@ -2492,8 +2520,12 @@ static int cgroup_rmdir(struct inode *un
if (!list_empty(&cgrp->release_list))
list_del(&cgrp->release_list);
spin_unlock(&release_list_lock);
- /* delete my sibling from parent->children */
+
+ cgroup_lock_hierarchy(cgrp->root);
+ /* delete this cgroup from parent->children */
list_del(&cgrp->sibling);
+ cgroup_unlock_hierarchy(cgrp->root);
+
spin_lock(&cgrp->dentry->d_lock);
d = dget(cgrp->dentry);
spin_unlock(&d->d_lock);
@@ -2535,6 +2567,7 @@ static void __init cgroup_init_subsys(st
* need to invoke fork callbacks here. */
BUG_ON(!list_empty(&init_task.tasks));
+ mutex_init(&ss->hierarchy_mutex);
ss->active = 1;
}
Index: hierarchy_lock-mmotm-2008-12-09/Documentation/cgroups/cgroups.txt
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/Documentation/cgroups/cgroups.txt
+++ hierarchy_lock-mmotm-2008-12-09/Documentation/cgroups/cgroups.txt
@@ -528,7 +528,7 @@ example in cpusets, no task may attach b
up.
void bind(struct cgroup_subsys *ss, struct cgroup *root)
-(cgroup_mutex held by caller)
+(cgroup_mutex and ss->hierarchy_mutex held by caller)
Called when a cgroup subsystem is rebound to a different hierarchy
and root cgroup. Currently this will only involve movement between
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller
2008-12-16 11:30 [PATCH 0/3] CGroups: Hierarchy locking/refcount changes menage
2008-12-16 11:30 ` [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex menage
@ 2008-12-16 11:30 ` menage
2008-12-17 5:17 ` KAMEZAWA Hiroyuki
2008-12-16 11:30 ` [PATCH 3/3] CGroups: Add css_tryget() menage
2008-12-17 22:03 ` [PATCH 0/3] CGroups: Hierarchy locking/refcount changes Andrew Morton
3 siblings, 1 reply; 11+ messages in thread
From: menage @ 2008-12-16 11:30 UTC (permalink / raw)
To: akpm, kamezawa.hiroyu, lizf; +Cc: linux-kernel, linux-mm
[-- Attachment #1: mm-destroy-fix.patch --]
[-- Type: text/plain, Size: 2495 bytes --]
This patch updates the memory controller to use its hierarchy_mutex
rather than calling cgroup_lock() to prevent
cgroup_mkdir()/cgroup_rmdir() from occurring in its hierarchy.
Signed-off-by: Paul Menage <menage@google.com>
---
mm/memcontrol.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
Index: hierarchy_lock-mmotm-2008-12-09/mm/memcontrol.c
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/mm/memcontrol.c
+++ hierarchy_lock-mmotm-2008-12-09/mm/memcontrol.c
@@ -154,7 +154,7 @@ struct mem_cgroup {
/*
* While reclaiming in a hiearchy, we cache the last child we
- * reclaimed from. Protected by cgroup_lock()
+ * reclaimed from. Protected by hierarchy_mutex
*/
struct mem_cgroup *last_scanned_child;
/*
@@ -529,7 +529,7 @@ unsigned long mem_cgroup_isolate_pages(u
/*
* This routine finds the DFS walk successor. This routine should be
- * called with cgroup_mutex held
+ * called with hierarchy_mutex held
*/
static struct mem_cgroup *
mem_cgroup_get_next_node(struct mem_cgroup *curr, struct mem_cgroup *root_mem)
@@ -598,7 +598,7 @@ mem_cgroup_get_first_node(struct mem_cgr
/*
* Scan all children under the mem_cgroup mem
*/
- cgroup_lock();
+ mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
if (list_empty(&root_mem->css.cgroup->children)) {
ret = root_mem;
goto done;
@@ -619,7 +619,7 @@ mem_cgroup_get_first_node(struct mem_cgr
done:
root_mem->last_scanned_child = ret;
- cgroup_unlock();
+ mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
return ret;
}
@@ -683,18 +683,16 @@ static int mem_cgroup_hierarchical_recla
while (next_mem != root_mem) {
if (next_mem->obsolete) {
mem_cgroup_put(next_mem);
- cgroup_lock();
next_mem = mem_cgroup_get_first_node(root_mem);
- cgroup_unlock();
continue;
}
ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
get_swappiness(next_mem));
if (mem_cgroup_check_under_limit(root_mem))
return 0;
- cgroup_lock();
+ mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
next_mem = mem_cgroup_get_next_node(next_mem, root_mem);
- cgroup_unlock();
+ mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
}
return ret;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/3] CGroups: Add css_tryget()
2008-12-16 11:30 [PATCH 0/3] CGroups: Hierarchy locking/refcount changes menage
2008-12-16 11:30 ` [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex menage
2008-12-16 11:30 ` [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller menage
@ 2008-12-16 11:30 ` menage
2008-12-17 5:19 ` KAMEZAWA Hiroyuki
2008-12-17 22:07 ` Andrew Morton
2008-12-17 22:03 ` [PATCH 0/3] CGroups: Hierarchy locking/refcount changes Andrew Morton
3 siblings, 2 replies; 11+ messages in thread
From: menage @ 2008-12-16 11:30 UTC (permalink / raw)
To: akpm, kamezawa.hiroyu, lizf; +Cc: linux-kernel, linux-mm
[-- Attachment #1: cgroup_refcnt.patch --]
[-- Type: text/plain, Size: 6843 bytes --]
This patch adds css_tryget(), which obtains a counted reference on a
CSS. It is used in situations where the caller has a "weak" reference
to the CSS, i.e. one that does not protect the cgroup from removal via
a reference count, but would instead be cleaned up by a destroy()
callback.
css_tryget() will return true on success, or false if the cgroup is
being removed.
This is similar to Kamezawa Hiroyuki's patch from a week or two ago,
but with the difference that in the event of css_tryget() racing with
a cgroup_rmdir(), css_tryget() will only return false if the cgroup
really does get removed.
This implementation is done by biasing css->refcnt, so that a refcnt
of 1 means "releasable" and 0 means "released or releasing". In the
event of a race, css_tryget() distinguishes between "released" and
"releasing" by checking for the CSS_REMOVED flag in css->flags.
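
The biased-refcount scheme described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: it uses C11 atomics instead of the kernel's atomic_t, an atomic_bool in place of the CSS_REMOVED flag bit, and the model_* names are hypothetical. model_clear_refs() mirrors the all-or-none idea of cgroup_clear_css_refs(): drop each count from 1 to 0, then either commit the removed flag on every object or roll the zeroed counts back to 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct cgroup_subsys_state. */
struct model_css {
	atomic_int refcnt;   /* biased: 1 = idle, 0 = removing or removed */
	atomic_bool removed; /* stands in for the CSS_REMOVED flag bit */
};

static void model_css_init(struct model_css *css)
{
	atomic_init(&css->refcnt, 1);
	atomic_init(&css->removed, false);
}

/* Increment unless the value is zero, like atomic_inc_not_zero(). */
static bool model_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Spin while a removal is in flight; fail only once it is committed. */
static bool model_css_tryget(struct model_css *css)
{
	while (!model_inc_not_zero(&css->refcnt)) {
		if (atomic_load(&css->removed))
			return false;
	}
	return true;
}

static void model_css_put(struct model_css *css)
{
	atomic_fetch_sub(&css->refcnt, 1);
}

/* Mark all n objects removed, or none of them if any is busy. */
static bool model_clear_refs(struct model_css *css, size_t n)
{
	bool failed = false;
	size_t i;

	for (i = 0; i < n && !failed; i++) {
		int expected = 1;

		if (!atomic_compare_exchange_strong(&css[i].refcnt,
						    &expected, 0)) {
			/* The count was not 1, so someone holds a ref. */
			assert(expected > 1);
			failed = true;
		}
	}
	for (i = 0; i < n; i++) {
		if (failed) {
			/* Restore any count we dropped from 1 to 0. */
			int zero = 0;

			atomic_compare_exchange_strong(&css[i].refcnt,
						       &zero, 1);
		} else {
			/* Commit the removal. */
			atomic_store(&css[i].removed, true);
		}
	}
	return !failed;
}
```

In this model a model_css_tryget() that races with a failed model_clear_refs() simply spins until the count is restored and then succeeds, which is the behaviour the description above aims for. The model ignores interrupt disabling and the release-agent notification done by the real code.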
Signed-off-by: Paul Menage <menage@google.com>
---
include/linux/cgroup.h | 38 +++++++++++++++++++++++++-----
kernel/cgroup.c | 61 ++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 88 insertions(+), 11 deletions(-)
Index: hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/include/linux/cgroup.h
+++ hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
@@ -52,9 +52,9 @@ struct cgroup_subsys_state {
* hierarchy structure */
struct cgroup *cgroup;
- /* State maintained by the cgroup system to allow
- * subsystems to be "busy". Should be accessed via css_get()
- * and css_put() */
+ /* State maintained by the cgroup system to allow subsystems
+ * to be "busy". Should be accessed via css_get(),
+ * css_tryget() and css_put(). */
atomic_t refcnt;
@@ -64,11 +64,14 @@ struct cgroup_subsys_state {
/* bits in struct cgroup_subsys_state flags field */
enum {
CSS_ROOT, /* This CSS is the root of the subsystem */
+ CSS_REMOVED, /* This CSS is dead */
};
/*
- * Call css_get() to hold a reference on the cgroup;
- *
+ * Call css_get() to hold a reference on the css; it can be used
+ * for a reference obtained via:
+ * - an existing ref-counted reference to the css
+ * - task->cgroups for a locked task
*/
static inline void css_get(struct cgroup_subsys_state *css)
@@ -77,9 +80,32 @@ static inline void css_get(struct cgroup
if (!test_bit(CSS_ROOT, &css->flags))
atomic_inc(&css->refcnt);
}
+
+static inline bool css_is_removed(struct cgroup_subsys_state *css)
+{
+ return test_bit(CSS_REMOVED, &css->flags);
+}
+
+/*
+ * Call css_tryget() to take a reference on a css if your existing
+ * (known-valid) reference isn't already ref-counted. Returns false if
+ * the css has been destroyed.
+ */
+
+static inline bool css_tryget(struct cgroup_subsys_state *css)
+{
+ if (test_bit(CSS_ROOT, &css->flags))
+ return true;
+ while (!atomic_inc_not_zero(&css->refcnt)) {
+ if (test_bit(CSS_REMOVED, &css->flags))
+ return false;
+ }
+ return true;
+}
+
/*
* css_put() should be called to release a reference taken by
- * css_get()
+ * css_get() or css_tryget()
*/
extern void __css_put(struct cgroup_subsys_state *css);
Index: hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
===================================================================
--- hierarchy_lock-mmotm-2008-12-09.orig/kernel/cgroup.c
+++ hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
@@ -2321,7 +2321,7 @@ static void init_cgroup_css(struct cgrou
struct cgroup *cgrp)
{
css->cgroup = cgrp;
- atomic_set(&css->refcnt, 0);
+ atomic_set(&css->refcnt, 1);
css->flags = 0;
if (cgrp == dummytop)
set_bit(CSS_ROOT, &css->flags);
@@ -2453,7 +2453,7 @@ static int cgroup_has_css_refs(struct cg
{
/* Check the reference count on each subsystem. Since we
* already established that there are no tasks in the
- * cgroup, if the css refcount is also 0, then there should
+ * cgroup, if the css refcount is also 1, then there should
* be no outstanding references, so the subsystem is safe to
* destroy. We scan across all subsystems rather than using
* the per-hierarchy linked list of mounted subsystems since
@@ -2474,12 +2474,62 @@ static int cgroup_has_css_refs(struct cg
* matter, since it can only happen if the cgroup
* has been deleted and hence no longer needs the
* release agent to be called anyway. */
- if (css && atomic_read(&css->refcnt))
+ if (css && (atomic_read(&css->refcnt) > 1))
return 1;
}
return 0;
}
+/*
+ * Atomically mark all (or else none) of the cgroup's CSS objects as
+ * CSS_REMOVED. Return true on success, or false if the cgroup has
+ * busy subsystems. Call with cgroup_mutex held
+ */
+
+static int cgroup_clear_css_refs(struct cgroup *cgrp)
+{
+ struct cgroup_subsys *ss;
+ unsigned long flags;
+ bool failed = false;
+ local_irq_save(flags);
+ for_each_subsys(cgrp->root, ss) {
+ struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
+ int refcnt;
+ do {
+ /* We can only remove a CSS with a refcnt==1 */
+ refcnt = atomic_read(&css->refcnt);
+ if (refcnt > 1) {
+ failed = true;
+ goto done;
+ }
+ BUG_ON(!refcnt);
+ /*
+ * Drop the refcnt to 0 while we check other
+ * subsystems. This will cause any racing
+ * css_tryget() to spin until we set the
+ * CSS_REMOVED bits or abort
+ */
+ } while (atomic_cmpxchg(&css->refcnt, refcnt, 0) != refcnt);
+ }
+ done:
+ for_each_subsys(cgrp->root, ss) {
+ struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
+ if (failed) {
+ /*
+ * Restore old refcnt if we previously managed
+ * to clear it from 1 to 0
+ */
+ if (!atomic_read(&css->refcnt))
+ atomic_set(&css->refcnt, 1);
+ } else {
+ /* Commit the fact that the CSS is removed */
+ set_bit(CSS_REMOVED, &css->flags);
+ }
+ }
+ local_irq_restore(flags);
+ return !failed;
+}
+
static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
{
struct cgroup *cgrp = dentry->d_fsdata;
@@ -2510,7 +2560,7 @@ static int cgroup_rmdir(struct inode *un
if (atomic_read(&cgrp->count)
|| !list_empty(&cgrp->children)
- || cgroup_has_css_refs(cgrp)) {
+ || !cgroup_clear_css_refs(cgrp)) {
mutex_unlock(&cgroup_mutex);
return -EBUSY;
}
@@ -3065,7 +3115,8 @@ void __css_put(struct cgroup_subsys_stat
{
struct cgroup *cgrp = css->cgroup;
rcu_read_lock();
- if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cgrp)) {
+ if ((atomic_dec_return(&css->refcnt) == 1) &&
+ notify_on_release(cgrp)) {
set_bit(CGRP_RELEASABLE, &cgrp->flags);
check_for_release(cgrp);
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex
2008-12-16 11:30 ` [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex menage
@ 2008-12-17 5:16 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-17 5:16 UTC (permalink / raw)
To: menage; +Cc: akpm, lizf, linux-kernel, linux-mm
On Tue, 16 Dec 2008 03:30:56 -0800
menage@google.com wrote:
> This patch adds a hierarchy_mutex to the cgroup_subsys object that
> protects changes to the hierarchy observed by that subsystem. It is
> taken by the cgroup subsystem (in addition to cgroup_mutex) for the
> following operations:
>
> - linking a cgroup into that subsystem's cgroup tree
> - unlinking a cgroup from that subsystem's cgroup tree
> - moving the subsystem to/from a hierarchy (including across the
> bind() callback)
>
> Thus if the subsystem holds its own hierarchy_mutex, it can safely
> traverse its own hierarchy.
>
> Signed-off-by: Paul Menage <menage@google.com>
>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>
> Documentation/cgroups/cgroups.txt | 2 +-
> include/linux/cgroup.h | 17 ++++++++++++++++-
> kernel/cgroup.c | 37 +++++++++++++++++++++++++++++++++++--
> 3 files changed, 52 insertions(+), 4 deletions(-)
>
> Index: hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/include/linux/cgroup.h
> +++ hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
> @@ -337,8 +337,23 @@ struct cgroup_subsys {
> #define MAX_CGROUP_TYPE_NAMELEN 32
> const char *name;
>
> + /*
> + * Protects sibling/children links of cgroups in this
> + * hierarchy, plus protects which hierarchy (or none) the
> + * subsystem is a part of (i.e. root/sibling). To avoid
> + * potential deadlocks, the following operations should not be
> + * undertaken while holding any hierarchy_mutex:
> + *
> + * - allocating memory
> + * - initiating hotplug events
> + */
> + struct mutex hierarchy_mutex;
> +
> + /*
> + * Link to parent, and list entry in parent's children.
> + * Protected by this->hierarchy_mutex and cgroup_lock()
> + */
> struct cgroupfs_root *root;
> -
> struct list_head sibling;
> };
>
> Index: hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/kernel/cgroup.c
> +++ hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
> @@ -714,23 +714,26 @@ static int rebind_subsystems(struct cgro
> BUG_ON(cgrp->subsys[i]);
> BUG_ON(!dummytop->subsys[i]);
> BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
> + mutex_lock(&ss->hierarchy_mutex);
> cgrp->subsys[i] = dummytop->subsys[i];
> cgrp->subsys[i]->cgroup = cgrp;
> list_move(&ss->sibling, &root->subsys_list);
> ss->root = root;
> if (ss->bind)
> ss->bind(ss, cgrp);
> -
> + mutex_unlock(&ss->hierarchy_mutex);
> } else if (bit & removed_bits) {
> /* We're removing this subsystem */
> BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
> BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
> + mutex_lock(&ss->hierarchy_mutex);
> if (ss->bind)
> ss->bind(ss, dummytop);
> dummytop->subsys[i]->cgroup = dummytop;
> cgrp->subsys[i] = NULL;
> subsys[i]->root = &rootnode;
> list_move(&ss->sibling, &rootnode.subsys_list);
> + mutex_unlock(&ss->hierarchy_mutex);
> } else if (bit & final_bits) {
> /* Subsystem state should already exist */
> BUG_ON(!cgrp->subsys[i]);
> @@ -2326,6 +2329,29 @@ static void init_cgroup_css(struct cgrou
> cgrp->subsys[ss->subsys_id] = css;
> }
>
> +static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
> +{
> + /* We need to take each hierarchy_mutex in a consistent order */
> + int i;
> +
> + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> + struct cgroup_subsys *ss = subsys[i];
> + if (ss->root == root)
> + mutex_lock_nested(&ss->hierarchy_mutex, i);
> + }
> +}
> +
> +static void cgroup_unlock_hierarchy(struct cgroupfs_root *root)
> +{
> + int i;
> +
> + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> + struct cgroup_subsys *ss = subsys[i];
> + if (ss->root == root)
> + mutex_unlock(&ss->hierarchy_mutex);
> + }
> +}
> +
> /*
> * cgroup_create - create a cgroup
> * @parent: cgroup that will be parent of the new cgroup
> @@ -2374,7 +2400,9 @@ static long cgroup_create(struct cgroup
> init_cgroup_css(css, ss, cgrp);
> }
>
> + cgroup_lock_hierarchy(root);
> list_add(&cgrp->sibling, &cgrp->parent->children);
> + cgroup_unlock_hierarchy(root);
> root->number_of_cgroups++;
>
> err = cgroup_create_dir(cgrp, dentry, mode);
> @@ -2492,8 +2520,12 @@ static int cgroup_rmdir(struct inode *un
> if (!list_empty(&cgrp->release_list))
> list_del(&cgrp->release_list);
> spin_unlock(&release_list_lock);
> - /* delete my sibling from parent->children */
> +
> + cgroup_lock_hierarchy(cgrp->root);
> + /* delete this cgroup from parent->children */
> list_del(&cgrp->sibling);
> + cgroup_unlock_hierarchy(cgrp->root);
> +
> spin_lock(&cgrp->dentry->d_lock);
> d = dget(cgrp->dentry);
> spin_unlock(&d->d_lock);
> @@ -2535,6 +2567,7 @@ static void __init cgroup_init_subsys(st
> * need to invoke fork callbacks here. */
> BUG_ON(!list_empty(&init_task.tasks));
>
> + mutex_init(&ss->hierarchy_mutex);
> ss->active = 1;
> }
>
> Index: hierarchy_lock-mmotm-2008-12-09/Documentation/cgroups/cgroups.txt
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/Documentation/cgroups/cgroups.txt
> +++ hierarchy_lock-mmotm-2008-12-09/Documentation/cgroups/cgroups.txt
> @@ -528,7 +528,7 @@ example in cpusets, no task may attach b
> up.
>
> void bind(struct cgroup_subsys *ss, struct cgroup *root)
> -(cgroup_mutex held by caller)
> +(cgroup_mutex and ss->hierarchy_mutex held by caller)
>
> Called when a cgroup subsystem is rebound to a different hierarchy
> and root cgroup. Currently this will only involve movement between
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller
2008-12-16 11:30 ` [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller menage
@ 2008-12-17 5:17 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-17 5:17 UTC (permalink / raw)
To: menage; +Cc: akpm, lizf, linux-kernel, linux-mm
On Tue, 16 Dec 2008 03:30:57 -0800
menage@google.com wrote:
> This patch updates the memory controller to use its hierarchy_mutex
> rather than calling cgroup_lock() to prevent
> cgroup_mkdir()/cgroup_rmdir() from occurring in its hierarchy.
>
> Signed-off-by: Paul Menage <menage@google.com>
>
I did not run any special tests, but this passed the usual tests.
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> Index: hierarchy_lock-mmotm-2008-12-09/mm/memcontrol.c
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/mm/memcontrol.c
> +++ hierarchy_lock-mmotm-2008-12-09/mm/memcontrol.c
> @@ -154,7 +154,7 @@ struct mem_cgroup {
>
> /*
> * While reclaiming in a hiearchy, we cache the last child we
> - * reclaimed from. Protected by cgroup_lock()
> + * reclaimed from. Protected by hierarchy_mutex
> */
> struct mem_cgroup *last_scanned_child;
> /*
> @@ -529,7 +529,7 @@ unsigned long mem_cgroup_isolate_pages(u
>
> /*
> * This routine finds the DFS walk successor. This routine should be
> - * called with cgroup_mutex held
> + * called with hierarchy_mutex held
> */
> static struct mem_cgroup *
> mem_cgroup_get_next_node(struct mem_cgroup *curr, struct mem_cgroup *root_mem)
> @@ -598,7 +598,7 @@ mem_cgroup_get_first_node(struct mem_cgr
> /*
> * Scan all children under the mem_cgroup mem
> */
> - cgroup_lock();
> + mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
> if (list_empty(&root_mem->css.cgroup->children)) {
> ret = root_mem;
> goto done;
> @@ -619,7 +619,7 @@ mem_cgroup_get_first_node(struct mem_cgr
>
> done:
> root_mem->last_scanned_child = ret;
> - cgroup_unlock();
> + mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
> return ret;
> }
>
> @@ -683,18 +683,16 @@ static int mem_cgroup_hierarchical_recla
> while (next_mem != root_mem) {
> if (next_mem->obsolete) {
> mem_cgroup_put(next_mem);
> - cgroup_lock();
> next_mem = mem_cgroup_get_first_node(root_mem);
> - cgroup_unlock();
> continue;
> }
> ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
> get_swappiness(next_mem));
> if (mem_cgroup_check_under_limit(root_mem))
> return 0;
> - cgroup_lock();
> + mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
> next_mem = mem_cgroup_get_next_node(next_mem, root_mem);
> - cgroup_unlock();
> + mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
> }
> return ret;
> }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] CGroups: Add css_tryget()
2008-12-16 11:30 ` [PATCH 3/3] CGroups: Add css_tryget() menage
@ 2008-12-17 5:19 ` KAMEZAWA Hiroyuki
2008-12-17 22:07 ` Andrew Morton
1 sibling, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-17 5:19 UTC (permalink / raw)
To: menage; +Cc: akpm, lizf, linux-kernel, linux-mm
On Tue, 16 Dec 2008 03:30:58 -0800
menage@google.com wrote:
> This patch adds css_tryget(), that obtains a counted reference on a
> CSS. It is used in situations where the caller has a "weak" reference
> to the CSS, i.e. one that does not protect the cgroup from removal via
> a reference count, but would instead be cleaned up by a destroy()
> callback.
>
> css_tryget() will return true on success, or false if the cgroup is
> being removed.
>
> This is similar to Kamezawa Hiroyuki's patch from a week or two ago,
> but with the difference that in the event of css_tryget() racing with
> a cgroup_rmdir(), css_tryget() will only return false if the cgroup
> really does get removed.
>
> This implementation is done by biasing css->refcnt, so that a refcnt
> of 1 means "releasable" and 0 means "released or releasing". In the
> event of a race, css_tryget() distinguishes between "released" and
> "releasing" by checking for the CSS_REMOVED flag in css->flags.
>
> Signed-off-by: Paul Menage <menage@google.com>
>
mkdir/rmdir work well. I'll write a user of this patch's css_tryget()
in memcg.
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/cgroup.h | 38 +++++++++++++++++++++++++-----
> kernel/cgroup.c | 61 ++++++++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 88 insertions(+), 11 deletions(-)
>
> Index: hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/include/linux/cgroup.h
> +++ hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
> @@ -52,9 +52,9 @@ struct cgroup_subsys_state {
> * hierarchy structure */
> struct cgroup *cgroup;
>
> - /* State maintained by the cgroup system to allow
> - * subsystems to be "busy". Should be accessed via css_get()
> - * and css_put() */
> + /* State maintained by the cgroup system to allow subsystems
> + * to be "busy". Should be accessed via css_get(),
> + * css_tryget() and and css_put(). */
>
> atomic_t refcnt;
>
> @@ -64,11 +64,14 @@ struct cgroup_subsys_state {
> /* bits in struct cgroup_subsys_state flags field */
> enum {
> CSS_ROOT, /* This CSS is the root of the subsystem */
> + CSS_REMOVED, /* This CSS is dead */
> };
>
> /*
> - * Call css_get() to hold a reference on the cgroup;
> - *
> + * Call css_get() to hold a reference on the css; it can be used
> + * for a reference obtained via:
> + * - an existing ref-counted reference to the css
> + * - task->cgroups for a locked task
> */
>
> static inline void css_get(struct cgroup_subsys_state *css)
> @@ -77,9 +80,32 @@ static inline void css_get(struct cgroup
> if (!test_bit(CSS_ROOT, &css->flags))
> atomic_inc(&css->refcnt);
> }
> +
> +static inline bool css_is_removed(struct cgroup_subsys_state *css)
> +{
> + return test_bit(CSS_REMOVED, &css->flags);
> +}
> +
> +/*
> + * Call css_tryget() to take a reference on a css if your existing
> + * (known-valid) reference isn't already ref-counted. Returns false if
> + * the css has been destroyed.
> + */
> +
> +static inline bool css_tryget(struct cgroup_subsys_state *css)
> +{
> + if (test_bit(CSS_ROOT, &css->flags))
> + return true;
> + while (!atomic_inc_not_zero(&css->refcnt)) {
> + if (test_bit(CSS_REMOVED, &css->flags))
> + return false;
> + }
> + return true;
> +}
> +
> /*
> * css_put() should be called to release a reference taken by
> - * css_get()
> + * css_get() or css_tryget()
> */
>
> extern void __css_put(struct cgroup_subsys_state *css);
> Index: hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
> ===================================================================
> --- hierarchy_lock-mmotm-2008-12-09.orig/kernel/cgroup.c
> +++ hierarchy_lock-mmotm-2008-12-09/kernel/cgroup.c
> @@ -2321,7 +2321,7 @@ static void init_cgroup_css(struct cgrou
> struct cgroup *cgrp)
> {
> css->cgroup = cgrp;
> - atomic_set(&css->refcnt, 0);
> + atomic_set(&css->refcnt, 1);
> css->flags = 0;
> if (cgrp == dummytop)
> set_bit(CSS_ROOT, &css->flags);
> @@ -2453,7 +2453,7 @@ static int cgroup_has_css_refs(struct cg
> {
> /* Check the reference count on each subsystem. Since we
> * already established that there are no tasks in the
> - * cgroup, if the css refcount is also 0, then there should
> + * cgroup, if the css refcount is also 1, then there should
> * be no outstanding references, so the subsystem is safe to
> * destroy. We scan across all subsystems rather than using
> * the per-hierarchy linked list of mounted subsystems since
> @@ -2474,12 +2474,62 @@ static int cgroup_has_css_refs(struct cg
> * matter, since it can only happen if the cgroup
> * has been deleted and hence no longer needs the
> * release agent to be called anyway. */
> - if (css && atomic_read(&css->refcnt))
> + if (css && (atomic_read(&css->refcnt) > 1))
> return 1;
> }
> return 0;
> }
>
> +/*
> + * Atomically mark all (or else none) of the cgroup's CSS objects as
> + * CSS_REMOVED. Return true on success, or false if the cgroup has
> + * busy subsystems. Call with cgroup_mutex held
> + */
> +
> +static int cgroup_clear_css_refs(struct cgroup *cgrp)
> +{
> + struct cgroup_subsys *ss;
> + unsigned long flags;
> + bool failed = false;
> + local_irq_save(flags);
> + for_each_subsys(cgrp->root, ss) {
> + struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
> + int refcnt;
> + do {
> + /* We can only remove a CSS with a refcnt==1 */
> + refcnt = atomic_read(&css->refcnt);
> + if (refcnt > 1) {
> + failed = true;
> + goto done;
> + }
> + BUG_ON(!refcnt);
> + /*
> + * Drop the refcnt to 0 while we check other
> + * subsystems. This will cause any racing
> + * css_tryget() to spin until we set the
> + * CSS_REMOVED bits or abort
> + */
> + } while (atomic_cmpxchg(&css->refcnt, refcnt, 0) != refcnt);
> + }
> + done:
> + for_each_subsys(cgrp->root, ss) {
> + struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
> + if (failed) {
> + /*
> + * Restore old refcnt if we previously managed
> + * to clear it from 1 to 0
> + */
> + if (!atomic_read(&css->refcnt))
> + atomic_set(&css->refcnt, 1);
> + } else {
> + /* Commit the fact that the CSS is removed */
> + set_bit(CSS_REMOVED, &css->flags);
> + }
> + }
> + local_irq_restore(flags);
> + return !failed;
> +}
> +
> static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> {
> struct cgroup *cgrp = dentry->d_fsdata;
> @@ -2510,7 +2560,7 @@ static int cgroup_rmdir(struct inode *un
>
> if (atomic_read(&cgrp->count)
> || !list_empty(&cgrp->children)
> - || cgroup_has_css_refs(cgrp)) {
> + || !cgroup_clear_css_refs(cgrp)) {
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
> @@ -3065,7 +3115,8 @@ void __css_put(struct cgroup_subsys_stat
> {
> struct cgroup *cgrp = css->cgroup;
> rcu_read_lock();
> - if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cgrp)) {
> + if ((atomic_dec_return(&css->refcnt) == 1) &&
> + notify_on_release(cgrp)) {
> set_bit(CGRP_RELEASABLE, &cgrp->flags);
> check_for_release(cgrp);
> }
>
* Re: [PATCH 0/3] CGroups: Hierarchy locking/refcount changes
2008-12-16 11:30 [PATCH 0/3] CGroups: Hierarchy locking/refcount changes menage
` (2 preceding siblings ...)
2008-12-16 11:30 ` [PATCH 3/3] CGroups: Add css_tryget() menage
@ 2008-12-17 22:03 ` Andrew Morton
2008-12-19 2:06 ` Paul Menage
3 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2008-12-17 22:03 UTC (permalink / raw)
To: menage; +Cc: kamezawa.hiroyu, lizf, linux-kernel, linux-mm
On Tue, 16 Dec 2008 03:30:55 -0800
menage@google.com wrote:
> These patches introduce new locking/refcount support for cgroups to
> reduce the need for subsystems to call cgroup_lock(). This will
> ultimately allow the atomicity of cgroup_rmdir() (which was removed
> recently) to be restored.
OK, they merged OK. We're accumulating rather a lot of cgroups work.
I have a question mark over these:
cgroups-make-root_list-contains-active-hierarchies-only.patch
cgroups-add-inactive-subsystems-to-rootnodesubsys_list.patch
cgroups-add-inactive-subsystems-to-rootnodesubsys_list-fix.patch
cgroups-introduce-link_css_set-to-remove-duplicate-code.patch
cgroups-introduce-link_css_set-to-remove-duplicate-code-fix.patch
it wasn't clear to me whether you still had issues with them, or
whether updates were expected?
* Re: [PATCH 3/3] CGroups: Add css_tryget()
2008-12-16 11:30 ` [PATCH 3/3] CGroups: Add css_tryget() menage
2008-12-17 5:19 ` KAMEZAWA Hiroyuki
@ 2008-12-17 22:07 ` Andrew Morton
2008-12-17 22:48 ` Paul Menage
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2008-12-17 22:07 UTC (permalink / raw)
To: menage; +Cc: kamezawa.hiroyu, lizf, linux-kernel, linux-mm
On Tue, 16 Dec 2008 03:30:58 -0800
menage@google.com wrote:
> This patch adds css_tryget(), that obtains a counted reference on a
> CSS. It is used in situations where the caller has a "weak" reference
> to the CSS, i.e. one that does not protect the cgroup from removal via
> a reference count, but would instead be cleaned up by a destroy()
> callback.
>
> css_tryget() will return true on success, or false if the cgroup is
> being removed.
>
> This is similar to Kamezawa Hiroyuki's patch from a week or two ago,
> but with the difference that in the event of css_tryget() racing with
> a cgroup_rmdir(), css_tryget() will only return false if the cgroup
> really does get removed.
>
> This implementation is done by biasing css->refcnt, so that a refcnt
> of 1 means "releasable" and 0 means "released or releasing". In the
> event of a race, css_tryget() distinguishes between "released" and
> "releasing" by checking for the CSS_REMOVED flag in css->flags.
>
> ...
>
> --- hierarchy_lock-mmotm-2008-12-09.orig/include/linux/cgroup.h
> +++ hierarchy_lock-mmotm-2008-12-09/include/linux/cgroup.h
> @@ -52,9 +52,9 @@ struct cgroup_subsys_state {
> * hierarchy structure */
> struct cgroup *cgroup;
>
> - /* State maintained by the cgroup system to allow
> - * subsystems to be "busy". Should be accessed via css_get()
> - * and css_put() */
> + /* State maintained by the cgroup system to allow subsystems
> + * to be "busy". Should be accessed via css_get(),
> + * css_tryget() and and css_put(). */
nanonit. This layout:
/*
* State maintained by the cgroup system to allow subsystems
* to be "busy". Should be accessed via css_get(),
* css_tryget() and and css_put().
*/
is conventional/preferred.
> atomic_t refcnt;
>
> @@ -64,11 +64,14 @@ struct cgroup_subsys_state {
> /* bits in struct cgroup_subsys_state flags field */
> enum {
> CSS_ROOT, /* This CSS is the root of the subsystem */
> + CSS_REMOVED, /* This CSS is dead */
> };
>
> /*
> - * Call css_get() to hold a reference on the cgroup;
> - *
> + * Call css_get() to hold a reference on the css; it can be used
> + * for a reference obtained via:
> + * - an existing ref-counted reference to the css
> + * - task->cgroups for a locked task
> */
>
> static inline void css_get(struct cgroup_subsys_state *css)
> @@ -77,9 +80,32 @@ static inline void css_get(struct cgroup
> if (!test_bit(CSS_ROOT, &css->flags))
> atomic_inc(&css->refcnt);
> }
> +
> +static inline bool css_is_removed(struct cgroup_subsys_state *css)
> +{
> + return test_bit(CSS_REMOVED, &css->flags);
> +}
> +
> +/*
> + * Call css_tryget() to take a reference on a css if your existing
> + * (known-valid) reference isn't already ref-counted. Returns false if
> + * the css has been destroyed.
> + */
> +
> +static inline bool css_tryget(struct cgroup_subsys_state *css)
> +{
> + if (test_bit(CSS_ROOT, &css->flags))
> + return true;
> + while (!atomic_inc_not_zero(&css->refcnt)) {
> + if (test_bit(CSS_REMOVED, &css->flags))
> + return false;
> + }
> + return true;
> +}
This looks too large to inline.
We should have a cpu_relax() in the loop?
And possibly a cond_resched().
It would be better if these polling loops didn't exist at all, of
course. But I guess if you could work out a way of doing that, this
patch wouldn't exist.
>
> ...
>
> +/*
> + * Atomically mark all (or else none) of the cgroup's CSS objects as
> + * CSS_REMOVED. Return true on success, or false if the cgroup has
> + * busy subsystems. Call with cgroup_mutex held
> + */
> +
> +static int cgroup_clear_css_refs(struct cgroup *cgrp)
> +{
> + struct cgroup_subsys *ss;
> + unsigned long flags;
> + bool failed = false;
> + local_irq_save(flags);
please put a blank line between end-of-locals and start-of-code.
> + for_each_subsys(cgrp->root, ss) {
> + struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
> + int refcnt;
> + do {
> + /* We can only remove a CSS with a refcnt==1 */
> + refcnt = atomic_read(&css->refcnt);
> + if (refcnt > 1) {
> + failed = true;
> + goto done;
> + }
> + BUG_ON(!refcnt);
> + /*
> + * Drop the refcnt to 0 while we check other
> + * subsystems. This will cause any racing
> + * css_tryget() to spin until we set the
> + * CSS_REMOVED bits or abort
> + */
> + } while (atomic_cmpxchg(&css->refcnt, refcnt, 0) != refcnt);
This loop also should have a cpu_relax(), I think?
> + }
> + done:
> + for_each_subsys(cgrp->root, ss) {
> + struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
> + if (failed) {
> + /*
> + * Restore old refcnt if we previously managed
> + * to clear it from 1 to 0
> + */
> + if (!atomic_read(&css->refcnt))
> + atomic_set(&css->refcnt, 1);
> + } else {
> + /* Commit the fact that the CSS is removed */
> + set_bit(CSS_REMOVED, &css->flags);
> + }
> + }
> + local_irq_restore(flags);
> + return !failed;
> +}
* Re: [PATCH 3/3] CGroups: Add css_tryget()
2008-12-17 22:07 ` Andrew Morton
@ 2008-12-17 22:48 ` Paul Menage
0 siblings, 0 replies; 11+ messages in thread
From: Paul Menage @ 2008-12-17 22:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, lizf, linux-kernel, linux-mm
On Wed, Dec 17, 2008 at 2:07 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> /*
> * State maintained by the cgroup system to allow subsystems
> * to be "busy". Should be accessed via css_get(),
> * css_tryget() and and css_put().
> */
>
> is conventional/preferred.
Oops, will fix.
>> static inline void css_get(struct cgroup_subsys_state *css)
>> @@ -77,9 +80,32 @@ static inline void css_get(struct cgroup
>> if (!test_bit(CSS_ROOT, &css->flags))
>> atomic_inc(&css->refcnt);
>> }
>> +
>> +static inline bool css_is_removed(struct cgroup_subsys_state *css)
>> +{
>> + return test_bit(CSS_REMOVED, &css->flags);
>> +}
>> +
>> +/*
>> + * Call css_tryget() to take a reference on a css if your existing
>> + * (known-valid) reference isn't already ref-counted. Returns false if
>> + * the css has been destroyed.
>> + */
>> +
>> +static inline bool css_tryget(struct cgroup_subsys_state *css)
>> +{
>> + if (test_bit(CSS_ROOT, &css->flags))
>> + return true;
>> + while (!atomic_inc_not_zero(&css->refcnt)) {
>> + if (test_bit(CSS_REMOVED, &css->flags))
>> + return false;
>> + }
>> + return true;
>> +}
>
> This looks too large to inline.
>
> We should have a cpu_relax() in the loop?
Sounds reasonable.
>
> And possibly a cond_resched().
No, we don't want to reschedule. These are pseudo spin locks rather
than pseudo mutexes. And the "hold time" is extremely short.
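The loop shape discussed above (spin with a cpu_relax(), no cond_resched())
can be sketched in userspace C roughly as follows. This is only an
illustration under stated assumptions: C11 <stdatomic.h> stands in for the
kernel's atomic_t, a separate boolean stands in for the CSS_REMOVED flag
bit, the CSS_ROOT shortcut is omitted, and struct css_sketch,
cpu_relax_sketch() and inc_not_zero() are hypothetical names that are not
part of the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for struct cgroup_subsys_state (illustrative only). */
struct css_sketch {
	atomic_int refcnt;   /* biased: 1 = idle, 0 = removal in progress or done */
	atomic_bool removed; /* stands in for the CSS_REMOVED flag bit */
};

/* Placeholder for the kernel's cpu_relax(); deliberately a no-op here. */
static inline void cpu_relax_sketch(void)
{
}

/* Increment *v unless it is zero, like the kernel's atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Spin while a removal is in flight; fail only once it has committed. */
static bool css_tryget_sketch(struct css_sketch *css)
{
	while (!inc_not_zero(&css->refcnt)) {
		if (atomic_load(&css->removed))
			return false;
		cpu_relax_sketch();
	}
	return true;
}
```

The caller never sleeps in this loop, which matches the "pseudo spin lock"
reasoning: the window in which refcnt is 0 but the removed flag is not yet
set is expected to be very short.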
>
> It would be better if these polling loops didn't exist at all, of
> course. But I guess if you could work out a way of doing that, this
> patch wouldn't exist.
It would certainly be possible to implement it as a spinlock and a
count, and do:
  css_get() {
    spin_lock(&css->lock);
    css->count++;
    spin_unlock(&css->lock);
  }

  css_tryget() {
    spin_lock(&css->lock);
    if (css->count > 1) {
      css->count++; result = true;
    } else {
      result = false;
    }
    spin_unlock(&css->lock);
  }

and implement the cgroups side of it as

  for each subsystem {
    spin_lock(&css->lock);
    if (css->count == 1) {
      css->count = 0;
    } else {
      success = false;
    }
  }
  for each subsystem {
    if (!success && css->count == 0) {
      css->count = 1;
    }
    spin_unlock(&css->lock);
  }
Functionally that would be identical - the only downside is that's an
extra atomic operation in the fast path of css_get() and css_tryget(),
which some people had objected to in the past when I proposed similar
patches.
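
A compilable userspace sketch of the lock-and-count variant above might
look like the following. It is a rough illustration, not kernel code: a
toy C11 atomic_flag spinlock stands in for spinlock_t, and struct
css_locked, css_lock(), css_unlock(), css_tryget_locked() and
clear_refs_locked() are hypothetical names. One further assumption of the
sketch: css_tryget succeeds whenever count is nonzero, following the
description of a count of 1 as "releasable"; the pseudocode above tests
count > 1 instead.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct css_locked {
	atomic_flag lock; /* toy spinlock standing in for spinlock_t */
	int count;        /* biased: 1 means no outside references */
};

static void css_lock(struct css_locked *css)
{
	while (atomic_flag_test_and_set_explicit(&css->lock,
						 memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void css_unlock(struct css_locked *css)
{
	atomic_flag_clear_explicit(&css->lock, memory_order_release);
}

/* Take a reference unless the css has been released (count == 0). */
static bool css_tryget_locked(struct css_locked *css)
{
	bool result;

	css_lock(css);
	result = css->count > 0; /* assumption: nonzero means still live */
	if (result)
		css->count++;
	css_unlock(css);
	return result;
}

/* Drop every count from 1 to 0, or leave all of them unchanged. */
static bool clear_refs_locked(struct css_locked **css, size_t n)
{
	bool success = true;
	size_t i;

	for (i = 0; i < n; i++) {
		css_lock(css[i]);
		if (css[i]->count == 1)
			css[i]->count = 0;
		else
			success = false;
	}
	for (i = 0; i < n; i++) {
		/* Roll back the entries cleared before the failure. */
		if (!success && css[i]->count == 0)
			css[i]->count = 1;
		css_unlock(css[i]);
	}
	return success;
}
```

Because every lock is held across both passes, a concurrent
css_tryget_locked() can only observe a count of 0 after the removal has
committed, so no separate "removed" flag is needed in this variant.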
Hmm. Thinking about it, this is very similar to the rwlock_t logic,
and I could probably implement css_get() and css_tryget() via
read_lock() and the clear_css_refs() side via write_trylock(). Which
would be pretty much the same as the original patch, except using
conventional primitives. Big downside would be that we would be
limited to RW_LOCK_BIAS refcounts, or about 16M, versus the 2B that we
get with regular atomics.
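
For scale, the two limits mentioned above can be written out as below.
This assumes the x86 definition of RW_LOCK_BIAS from that era (0x01000000)
and a 32-bit signed atomic_t; RW_LOCK_BIAS_SKETCH and both helper names
are hypothetical.

```c
#include <limits.h>

/* Assumed x86 value of RW_LOCK_BIAS at the time; hypothetical macro name. */
#define RW_LOCK_BIAS_SKETCH 0x01000000

/* Upper bound on references if the count lived in an rwlock_t. */
static long max_rwlock_refs(void)
{
	return RW_LOCK_BIAS_SKETCH;
}

/* Upper bound on references with a signed 32-bit atomic counter. */
static long max_atomic_refs(void)
{
	return INT_MAX;
}
```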
Paul
* Re: [PATCH 0/3] CGroups: Hierarchy locking/refcount changes
2008-12-17 22:03 ` [PATCH 0/3] CGroups: Hierarchy locking/refcount changes Andrew Morton
@ 2008-12-19 2:06 ` Paul Menage
0 siblings, 0 replies; 11+ messages in thread
From: Paul Menage @ 2008-12-19 2:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, lizf, linux-kernel, linux-mm
On Wed, Dec 17, 2008 at 2:03 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> cgroups-make-root_list-contains-active-hierarchies-only.patch
> cgroups-add-inactive-subsystems-to-rootnodesubsys_list.patch
> cgroups-add-inactive-subsystems-to-rootnodesubsys_list-fix.patch
> cgroups-introduce-link_css_set-to-remove-duplicate-code.patch
> cgroups-introduce-link_css_set-to-remove-duplicate-code-fix.patch
>
> it wasn't clear to me whether you still had issues with them, or
> whether updates were expected?
I think that with the fix patches they should be fine.
Paul
Thread overview: 11+ messages
2008-12-16 11:30 [PATCH 0/3] CGroups: Hierarchy locking/refcount changes menage
2008-12-16 11:30 ` [PATCH 1/3] CGroups: Add a per-subsystem hierarchy_mutex menage
2008-12-17 5:16 ` KAMEZAWA Hiroyuki
2008-12-16 11:30 ` [PATCH 2/3] CGroups: Use hierarchy_mutex in memory controller menage
2008-12-17 5:17 ` KAMEZAWA Hiroyuki
2008-12-16 11:30 ` [PATCH 3/3] CGroups: Add css_tryget() menage
2008-12-17 5:19 ` KAMEZAWA Hiroyuki
2008-12-17 22:07 ` Andrew Morton
2008-12-17 22:48 ` Paul Menage
2008-12-17 22:03 ` [PATCH 0/3] CGroups: Hierarchy locking/refcount changes Andrew Morton
2008-12-19 2:06 ` Paul Menage