* [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry.
@ 2010-09-24 9:13 KAMEZAWA Hiroyuki
2010-09-24 9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24 9:13 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, balbir, nishimura, akpm
This is a revised series for using IDs.
Restarting from RFC.
[1/2] implementation of special ID lookup
[2/2] use ID in mm/memcontrol.c
People may say to use css_lookup() and not add a special routine, but
I can't believe css_lookup() can give us enough speed at every page LRU handling
if the number of cgroups is big. I think this patch itself is simple enough...
but I admit this will make mem_cgroup more complex. Hmm.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC][PATCH 1/2] memcg: special ID lookup routine
2010-09-24 9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
@ 2010-09-24 9:15 ` KAMEZAWA Hiroyuki
2010-09-24 9:16 ` [RFC][PATCH 2/2] memcg: use ID instead of pointer KAMEZAWA Hiroyuki
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
2 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24 9:15 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
It seems the previous patches were not welcomed, so this is a revised one.
My purpose is to replace pc->mem_cgroup with pc->mem_cgroup_id and to avoid
using more memory when pc->blkio_cgroup_id is added.
As a first step, this patch implements a lookup table from ID.
For usual lookups, css_lookup() works well enough, but it may have to
access several levels of the idr radix-tree. The limit on the number of memory
cgroups is 65536 and, as far as I hear, there is a user who uses 2000+ memory
cgroups on a system.
And with a generic RCU-based lookup routine, the caller has to do one of the
following:
Type A:
	rcu_read_lock()
	obj = obj_lookup()
	atomic_inc(obj->refcnt)
	rcu_read_unlock()
	/* do jobs */
Type B:
	rcu_read_lock()
	obj = rcu_lookup()
	/* do jobs */
	rcu_read_unlock()
In many cases this happens under some spinlock.
(Type A is very bad in a busy routine, and even Type B has to check whether
the object is still alive. Neither is free.)
This is complicated.
Because page_cgroup -> mem_cgroup information is required at every LRU
operation, I think it's worth adding a special lookup routine to reduce
cache footprint and, with some limitations, the lookup routine can be RCU free.
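To make the idea concrete, here is a minimal userspace sketch of an ID-indexed lookup table. It is not the kernel code: struct group, table_init(), table_lookup(), table_set() and table_clear() are hypothetical names, calloc() stands in for the vmalloc'ed array, and the spinlock that guards the "set" path is left out. With 65536 slots, the table costs one pointer per possible ID, which should be about 512KB on a 64-bit system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_IDS 65536              /* assumed upper bound on IDs, as in the text */

struct group {                     /* hypothetical stand-in for struct mem_cgroup */
	int refcnt;
	int cached;
};

static struct group root_group;    /* ID 1 is treated as the root */
static struct group **table;       /* allocated lazily, zero-filled */

/* Bytes needed for the table: one pointer per possible ID. */
static size_t table_bytes(void)
{
	return sizeof(struct group *) * MAX_IDS;
}

/* Delayed allocation: nothing is allocated until the first caller needs it. */
static int table_init(void)
{
	if (table)
		return 0;
	table = calloc(MAX_IDS, sizeof(*table));
	return table ? 0 : -1;
}

/* One array index: no tree walk, no refcount, no RCU section. */
static struct group *table_lookup(unsigned short id)
{
	if (id == 0)
		return NULL;               /* ID 0 means "unused" */
	if (id == 1)
		return &root_group;
	return table[id];
}

/* Publish once; the table holds one reference until table_clear(). */
static void table_set(struct group *g, unsigned short id)
{
	assert(id > 1);
	if (g->cached)
		return;
	if (!table[id]) {              /* a real version needs a lock here */
		g->refcnt++;
		table[id] = g;
		g->cached = 1;
	}
}

/* Drop the table's reference, but only if the slot really holds "g". */
static void table_clear(struct group *g, unsigned short id)
{
	if (table[id] != g)
		return;
	table[id] = NULL;
	g->refcnt--;
	g->cached = 0;
}
```

The lookup side touches a single cache line of the array, which is the point of the proposal. The price is that the caller must guarantee the entry stays valid, which is the limitation mentioned above.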
Note:
- memcg_lookup() is defined but not used here. It's called in another patch.
Changelog:
- no hooks to cgroup.
- no limitation of the number of memcg.
- delay table allocation until memory cgroup is really used.
- No RCU routine. (depends on the limitation to callers newly added.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -198,6 +198,7 @@ static void mem_cgroup_oom_notify(struct
*/
struct mem_cgroup {
struct cgroup_subsys_state css;
+ bool cached;
/*
* the counter to account for memory usage
*/
@@ -352,6 +353,65 @@ static void mem_cgroup_put(struct mem_cg
static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
static void drain_all_stock_async(void);
+#define MEMCG_ARRAY_SIZE (sizeof(struct mem_cgroup *) *(65536))
+struct mem_cgroup **memcg_array __read_mostly;
+DEFINE_SPINLOCK(memcg_array_lock);
+
+/*
+ * A quick lookup routine for memory cgroup via ID. This can be used
+ * until destroy() is called against memory cgroup. Then, in most case,
+ * there must be page_cgroups or tasks which points to memcg.
+ * So, cannot be used for swap_cgroup reference.
+ */
+static struct mem_cgroup *memcg_lookup(int id)
+{
+ if (id == 0)
+ return NULL;
+ if (id == 1)
+ return root_mem_cgroup;
+ return *(memcg_array + id);
+}
+
+static void memcg_lookup_set(struct mem_cgroup *mem)
+{
+ int id;
+
+ if (likely(mem->cached) || mem == root_mem_cgroup)
+ return;
+ id = css_id(&mem->css);
+ /* There are race with other "set" entry. need to avoid double refcnt */
+ spin_lock(&memcg_array_lock);
+ if (!(*(memcg_array + id))) {
+ mem_cgroup_get(mem);
+ *(memcg_array + id) = mem;
+ mem->cached = true;
+ }
+ spin_unlock(&memcg_array_lock);
+}
+
+static void memcg_lookup_clear(struct mem_cgroup *mem)
+{
+ int id = css_id(&mem->css);
+ /* No race with other look up/set/unset entry */
+ *(memcg_array + id) = NULL;
+ mem_cgroup_put(mem);
+}
+
+static int init_mem_cgroup_lookup_array(void)
+{
+ int size;
+
+ if (memcg_array)
+ return 0;
+
+ size = MEMCG_ARRAY_SIZE;
+ memcg_array = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+ PAGE_KERNEL);
+ if (!memcg_array)
+ return -ENOMEM;
+
+ return 0;
+}
static struct mem_cgroup_per_zone *
mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -2096,6 +2156,7 @@ static void __mem_cgroup_commit_charge(s
mem_cgroup_cancel_charge(mem);
return;
}
+ memcg_lookup_set(mem);
pc->mem_cgroup = mem;
/*
@@ -4341,6 +4402,10 @@ mem_cgroup_create(struct cgroup_subsys *
}
hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
} else {
+ /* Allocation of lookup array is delayed until cgroup creation */
+ error = init_mem_cgroup_lookup_array();
+ if (error == -ENOMEM)
+ goto free_out;
parent = mem_cgroup_from_cont(cont->parent);
mem->use_hierarchy = parent->use_hierarchy;
mem->oom_kill_disable = parent->oom_kill_disable;
@@ -4389,6 +4454,8 @@ static void mem_cgroup_destroy(struct cg
{
struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
+ memcg_lookup_clear(mem);
+
mem_cgroup_put(mem);
}
* [RFC][PATCH 2/2] memcg: use ID instead of pointer
2010-09-24 9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
2010-09-24 9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
@ 2010-09-24 9:16 ` KAMEZAWA Hiroyuki
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
2 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24 9:16 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Replace page_cgroup->mem_cgroup with an unsigned short ID,
and add an ID for blkio cgroup.
More work will be required to reduce the size of struct page_cgroup,
but this may be good as a first step.
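A small userspace sketch of why this does not shrink the struct yet. The two layouts below mirror the before/after fields of struct page_cgroup, with void * and a hypothetical struct list_head_sketch standing in for the kernel types. On a typical 64-bit Linux ABI I would expect both to be 40 bytes: the two 2-byte IDs are followed by padding so that the next pointer stays aligned. So the blkio ID comes for free, but the total size is unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct list_head (two pointers). */
struct list_head_sketch { void *next, *prev; };

struct page_cgroup_ptr {           /* layout before the patch */
	unsigned long flags;
	void *mem_cgroup;              /* pointer to the owning mem_cgroup */
	void *page;
	struct list_head_sketch lru;
};

struct page_cgroup_id {            /* layout after the patch */
	unsigned long flags;
	unsigned short mem_cgroup;     /* css ID instead of a pointer */
	unsigned short blkio_cgroup;   /* second ID fits in the same word */
	void *page;
	struct list_head_sketch lru;
};

static size_t size_before(void)
{
	return sizeof(struct page_cgroup_ptr);
}

static size_t size_after(void)
{
	return sizeof(struct page_cgroup_id);
}

/* Bytes between the end of blkio_cgroup and the start of page. */
static size_t padding_after_ids(void)
{
	return offsetof(struct page_cgroup_id, page)
	     - offsetof(struct page_cgroup_id, blkio_cgroup)
	     - sizeof(unsigned short);
}
```

Whatever padding_after_ids() reports is space that later work could reuse, for example by packing flags and IDs together.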
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page_cgroup.h | 3 ++-
mm/memcontrol.c | 34 ++++++++++++++++++++--------------
mm/page_cgroup.c | 2 +-
3 files changed, 23 insertions(+), 16 deletions(-)
Index: mmotm-0922/include/linux/page_cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/page_cgroup.h
+++ mmotm-0922/include/linux/page_cgroup.h
@@ -12,7 +12,8 @@
*/
struct page_cgroup {
unsigned long flags;
- struct mem_cgroup *mem_cgroup;
+ unsigned short mem_cgroup;
+ unsigned short blkio_cgroup;
struct page *page;
struct list_head lru; /* per cgroup LRU list */
};
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -427,7 +427,7 @@ struct cgroup_subsys_state *mem_cgroup_c
static struct mem_cgroup_per_zone *
page_cgroup_zoneinfo(struct page_cgroup *pc)
{
- struct mem_cgroup *mem = pc->mem_cgroup;
+ struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
int nid = page_cgroup_nid(pc);
int zid = page_cgroup_zid(pc);
@@ -838,6 +838,11 @@ static inline bool mem_cgroup_is_root(st
return (mem == root_mem_cgroup);
}
+static inline bool mem_cgroup_id_is_root(unsigned short id)
+{
+ return (id == 1);
+}
+
/*
* Following LRU functions are allowed to be used without PCG_LOCK.
* Operations are called by routine of global LRU independently from memcg.
@@ -870,7 +875,7 @@ void mem_cgroup_del_lru_list(struct page
*/
mz = page_cgroup_zoneinfo(pc);
MEM_CGROUP_ZSTAT(mz, lru) -= 1;
- if (mem_cgroup_is_root(pc->mem_cgroup))
+ if (mem_cgroup_id_is_root(pc->mem_cgroup))
return;
VM_BUG_ON(list_empty(&pc->lru));
list_del_init(&pc->lru);
@@ -897,7 +902,7 @@ void mem_cgroup_rotate_lru_list(struct p
*/
smp_rmb();
/* unused or root page is not rotated. */
- if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+ if (!PageCgroupUsed(pc) || mem_cgroup_id_is_root(pc->mem_cgroup))
return;
mz = page_cgroup_zoneinfo(pc);
list_move(&pc->lru, &mz->lists[lru]);
@@ -923,7 +928,7 @@ void mem_cgroup_add_lru_list(struct page
mz = page_cgroup_zoneinfo(pc);
MEM_CGROUP_ZSTAT(mz, lru) += 1;
SetPageCgroupAcctLRU(pc);
- if (mem_cgroup_is_root(pc->mem_cgroup))
+ if (mem_cgroup_id_is_root(pc->mem_cgroup))
return;
list_add(&pc->lru, &mz->lists[lru]);
}
@@ -1663,7 +1668,7 @@ static void mem_cgroup_update_file_stat(
return;
rcu_read_lock();
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (unlikely(!mem || !PageCgroupUsed(pc)))
goto out;
/* pc->mem_cgroup is unstable ? */
@@ -1671,7 +1676,7 @@ static void mem_cgroup_update_file_stat(
/* take a lock against to access pc->mem_cgroup */
lock_page_cgroup(pc);
need_unlock = true;
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (!mem || !PageCgroupUsed(pc))
goto out;
}
@@ -2121,7 +2126,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (mem && !css_tryget(&mem->css))
mem = NULL;
} else if (PageSwapCache(page)) {
@@ -2158,7 +2163,7 @@ static void __mem_cgroup_commit_charge(s
}
memcg_lookup_set(mem);
- pc->mem_cgroup = mem;
+ pc->mem_cgroup = css_id(&mem->css);
/*
* We access a page_cgroup asynchronously without lock_page_cgroup().
* Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2216,7 +2221,7 @@ static void __mem_cgroup_move_account(st
VM_BUG_ON(PageLRU(pc->page));
VM_BUG_ON(!PageCgroupLocked(pc));
VM_BUG_ON(!PageCgroupUsed(pc));
- VM_BUG_ON(pc->mem_cgroup != from);
+ VM_BUG_ON(pc->mem_cgroup != css_id(&from->css));
if (PageCgroupFileMapped(pc)) {
/* Update mapped_file data for mem_cgroup */
@@ -2231,7 +2236,7 @@ static void __mem_cgroup_move_account(st
mem_cgroup_cancel_charge(from);
/* caller should have done css_get */
- pc->mem_cgroup = to;
+ pc->mem_cgroup = css_id(&to->css);
mem_cgroup_charge_statistics(to, pc, true);
/*
* We charges against "to" which may not have any tasks. Then, "to"
@@ -2251,7 +2256,7 @@ static int mem_cgroup_move_account(struc
{
int ret = -EINVAL;
lock_page_cgroup(pc);
- if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+ if (PageCgroupUsed(pc) && pc->mem_cgroup == css_id(&from->css)) {
__mem_cgroup_move_account(pc, from, to, uncharge);
ret = 0;
}
@@ -2590,7 +2595,7 @@ __mem_cgroup_uncharge_common(struct page
lock_page_cgroup(pc);
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (!PageCgroupUsed(pc))
goto unlock_out;
@@ -2835,7 +2840,7 @@ int mem_cgroup_prepare_migration(struct
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
css_get(&mem->css);
/*
* At migrating an anonymous page, its mapcount goes down
@@ -4652,7 +4657,8 @@ static int is_target_pte_for_mc(struct v
* mem_cgroup_move_account() checks the pc is valid or not under
* the lock.
*/
- if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+ if (PageCgroupUsed(pc) &&
+ pc->mem_cgroup == css_id(&mc.from->css)) {
ret = MC_TARGET_PAGE;
if (target)
target->page = page;
Index: mmotm-0922/mm/page_cgroup.c
===================================================================
--- mmotm-0922.orig/mm/page_cgroup.c
+++ mmotm-0922/mm/page_cgroup.c
@@ -15,7 +15,7 @@ static void __meminit
__init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
{
pc->flags = 0;
- pc->mem_cgroup = NULL;
+ pc->mem_cgroup = 0;
pc->page = pfn_to_page(pfn);
INIT_LIST_HEAD(&pc->lru);
}
* [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2.
2010-09-24 9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
2010-09-24 9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
2010-09-24 9:16 ` [RFC][PATCH 2/2] memcg: use ID instead of pointer KAMEZAWA Hiroyuki
@ 2010-09-27 9:48 ` KAMEZAWA Hiroyuki
2010-09-27 9:51 ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
` (4 more replies)
2 siblings, 5 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27 9:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
On Fri, 24 Sep 2010 18:13:02 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> This is a reviced series of use ID.
> Restart from RFC.
>
Then I changed my mind. This is a new set, with no new special lookups.
But you may feel something is strange. I don't want to merge these patches
all at once. Just think of this set as a dump of my stack. Any comments are welcome.
Thanks,
-Kame
* [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
@ 2010-09-27 9:51 ` KAMEZAWA Hiroyuki
2010-09-27 9:52 ` [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time KAMEZAWA Hiroyuki
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27 9:51 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Replace page_cgroup->mem_cgroup with an unsigned short ID,
and add an ID for blkio cgroup.
More work will be required to reduce the size of struct page_cgroup,
but this may be good as a first step. As far as I tested, css_lookup() is fast enough.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page_cgroup.h | 3 +
mm/memcontrol.c | 86 +++++++++++++++++++++++++++-----------------
mm/page_cgroup.c | 2 -
3 files changed, 57 insertions(+), 34 deletions(-)
Index: mmotm-0922/include/linux/page_cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/page_cgroup.h
+++ mmotm-0922/include/linux/page_cgroup.h
@@ -12,7 +12,8 @@
*/
struct page_cgroup {
unsigned long flags;
- struct mem_cgroup *mem_cgroup;
+ unsigned short mem_cgroup;
+ unsigned short blkio_cgroup;
struct page *page;
struct list_head lru; /* per cgroup LRU list */
};
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -352,6 +352,40 @@ static void mem_cgroup_put(struct mem_cg
static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
static void drain_all_stock_async(void);
+/*
+ * A helper function to get mem_cgroup from ID. must be called under
+ * rcu_read_lock(). The caller must check css_is_removed() or some if
+ * it's concern. (dropping refcnt from swap can be called against removed
+ * memcg.)
+ */
+static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
+{
+ struct cgroup_subsys_state *css;
+
+ /* ID 0 is unused ID */
+ if (!id)
+ return NULL;
+ if (id == 1)
+ return root_mem_cgroup;
+ css = css_lookup(&mem_cgroup_subsys, id);
+ if (!css)
+ return NULL;
+ return container_of(css, struct mem_cgroup, css);
+}
+
+/*
+ * If the ID is from pc->mem_cgroup, the mem_cgroup referred to by the ID must
+ * exist as a "valid" cgroup. It's guaranteed. You can use this easy function.
+ */
+static struct mem_cgroup *memcg_lookup(unsigned short id)
+{
+ struct mem_cgroup *mem;
+
+ rcu_read_lock();
+ mem = mem_cgroup_lookup(id);
+ rcu_read_unlock();
+ return mem;
+}
static struct mem_cgroup_per_zone *
mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -367,7 +401,7 @@ struct cgroup_subsys_state *mem_cgroup_c
static struct mem_cgroup_per_zone *
page_cgroup_zoneinfo(struct page_cgroup *pc)
{
- struct mem_cgroup *mem = pc->mem_cgroup;
+ struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
int nid = page_cgroup_nid(pc);
int zid = page_cgroup_zid(pc);
@@ -778,6 +812,11 @@ static inline bool mem_cgroup_is_root(st
return (mem == root_mem_cgroup);
}
+static inline bool mem_cgroup_id_is_root(unsigned short id)
+{
+ return (id == 1);
+}
+
/*
* Following LRU functions are allowed to be used without PCG_LOCK.
* Operations are called by routine of global LRU independently from memcg.
@@ -810,7 +849,7 @@ void mem_cgroup_del_lru_list(struct page
*/
mz = page_cgroup_zoneinfo(pc);
MEM_CGROUP_ZSTAT(mz, lru) -= 1;
- if (mem_cgroup_is_root(pc->mem_cgroup))
+ if (mem_cgroup_id_is_root(pc->mem_cgroup))
return;
VM_BUG_ON(list_empty(&pc->lru));
list_del_init(&pc->lru);
@@ -837,7 +876,7 @@ void mem_cgroup_rotate_lru_list(struct p
*/
smp_rmb();
/* unused or root page is not rotated. */
- if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+ if (!PageCgroupUsed(pc) || mem_cgroup_id_is_root(pc->mem_cgroup))
return;
mz = page_cgroup_zoneinfo(pc);
list_move(&pc->lru, &mz->lists[lru]);
@@ -863,7 +902,7 @@ void mem_cgroup_add_lru_list(struct page
mz = page_cgroup_zoneinfo(pc);
MEM_CGROUP_ZSTAT(mz, lru) += 1;
SetPageCgroupAcctLRU(pc);
- if (mem_cgroup_is_root(pc->mem_cgroup))
+ if (mem_cgroup_id_is_root(pc->mem_cgroup))
return;
list_add(&pc->lru, &mz->lists[lru]);
}
@@ -1603,7 +1642,7 @@ static void mem_cgroup_update_file_stat(
return;
rcu_read_lock();
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (unlikely(!mem || !PageCgroupUsed(pc)))
goto out;
/* pc->mem_cgroup is unstable ? */
@@ -1611,7 +1650,7 @@ static void mem_cgroup_update_file_stat(
/* take a lock against to access pc->mem_cgroup */
lock_page_cgroup(pc);
need_unlock = true;
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (!mem || !PageCgroupUsed(pc))
goto out;
}
@@ -2030,24 +2069,6 @@ static void mem_cgroup_cancel_charge(str
__mem_cgroup_cancel_charge(mem, 1);
}
-/*
- * A helper function to get mem_cgroup from ID. must be called under
- * rcu_read_lock(). The caller must check css_is_removed() or some if
- * it's concern. (dropping refcnt from swap can be called against removed
- * memcg.)
- */
-static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
-{
- struct cgroup_subsys_state *css;
-
- /* ID 0 is unused ID */
- if (!id)
- return NULL;
- css = css_lookup(&mem_cgroup_subsys, id);
- if (!css)
- return NULL;
- return container_of(css, struct mem_cgroup, css);
-}
struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
{
@@ -2061,7 +2082,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (mem && !css_tryget(&mem->css))
mem = NULL;
} else if (PageSwapCache(page)) {
@@ -2097,7 +2118,7 @@ static void __mem_cgroup_commit_charge(s
return;
}
- pc->mem_cgroup = mem;
+ pc->mem_cgroup = css_id(&mem->css);
/*
* We access a page_cgroup asynchronously without lock_page_cgroup().
* Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2155,7 +2176,7 @@ static void __mem_cgroup_move_account(st
VM_BUG_ON(PageLRU(pc->page));
VM_BUG_ON(!PageCgroupLocked(pc));
VM_BUG_ON(!PageCgroupUsed(pc));
- VM_BUG_ON(pc->mem_cgroup != from);
+ VM_BUG_ON(pc->mem_cgroup != css_id(&from->css));
if (PageCgroupFileMapped(pc)) {
/* Update mapped_file data for mem_cgroup */
@@ -2170,7 +2191,7 @@ static void __mem_cgroup_move_account(st
mem_cgroup_cancel_charge(from);
/* caller should have done css_get */
- pc->mem_cgroup = to;
+ pc->mem_cgroup = css_id(&to->css);
mem_cgroup_charge_statistics(to, pc, true);
/*
* We charges against "to" which may not have any tasks. Then, "to"
@@ -2190,7 +2211,7 @@ static int mem_cgroup_move_account(struc
{
int ret = -EINVAL;
lock_page_cgroup(pc);
- if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+ if (PageCgroupUsed(pc) && pc->mem_cgroup == css_id(&from->css)) {
__mem_cgroup_move_account(pc, from, to, uncharge);
ret = 0;
}
@@ -2529,7 +2550,7 @@ __mem_cgroup_uncharge_common(struct page
lock_page_cgroup(pc);
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
if (!PageCgroupUsed(pc))
goto unlock_out;
@@ -2774,7 +2795,7 @@ int mem_cgroup_prepare_migration(struct
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
- mem = pc->mem_cgroup;
+ mem = memcg_lookup(pc->mem_cgroup);
css_get(&mem->css);
/*
* At migrating an anonymous page, its mapcount goes down
@@ -4585,7 +4606,8 @@ static int is_target_pte_for_mc(struct v
* mem_cgroup_move_account() checks the pc is valid or not under
* the lock.
*/
- if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+ if (PageCgroupUsed(pc) &&
+ pc->mem_cgroup == css_id(&mc.from->css)) {
ret = MC_TARGET_PAGE;
if (target)
target->page = page;
Index: mmotm-0922/mm/page_cgroup.c
===================================================================
--- mmotm-0922.orig/mm/page_cgroup.c
+++ mmotm-0922/mm/page_cgroup.c
@@ -15,7 +15,7 @@ static void __meminit
__init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
{
pc->flags = 0;
- pc->mem_cgroup = NULL;
+ pc->mem_cgroup = 0;
pc->page = pfn_to_page(pfn);
INIT_LIST_HEAD(&pc->lru);
}
* [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
2010-09-27 9:51 ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
@ 2010-09-27 9:52 ` KAMEZAWA Hiroyuki
2010-09-27 9:54 ` [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array KAMEZAWA Hiroyuki
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27 9:52 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Currently, the css ID is allocated after ->create() is called. But to make use
of the ID in ->create(), it must be available earlier.
From another point of view, since the ID is tightly coupled with the "css",
it should be allocated when the "css" is allocated.
This patch moves alloc_css_id() to the css allocation routine. Currently only
2 subsystems, memory and blkio, use IDs (to support complicated hierarchy walks).
The ID will be used in mem cgroup's ->create() later.
This patch also adds css ID documentation, which was not provided before.
Note:
If someone changes the rules of css allocation, ID allocation should be changed as well.
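As a rough userspace model of the ordering this patch is after, the sketch below allocates the ID inside a ->create()-style constructor and unwinds if the allocation fails. All names here (struct id_sketch, struct state_sketch, alloc_id_sketch(), create_sketch(), is_ancestor_sketch()) are hypothetical. The per-ID stack of ancestor IDs is my assumption about how the hierarchy is encoded, suggested by "newid->stack[0] = newid->id" and "depth = parent_id->depth + 1" in the diff; the counter-based ID allocator and the fixed depth limit are purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEPTH_SKETCH 8         /* arbitrary limit for this sketch */

struct id_sketch {                 /* hypothetical model of struct css_id */
	unsigned short id;
	unsigned short depth;
	unsigned short stack[MAX_DEPTH_SKETCH]; /* stack[d] = ancestor ID at depth d */
};

struct state_sketch {              /* hypothetical model of a subsystem state */
	struct id_sketch *id;
};

static unsigned short next_id = 1; /* toy allocator; ID 0 stays unused */

/* Allocate an ID for "css" under "parent" (NULL for the root). */
static int alloc_id_sketch(struct state_sketch *css,
			   const struct state_sketch *parent)
{
	struct id_sketch *nid;
	unsigned short depth = parent ? parent->id->depth + 1 : 0;

	if (depth >= MAX_DEPTH_SKETCH)
		return -1;
	nid = calloc(1, sizeof(*nid));
	if (!nid)
		return -1;
	nid->id = next_id++;
	nid->depth = depth;
	/* Inherit the parent's ancestor chain, then append our own ID. */
	for (unsigned short i = 0; i < depth; i++)
		nid->stack[i] = parent->id->stack[i];
	nid->stack[depth] = nid->id;
	css->id = nid;
	return 0;
}

/* A ->create()-style constructor: the ID exists before it returns. */
static struct state_sketch *create_sketch(const struct state_sketch *parent)
{
	struct state_sketch *css = calloc(1, sizeof(*css));

	if (!css)
		return NULL;
	if (alloc_id_sketch(css, parent)) {
		free(css);                 /* unwind on failure */
		return NULL;
	}
	/* From here on, css->id->id is usable by the rest of create. */
	return css;
}

/* "root" is an ancestor of "child" if it appears in child's stack. */
static int is_ancestor_sketch(const struct state_sketch *child,
			      const struct state_sketch *root)
{
	if (child->id->depth < root->id->depth)
		return 0;
	return child->id->stack[root->id->depth] == root->id->id;
}
```

With the ID assigned this early, the subsystem can record it in its own structures during create, while ID -> css lookup only becomes valid later, once the cgroup directory is populated.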
Changelog: 2010/09/01
- modified cgroups.txt
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 48 ++++++++++++++++++++++++++++++++++++
block/blk-cgroup.c | 9 ++++++
include/linux/cgroup.h | 16 ++++++------
kernel/cgroup.c | 50 +++++++++++---------------------------
mm/memcontrol.c | 5 +++
5 files changed, 86 insertions(+), 42 deletions(-)
Index: mmotm-0922/kernel/cgroup.c
===================================================================
--- mmotm-0922.orig/kernel/cgroup.c
+++ mmotm-0922/kernel/cgroup.c
@@ -288,9 +288,6 @@ struct cg_cgroup_link {
static struct css_set init_css_set;
static struct cg_cgroup_link init_css_set_link;
-static int cgroup_init_idr(struct cgroup_subsys *ss,
- struct cgroup_subsys_state *css);
-
/* css_set_lock protects the list of css_set objects, and the
* chain of tasks off each css_set. Nests outside task->alloc_lock
* due to cgroup_iter_start() */
@@ -769,9 +766,6 @@ static struct backing_dev_info cgroup_ba
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
-static int alloc_css_id(struct cgroup_subsys *ss,
- struct cgroup *parent, struct cgroup *child);
-
static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
{
struct inode *inode = new_inode(sb);
@@ -3254,7 +3248,8 @@ static void init_cgroup_css(struct cgrou
css->cgroup = cgrp;
atomic_set(&css->refcnt, 1);
css->flags = 0;
- css->id = NULL;
+ if (!ss->use_id)
+ css->id = NULL;
if (cgrp == dummytop)
set_bit(CSS_ROOT, &css->flags);
BUG_ON(cgrp->subsys[ss->subsys_id]);
@@ -3339,12 +3334,6 @@ static long cgroup_create(struct cgroup
goto err_destroy;
}
init_cgroup_css(css, ss, cgrp);
- if (ss->use_id) {
- err = alloc_css_id(ss, parent, cgrp);
- if (err)
- goto err_destroy;
- }
- /* At error, ->destroy() callback has to free assigned ID. */
}
cgroup_lock_hierarchy(root);
@@ -3706,17 +3695,6 @@ int __init_or_module cgroup_load_subsys(
/* our new subsystem will be attached to the dummy hierarchy. */
init_cgroup_css(css, ss, dummytop);
- /* init_idr must be after init_cgroup_css because it sets css->id. */
- if (ss->use_id) {
- int ret = cgroup_init_idr(ss, css);
- if (ret) {
- dummytop->subsys[ss->subsys_id] = NULL;
- ss->destroy(ss, dummytop);
- subsys[i] = NULL;
- mutex_unlock(&cgroup_mutex);
- return ret;
- }
- }
/*
* Now we need to entangle the css into the existing css_sets. unlike
@@ -3885,8 +3863,6 @@ int __init cgroup_init(void)
struct cgroup_subsys *ss = subsys[i];
if (!ss->early_init)
cgroup_init_subsys(ss);
- if (ss->use_id)
- cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
}
/* Add init_css_set to the hash table */
@@ -4600,8 +4576,8 @@ err_out:
}
-static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
- struct cgroup_subsys_state *rootcss)
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *rootcss)
{
struct css_id *newid;
@@ -4613,21 +4589,25 @@ static int __init_or_module cgroup_init_
return PTR_ERR(newid);
newid->stack[0] = newid->id;
- newid->css = rootcss;
- rootcss->id = newid;
+ rcu_assign_pointer(newid->css, rootcss);
+ rcu_assign_pointer(rootcss->id, newid);
return 0;
}
-static int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *parent,
- struct cgroup *child)
+int alloc_css_id(struct cgroup_subsys *ss,
+ struct cgroup *cgrp, struct cgroup_subsys_state *css)
{
int subsys_id, i, depth = 0;
- struct cgroup_subsys_state *parent_css, *child_css;
+ struct cgroup_subsys_state *parent_css;
+ struct cgroup *parent;
struct css_id *child_id, *parent_id;
+ if (cgrp == dummytop)
+ return cgroup_init_idr(ss, css);
+
+ parent = cgrp->parent;
subsys_id = ss->subsys_id;
parent_css = parent->subsys[subsys_id];
- child_css = child->subsys[subsys_id];
parent_id = parent_css->id;
depth = parent_id->depth + 1;
@@ -4642,7 +4622,7 @@ static int alloc_css_id(struct cgroup_su
* child_id->css pointer will be set after this cgroup is available
* see cgroup_populate_dir()
*/
- rcu_assign_pointer(child_css->id, child_id);
+ rcu_assign_pointer(css->id, child_id);
return 0;
}
Index: mmotm-0922/include/linux/cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/cgroup.h
+++ mmotm-0922/include/linux/cgroup.h
@@ -588,9 +588,11 @@ static inline int cgroup_attach_task_cur
/*
* CSS ID is ID for cgroup_subsys_state structs under subsys. This only works
* if cgroup_subsys.use_id == true. It can be used for looking up and scanning.
- * CSS ID is assigned at cgroup allocation (create) automatically
- * and removed when subsys calls free_css_id() function. This is because
- * the lifetime of cgroup_subsys_state is subsys's matter.
+ * CSS ID must be assigned by subsys itself at cgroup creation and deleted
+ * when subsys calls free_css_id() function. This is because the lifetime
+ * of cgroup_subsys_state is subsys's matter.
+ *
+ * ID->css look up is available after cgroup's directory is populated.
*
* Looking up and scanning function should be called under rcu_read_lock().
* Taking cgroup_mutex()/hierarchy_mutex() is not necessary for following calls.
@@ -598,10 +600,10 @@ static inline int cgroup_attach_task_cur
* destroyed". The caller should check css and cgroup's status.
*/
-/*
- * Typically Called at ->destroy(), or somewhere the subsys frees
- * cgroup_subsys_state.
- */
+/* Should be called in ->create() by subsys itself */
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *newgr,
+ struct cgroup_subsys_state *css);
+/* Typically Called at ->destroy(), or somewhere the subsys frees css */
void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
/* Find a cgroup_subsys_state which has given ID */
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -4347,6 +4347,11 @@ mem_cgroup_create(struct cgroup_subsys *
if (alloc_mem_cgroup_per_zone_info(mem, node))
goto free_out;
+ error = alloc_css_id(ss, cont, &mem->css);
+ if (error)
+ goto free_out;
+ /* Here, css_id(&mem->css) works. but css_lookup(id)->mem doesn't */
+
/* root ? */
if (cont->parent == NULL) {
int cpu;
Index: mmotm-0922/block/blk-cgroup.c
===================================================================
--- mmotm-0922.orig/block/blk-cgroup.c
+++ mmotm-0922/block/blk-cgroup.c
@@ -1434,9 +1434,13 @@ blkiocg_create(struct cgroup_subsys *sub
{
struct blkio_cgroup *blkcg;
struct cgroup *parent = cgroup->parent;
+ int ret;
if (!parent) {
blkcg = &blkio_root_cgroup;
+ ret = alloc_css_id(subsys, cgroup, &blkcg->css);
+ if (ret)
+ return ERR_PTR(ret);
goto done;
}
@@ -1447,6 +1451,11 @@ blkiocg_create(struct cgroup_subsys *sub
blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
if (!blkcg)
return ERR_PTR(-ENOMEM);
+ ret = alloc_css_id(subsys, cgroup, &blkcg->css);
+ if (ret) {
+ kfree(blkcg);
+ return ERR_PTR(ret);
+ }
blkcg->weight = BLKIO_WEIGHT_DEFAULT;
done:
Index: mmotm-0922/Documentation/cgroups/cgroups.txt
===================================================================
--- mmotm-0922.orig/Documentation/cgroups/cgroups.txt
+++ mmotm-0922/Documentation/cgroups/cgroups.txt
@@ -621,6 +621,54 @@ and root cgroup. Currently this will onl
the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups).
+3.4 cgroup subsys state IDs.
+------------
+When subsystem sets use_id == true, an ID per [cgroup, subsys] is added
+and it will be tied to cgroup_subsys_state object.
+
+When use_id == true, the following interfaces can be used. But please note
+that allocating/freeing an ID is the subsystem's job because the
+cgroup_subsys_state object's lifetime is the subsystem's matter.
+
+unsigned short css_id(struct cgroup_subsys_state *css)
+
+Returns the ID of the given cgroup_subsys_state.
+
+unsigned short css_depth(struct cgroup_subsys_state *css)
+
+Returns the level at which "css" exists in the hierarchy tree.
+The root cgroup's depth is 0, its children are at depth 1, their
+children at depth 2, and so on.
+
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *newgr,
+ struct cgroup_subsys_state *css);
+
+Attaches a new ID to the given css under the subsystem ([ss, cgroup]).
+Should be called in the ->create() callback.
+
+void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
+
+Frees the ID attached to "css" under the subsystem. Should be called
+before "css" is freed.
+
+struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id);
+
+Looks up a cgroup_subsys_state by ID. Should be called under rcu_read_lock().
+
+struct cgroup_subsys_state *css_get_next(struct cgroup_subsys *ss, int id,
+ struct cgroup_subsys_state *root, int *foundid);
+
+Finds a css whose ID is under "root", i.e. in a sub-directory of the
+"root" cgroup's directory in the cgroup hierarchy. IDs are not returned
+in sorted order, so callers must not rely on any ordering.
+
+bool css_is_ancestor(struct cgroup_subsys_state *cg,
+ const struct cgroup_subsys_state *root);
+
+Returns true if "root" and "cg" are in the same hierarchy and
+"root" can be found by following ->parent from "cg" up to
+the root cgroup.
+
4. Questions
============
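The ID lifecycle described in section 3.4 above (attach an ID in ->create(), look objects up by ID, free the ID before the object goes away) can be modelled in plain userspace C. This is only a sketch under stated assumptions: every mock_* name and MOCK_MAX_ID are hypothetical and not part of the kernel, and the table is a flat array rather than whatever the cgroup core really uses. It also does not model the window noted in the memcontrol.c comment, where css_id() already works but css_lookup() does not yet.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these mock_* names exist in the kernel. */
#define MOCK_MAX_ID 16

struct mock_css {
	unsigned short id;	/* 0 means "no ID attached" */
};

/* ID -> object table; slot 0 is reserved as "invalid". */
static struct mock_css *mock_id_table[MOCK_MAX_ID];

/* Models alloc_css_id(): attach a free ID, as ->create() would. */
static int mock_alloc_css_id(struct mock_css *css)
{
	unsigned short id;

	for (id = 1; id < MOCK_MAX_ID; id++) {
		if (!mock_id_table[id]) {
			mock_id_table[id] = css;
			css->id = id;
			return 0;
		}
	}
	return -1;	/* no free ID left in this model */
}

/* Models css_lookup(): NULL for an unknown or already freed ID. */
static struct mock_css *mock_css_lookup(int id)
{
	if (id <= 0 || id >= MOCK_MAX_ID)
		return NULL;
	return mock_id_table[id];
}

/* Models free_css_id(): must run before the object itself is freed. */
static void mock_free_css_id(struct mock_css *css)
{
	if (css->id) {
		mock_id_table[css->id] = NULL;
		css->id = 0;
	}
}
```

In this model a lookup after mock_free_css_id() returns NULL, which is the reason the text asks for the ID to be freed before "css" itself: a stale table entry would otherwise point at freed memory.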
* [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
2010-09-27 9:51 ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
2010-09-27 9:52 ` [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time KAMEZAWA Hiroyuki
@ 2010-09-27 9:54 ` KAMEZAWA Hiroyuki
2010-09-27 9:54 ` [RFC][PATCH 4/4] memcg: per node info node hotplug support KAMEZAWA Hiroyuki
2010-09-30 5:31 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27 9:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Currently, memcg's per-zone structure is looked up as
mem->info.nodeinfo[nid]->zoneinfo[zid]
This has two problems.
1st. nodeinfo is an array of MAX_NUMNODES pointers. This makes
sizeof(struct mem_cgroup) very large, so struct mem_cgroup is allocated in the
vmalloc() area because it is larger than PAGE_SIZE.
(This will not be fixed even when node hotplug is supported.)
2nd. page_cgroup->mem_cgroup is now an ID, so reaching the per-zone structure
takes a 2-level lookup:
mem = css_lookup(pc->mem_cgroup);
mz = mem->info.nodeinfo[nid]->zoneinfo[zid]
This lookup seems wasteful. This patch removes mem->info and moves all per-zone
memcg structures into a radix tree. A mem_cgroup_per_zone structure can be found by
radix_tree_lookup(&memcg_lrus, id_func(memcg, nid, zid)).
This shrinks struct mem_cgroup (4440 bytes => 344 bytes) and combines the 2 lookups into one.
A following patch will add memory hotplug support.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 86 +++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 57 insertions(+), 29 deletions(-)
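The size reduction quoted in the changelog is 4440 - 344 = 4096 bytes. As a rough cross-check, that is what an array of per-node pointers would occupy with 512 possible nodes and 8-byte pointers. Both values are assumptions about the build configuration, not figures taken from the patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the patch. */
#define ASSUMED_MAX_NUMNODES 512	/* would correspond to a NODES_SHIFT of 9 */
#define ASSUMED_POINTER_SIZE 8		/* 64-bit kernel */

/* Bytes used by an array of per-node pointers such as nodeinfo[]. */
static size_t nodeinfo_array_bytes(size_t nr_nodes, size_t ptr_size)
{
	return nr_nodes * ptr_size;
}

/* Bytes saved according to the two sizes quoted in the changelog. */
static size_t quoted_saving_bytes(void)
{
	return 4440 - 344;
}
```

Under those assumptions the two numbers agree, which suggests that nearly all of the old structure was the nodeinfo[] pointer array rather than per-cgroup state.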
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -122,13 +122,16 @@ struct mem_cgroup_per_zone {
/* Macro for accessing counter */
#define MEM_CGROUP_ZSTAT(mz, idx) ((mz)->count[(idx)])
-struct mem_cgroup_per_node {
- struct mem_cgroup_per_zone zoneinfo[MAX_NR_ZONES];
-};
+RADIX_TREE(memcg_lrus, GFP_KERNEL);
+DEFINE_SPINLOCK(memcg_lrutable_lock);
-struct mem_cgroup_lru_info {
- struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
-};
+static inline long node_zone_idx(int memcg, int node, int zone) {
+ unsigned long id;
+
+ id = ((node) << ZONES_SHIFT | (zone)) << 16;
+ id |= memcg;
+ return id;
+}
/*
* Cgroups above their limits are maintained in a RB-Tree, independent of
@@ -206,11 +209,6 @@ struct mem_cgroup {
* the counter to account for mem+swap usage.
*/
struct res_counter memsw;
- /*
- * Per cgroup active and inactive list, similar to the
- * per zone LRU lists.
- */
- struct mem_cgroup_lru_info info;
/*
protect against reclaim related member.
@@ -388,9 +386,14 @@ static struct mem_cgroup *memcg_lookup(u
}
static struct mem_cgroup_per_zone *
-mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
+mem_cgroup_zoneinfo(int memcgid, int nid, int zid)
{
- return &mem->info.nodeinfo[nid]->zoneinfo[zid];
+ struct mem_cgroup_per_zone *mz;
+
+ rcu_read_lock();
+ mz = radix_tree_lookup(&memcg_lrus, node_zone_idx(memcgid, nid, zid));
+ rcu_read_unlock();
+ return mz;
}
struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem)
@@ -401,14 +404,13 @@ struct cgroup_subsys_state *mem_cgroup_c
static struct mem_cgroup_per_zone *
page_cgroup_zoneinfo(struct page_cgroup *pc)
{
- struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
int nid = page_cgroup_nid(pc);
int zid = page_cgroup_zid(pc);
- if (!mem)
+ if (!pc->mem_cgroup)
return NULL;
- return mem_cgroup_zoneinfo(mem, nid, zid);
+ return mem_cgroup_zoneinfo(pc->mem_cgroup, nid, zid);
}
static struct mem_cgroup_tree_per_zone *
@@ -496,7 +498,7 @@ static void mem_cgroup_update_tree(struc
* because their event counter is not touched.
*/
for (; mem; mem = parent_mem_cgroup(mem)) {
- mz = mem_cgroup_zoneinfo(mem, nid, zid);
+ mz = mem_cgroup_zoneinfo(css_id(&mem->css), nid, zid);
excess = res_counter_soft_limit_excess(&mem->res);
/*
* We have to update the tree if mz is on RB-tree or
@@ -525,7 +527,7 @@ static void mem_cgroup_remove_from_trees
for_each_node_state(node, N_POSSIBLE) {
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- mz = mem_cgroup_zoneinfo(mem, node, zone);
+ mz = mem_cgroup_zoneinfo(css_id(&mem->css), node, zone);
mctz = soft_limit_tree_node_zone(node, zone);
mem_cgroup_remove_exceeded(mem, mz, mctz);
}
@@ -658,7 +660,7 @@ static unsigned long mem_cgroup_get_loca
for_each_online_node(nid)
for (zid = 0; zid < MAX_NR_ZONES; zid++) {
- mz = mem_cgroup_zoneinfo(mem, nid, zid);
+ mz = mem_cgroup_zoneinfo(css_id(&mem->css), nid, zid);
total += MEM_CGROUP_ZSTAT(mz, idx);
}
return total;
@@ -1039,7 +1041,9 @@ unsigned long mem_cgroup_zone_nr_pages(s
{
int nid = zone_to_nid(zone);
int zid = zone_idx(zone);
- struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+ struct mem_cgroup_per_zone *mz;
+
+ mz = mem_cgroup_zoneinfo(css_id(&memcg->css), nid, zid);
return MEM_CGROUP_ZSTAT(mz, lru);
}
@@ -1049,7 +1053,9 @@ struct zone_reclaim_stat *mem_cgroup_get
{
int nid = zone_to_nid(zone);
int zid = zone_idx(zone);
- struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+ struct mem_cgroup_per_zone *mz;
+
+ mz = mem_cgroup_zoneinfo(css_id(&memcg->css), nid, zid);
return &mz->reclaim_stat;
}
@@ -1099,7 +1105,7 @@ unsigned long mem_cgroup_isolate_pages(u
int ret;
BUG_ON(!mem_cont);
- mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
+ mz = mem_cgroup_zoneinfo(css_id(&mem_cont->css), nid, zid);
src = &mz->lists[lru];
scan = 0;
@@ -3179,7 +3185,7 @@ static int mem_cgroup_force_empty_list(s
int ret = 0;
zone = &NODE_DATA(node)->node_zones[zid];
- mz = mem_cgroup_zoneinfo(mem, node, zid);
+ mz = mem_cgroup_zoneinfo(css_id(&mem->css), node, zid);
list = &mz->lists[lru];
loop = MEM_CGROUP_ZSTAT(mz, lru);
@@ -3676,7 +3682,8 @@ static int mem_control_stat_show(struct
for_each_online_node(nid)
for (zid = 0; zid < MAX_NR_ZONES; zid++) {
- mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
+ mz = mem_cgroup_zoneinfo(
+ css_id(&mem_cont->css), nid, zid);
recent_rotated[0] +=
mz->reclaim_stat.recent_rotated[0];
@@ -4173,10 +4180,9 @@ static int register_memsw_files(struct c
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
{
- struct mem_cgroup_per_node *pn;
struct mem_cgroup_per_zone *mz;
enum lru_list l;
- int zone, tmp = node;
+ int id, zone, ret, tmp = node;
/*
* This routine is called against possible nodes.
* But it's BUG to call kmalloc() against offline node.
@@ -4187,27 +4193,51 @@ static int alloc_mem_cgroup_per_zone_inf
*/
if (!node_state(node, N_NORMAL_MEMORY))
tmp = -1;
- pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
- if (!pn)
- return 1;
-
- mem->info.nodeinfo[node] = pn;
- memset(pn, 0, sizeof(*pn));
-
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- mz = &pn->zoneinfo[zone];
+ mz = kzalloc_node(sizeof(struct mem_cgroup_per_zone),
+ GFP_KERNEL, tmp);
+ if (!mz)
+ break;
+ radix_tree_preload(GFP_KERNEL);
+ spin_lock_irq(&memcg_lrutable_lock);
+ id = node_zone_idx(css_id(&mem->css), node, zone);
+ ret = radix_tree_insert(&memcg_lrus, id, mz);
+ spin_unlock_irq(&memcg_lrutable_lock);
+ if (ret)
+ break;
for_each_lru(l)
INIT_LIST_HEAD(&mz->lists[l]);
- mz->usage_in_excess = 0;
mz->on_tree = false;
mz->mem = mem;
}
- return 0;
+
+ if (zone == MAX_NR_ZONES)
+ return 0;
+
+ for (; zone >= 0; zone--) {
+ id = node_zone_idx(css_id(&mem->css), node, zone);
+ spin_lock_irq(&memcg_lrutable_lock);
+ mz = radix_tree_delete(&memcg_lrus, id);
+ spin_unlock_irq(&memcg_lrutable_lock);
+ kfree(mz);
+ }
+
+ return 1;
}
static void free_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
{
- kfree(mem->info.nodeinfo[node]);
+ int id, zone;
+ struct mem_cgroup_per_zone *mz;
+ unsigned long flags;
+
+ for (zone = 0; zone < MAX_NR_ZONES; zone++) {
+ id = node_zone_idx(css_id(&mem->css), node, zone);
+ spin_lock_irqsave(&memcg_lrutable_lock, flags);
+ mz = radix_tree_delete(&memcg_lrus, id);
+ spin_unlock_irqrestore(&memcg_lrutable_lock, flags);
+ kfree(mz);
+ }
}
static struct mem_cgroup *mem_cgroup_alloc(void)
@@ -4234,6 +4264,7 @@ static struct mem_cgroup *mem_cgroup_all
mem = NULL;
}
spin_lock_init(&mem->pcp_counter_lock);
+
return mem;
}
@@ -4343,13 +4374,14 @@ mem_cgroup_create(struct cgroup_subsys *
if (!mem)
return ERR_PTR(error);
+ error = alloc_css_id(ss, cont, &mem->css);
+ if (error)
+ goto free_out;
+
for_each_node_state(node, N_POSSIBLE)
if (alloc_mem_cgroup_per_zone_info(mem, node))
goto free_out;
- error = alloc_css_id(ss, cont, &mem->css);
- if (error)
- goto free_out;
/* Here, css_id(&mem->css) works. but css_lookup(id)->mem doesn't */
/* root ? */
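The radix-tree key built by node_zone_idx() in this patch packs the node and zone above a 16-bit memcg ID. A userspace sketch of that packing, with helpers to unpack it again, may make the layout easier to check. SKETCH_ZONES_SHIFT is an assumed value (the kernel derives ZONES_SHIFT from its configuration), and pack_key() and the key_*() helpers are hypothetical names. The sketch keeps the key in unsigned long throughout, since that is the index type the radix tree takes, whereas the patch returns long and stores the result in an int.

```c
#include <assert.h>

#define SKETCH_ZONES_SHIFT 2	/* assumed; the kernel derives ZONES_SHIFT from config */
#define SKETCH_ID_BITS 16	/* css IDs are unsigned short */

/*
 * Pack (memcg ID, node, zone) into one key:
 * [ node | zone (SKETCH_ZONES_SHIFT bits) | memcg ID (16 bits) ]
 * zid must fit in SKETCH_ZONES_SHIFT bits.
 */
static unsigned long pack_key(unsigned int memcg_id, unsigned int nid,
			      unsigned int zid)
{
	unsigned long key;

	key = ((unsigned long)nid << SKETCH_ZONES_SHIFT) | zid;
	key <<= SKETCH_ID_BITS;
	key |= memcg_id & 0xffffUL;
	return key;
}

/* Low 16 bits: the memcg ID. */
static unsigned int key_memcg(unsigned long key)
{
	return key & 0xffffUL;
}

/* Next SKETCH_ZONES_SHIFT bits: the zone index. */
static unsigned int key_zone(unsigned long key)
{
	return (key >> SKETCH_ID_BITS) & ((1UL << SKETCH_ZONES_SHIFT) - 1);
}

/* Remaining high bits: the node ID. */
static unsigned int key_node(unsigned long key)
{
	return key >> (SKETCH_ID_BITS + SKETCH_ZONES_SHIFT);
}
```

Because the memcg ID sits in the low bits, all keys of one (node, zone) pair are contiguous in index space, while the keys of a single memcg are spread 65536 apart.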
* [RFC][PATCH 4/4] memcg: per node info node hotplug support
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2010-09-27 9:54 ` [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array KAMEZAWA Hiroyuki
@ 2010-09-27 9:54 ` KAMEZAWA Hiroyuki
2010-09-30 5:31 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27 9:54 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Support node hot plug (experimental).
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)
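In the MEM_GOING_ONLINE case of the callback in this patch, ret is assigned on every iteration over the memory cgroups, so only the result for the last cgroup reaches the notifier return value. One possible alternative is to stop at the first failure. The userspace sketch below shows only that control flow: mock_alloc_fn, mock_alloc_all() and mock_fail_second() are hypothetical names, and whether the real iterator can be left early without extra cleanup is not checked here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-group callback type for this sketch. */
typedef int (*mock_alloc_fn)(int group, int nid);

/* Number of times the test helper below has been called. */
static int mock_calls;

/* Calls fn for every group and stops at the first failure. */
static int mock_alloc_all(int nr_groups, int nid, mock_alloc_fn fn)
{
	int group, ret;

	for (group = 0; group < nr_groups; group++) {
		ret = fn(group, nid);
		if (ret)
			return ret;	/* keep the first error */
	}
	return 0;
}

/* Test helper: group 1 fails, every other group succeeds. */
static int mock_fail_second(int group, int nid)
{
	(void)nid;
	mock_calls++;
	return group == 1 ? -ENOMEM : 0;
}
```

The helper returns a negative errno value. alloc_mem_cgroup_per_zone_info() in this series returns 1 on failure instead, so a caller that feeds the result to notifier_from_errno() may want to translate it first.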
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -48,6 +48,7 @@
#include <linux/page_cgroup.h>
#include <linux/cpu.h>
#include <linux/oom.h>
+#include <linux/memory.h>
#include "internal.h"
#include <asm/uaccess.h>
@@ -4212,8 +4213,12 @@ static int alloc_mem_cgroup_per_zone_inf
id = node_zone_idx(css_id(&mem->css), node, zone);
ret = radix_tree_insert(&memcg_lrus, id, mz);
spin_unlock_irq(&memcg_lrutable_lock);
- if (ret)
- break;
+ if (ret) {
+ if (ret != -EEXIST)
+ break;
+ kfree(mz);
+ continue;
+ }
for_each_lru(l)
INIT_LIST_HEAD(&mz->lists[l]);
mz->on_tree = false;
@@ -4372,6 +4377,40 @@ static int mem_cgroup_soft_limit_tree_in
return 0;
}
+static int __meminit memcg_memory_hotplug_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ struct memory_notify *mn = arg;
+ struct mem_cgroup *mem;
+ int nid = mn->status_change_nid;
+ int ret = 0;
+
+ /* We just take care of node hotplug */
+ if (nid == -1)
+ return NOTIFY_OK;
+ switch(action) {
+ case MEM_GOING_ONLINE:
+ for_each_mem_cgroup_all(mem)
+ ret = alloc_mem_cgroup_per_zone_info(mem, nid);
+ break;
+ case MEM_OFFLINE:
+ for_each_mem_cgroup_all(mem)
+ free_mem_cgroup_per_zone_info(mem, nid);
+ break;
+ default:
+ break;
+ }
+
+ if (ret)
+ ret = notifier_from_errno(ret);
+ else
+ ret = NOTIFY_OK;
+
+ return ret;
+}
+
+
+
static struct cgroup_subsys_state * __ref
mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
@@ -4387,7 +4426,7 @@ mem_cgroup_create(struct cgroup_subsys *
if (error)
goto free_out;
- for_each_node_state(node, N_POSSIBLE)
+ for_each_node_state(node, N_HIGH_MEMORY)
if (alloc_mem_cgroup_per_zone_info(mem, node))
goto free_out;
@@ -4407,6 +4446,7 @@ mem_cgroup_create(struct cgroup_subsys *
INIT_WORK(&stock->work, drain_local_stock);
}
hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
+ hotplug_memory_notifier(memcg_memory_hotplug_callback, 0);
} else {
parent = mem_cgroup_from_cont(cont->parent);
mem->use_hierarchy = parent->use_hierarchy;
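The -EEXIST handling added to alloc_mem_cgroup_per_zone_info() above makes the allocation safe to repeat for a node whose entries already exist, which matters once the hotplug callback can call it again. A userspace model of that behaviour follows. The fixed array stands in for the radix tree, MOCK_NR_ZONES is an assumed value, and all mock_* names are hypothetical. Unlike the patch, the sketch does not roll back entries inserted before a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MOCK_NR_ZONES 4		/* assumed; stands in for MAX_NR_ZONES */

struct mock_zone_info {
	int initialized;
};

/* One slot per zone of a single node; stands in for the radix tree. */
static struct mock_zone_info *mock_slots[MOCK_NR_ZONES];

/* Models radix_tree_insert(): refuses to overwrite an existing entry. */
static int mock_insert(int zone, struct mock_zone_info *mz)
{
	if (mock_slots[zone])
		return -EEXIST;
	mock_slots[zone] = mz;
	return 0;
}

/* Returns 0 on success, including when every entry already existed. */
static int mock_alloc_node_info(void)
{
	int zone, ret;

	for (zone = 0; zone < MOCK_NR_ZONES; zone++) {
		struct mock_zone_info *mz = calloc(1, sizeof(*mz));

		if (!mz)
			return -ENOMEM;
		ret = mock_insert(zone, mz);
		if (ret == -EEXIST) {
			free(mz);	/* keep the entry that is already there */
			continue;
		}
		if (ret) {
			free(mz);
			return ret;
		}
		mz->initialized = 1;	/* only new entries are initialized */
	}
	return 0;
}
```

Calling mock_alloc_node_info() twice leaves the first set of entries in place, so state attached to an existing entry (the LRU lists, in the real structure) is not reinitialized by the second call.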
* Re: [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2.
2010-09-27 9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
` (3 preceding siblings ...)
2010-09-27 9:54 ` [RFC][PATCH 4/4] memcg: per node info node hotplug support KAMEZAWA Hiroyuki
@ 2010-09-30 5:31 ` KAMEZAWA Hiroyuki
4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-30 5:31 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm
On Mon, 27 Sep 2010 18:48:21 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 24 Sep 2010 18:13:02 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> >
> > This is a revised series for using IDs.
> > Restart from RFC.
> >
>
> Then I changed my mind. This is a new set, with no new special lookups.
> You may find something strange in it, though. I don't want to merge these
> patches all at once; think of this set as a dump of my stack. Any comments are welcome.
>
At LinuxCon Japan I talked with Nishimura, and we concluded that sending only
patch 1/4 is best (go step by step). I also know that Greg Thelen is now
rewriting his dirty page accounting patch for memcg on top of the latest mmotm.
I think that work currently has higher priority than this, so I'll wait a while
and then post only patch 1/4.
Thanks,
-Kame