[RFC][PATCH 0/2] memcg: use ID instead of pointer in page


* [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry.
@ 2010-09-24  9:13 KAMEZAWA Hiroyuki
  2010-09-24  9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24  9:13 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, balbir, nishimura, akpm


This is a revised series that uses IDs.
Restart from RFC.

[1/2] implementation of special ID lookup
[2/2] use ID in mm/memcontrol.c

People may say to use css_lookup() rather than add a special routine, but
I can't believe css_lookup() is fast enough for every page LRU operation
when the number of cgroups is large. I think this patch itself is simple enough,
but I admit it will make mem_cgroup more complex. Hmm.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [RFC][PATCH 1/2] memcg: special ID lookup routine
  2010-09-24  9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
@ 2010-09-24  9:15 ` KAMEZAWA Hiroyuki
  2010-09-24  9:16 ` [RFC][PATCH 2/2] memcg: use ID instead of pointer KAMEZAWA Hiroyuki
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24  9:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

It seems the previous patches were not welcomed, so this is a revised one.
My purpose is to replace pc->mem_cgroup with pc->mem_cgroup_id and to avoid
using more memory when pc->blkio_cgroup_id is added.

As a 1st step, this patch implements a lookup table indexed by ID.
For ordinary lookups css_lookup() works well enough, but it may have to
walk several levels of the idr radix-tree. The memory cgroup ID limit is 65536,
and as far as I hear, there is a user who runs 2000+ memory cgroups on a system.
And with a generic RCU-based lookup routine, the caller has to use one of:

Type A:
	rcu_read_lock()
	obj = obj_lookup()
	atomic_inc(obj->refcnt)
	rcu_read_unlock()
	/* do jobs */
Type B:
	rcu_read_lock()
	obj = rcu_lookup()
	/* do jobs */
	rcu_read_unlock()

often while holding some spinlock.
(Type A is very bad in a busy routine, and even Type B has to check whether
 the object is still alive. That is not free.)
This is complicated.

Because page_cgroup -> mem_cgroup information is required at every LRU
operation, I think it's worth adding a special lookup routine to reduce the
cache footprint; with some limitations, the lookup routine can be RCU free.
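
As an illustration only, the idea of a flat ID-indexed table can be modelled
in user space as below. This is a minimal sketch, not the patch itself: the
names memcg_model, model_array and the model_*() helpers are hypothetical,
calloc() stands in for the kernel allocator, and there is no locking or
reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_cgroup; only the ID matters here. */
struct memcg_model {
	unsigned short id;
};

#define MODEL_MAX_ID 65536	/* the ID space mentioned in the mail */

static struct memcg_model **model_array;	/* allocated on first use */
static struct memcg_model model_root = { .id = 1 };

/* Allocate the zeroed table lazily; returns 0 on success, -1 on failure. */
static int model_array_init(void)
{
	if (model_array)
		return 0;
	model_array = calloc(MODEL_MAX_ID, sizeof(*model_array));
	return model_array ? 0 : -1;
}

/* Publish an entry; a second call for the same ID keeps the first entry. */
static void model_lookup_set(struct memcg_model *mem)
{
	assert(model_array);
	if (!model_array[mem->id])
		model_array[mem->id] = mem;
}

/* Remove the entry for this ID from the table. */
static void model_lookup_clear(struct memcg_model *mem)
{
	assert(model_array);
	model_array[mem->id] = NULL;
}

/* ID 0 means "unused", ID 1 is the root; everything else is one array load. */
static struct memcg_model *model_lookup(unsigned int id)
{
	if (id == 0 || id >= MODEL_MAX_ID)
		return NULL;
	if (id == 1)
		return &model_root;
	return model_array ? model_array[id] : NULL;
}

/* Size of the table in bytes: one pointer per possible ID. */
static size_t model_array_bytes(void)
{
	return (size_t)MODEL_MAX_ID * sizeof(struct memcg_model *);
}
```

With 8-byte pointers the table costs 65536 * 8 bytes = 512 KiB, paid once,
in exchange for a lookup that touches a single array slot instead of walking
radix-tree levels.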

Note:
 - memcg_lookup() is defined but not used here. It's called in another patch.

Changelog:
 - no hooks to cgroup.
 - no limitation of the number of memcg.
 - delay table allocation until memory cgroup is really used.
 - No RCU routine. (depends on the limitation to callers newly added.)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -198,6 +198,7 @@ static void mem_cgroup_oom_notify(struct
  */
 struct mem_cgroup {
 	struct cgroup_subsys_state css;
+	bool	cached;
 	/*
 	 * the counter to account for memory usage
 	 */
@@ -352,6 +353,65 @@ static void mem_cgroup_put(struct mem_cg
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 static void drain_all_stock_async(void);
 
+#define MEMCG_ARRAY_SIZE	(sizeof(struct mem_cgroup *) *(65536))
+struct mem_cgroup **memcg_array __read_mostly;
+DEFINE_SPINLOCK(memcg_array_lock);
+
+/*
+ * A quick lookup routine for memory cgroup via ID. This can be used
+ * until destroy() is called against memory cgroup. Then, in most case,
+ * there must be page_cgroups or tasks which points to memcg.
+ * So, cannot be used for swap_cgroup reference.
+ */
+static struct mem_cgroup *memcg_lookup(int id)
+{
+	if (id == 0)
+		return NULL;
+	if (id == 1)
+		return root_mem_cgroup;
+	return *(memcg_array + id);
+}
+
+static void memcg_lookup_set(struct mem_cgroup *mem)
+{
+	int id;
+
+	if (likely(mem->cached) || mem == root_mem_cgroup)
+		return;
+	id = css_id(&mem->css);
+	/* There are race with other "set" entry. need to avoid double refcnt */
+	spin_lock(&memcg_array_lock);
+	if (!(*(memcg_array + id))) {
+		mem_cgroup_get(mem);
+		*(memcg_array + id) = mem;
+		mem->cached = true;
+	}
+	spin_unlock(&memcg_array_lock);
+}
+
+static void memcg_lookup_clear(struct mem_cgroup *mem)
+{
+	int id = css_id(&mem->css);
+	/* No race with other look up/set/unset entry */
+	*(memcg_array + id) = NULL;
+	mem_cgroup_put(mem);
+}
+
+static int init_mem_cgroup_lookup_array(void)
+{
+	int size;
+
+	if (memcg_array)
+		return 0;
+
+	size = MEMCG_ARRAY_SIZE;
+	memcg_array = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+				PAGE_KERNEL);
+	if (!memcg_array)
+		return -ENOMEM;
+
+	return 0;
+}
 
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -2096,6 +2156,7 @@ static void __mem_cgroup_commit_charge(s
 		mem_cgroup_cancel_charge(mem);
 		return;
 	}
+	memcg_lookup_set(mem);
 
 	pc->mem_cgroup = mem;
 	/*
@@ -4341,6 +4402,10 @@ mem_cgroup_create(struct cgroup_subsys *
 		}
 		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
 	} else {
+		/* Lookup array allocation is delayed until cgroup creation */
+		error = init_mem_cgroup_lookup_array();
+		if (error == -ENOMEM)
+			goto free_out;
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
 		mem->oom_kill_disable = parent->oom_kill_disable;
@@ -4389,6 +4454,8 @@ static void mem_cgroup_destroy(struct cg
 {
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
 
+	memcg_lookup_clear(mem);
+
 	mem_cgroup_put(mem);
 }
 


* [RFC][PATCH 2/2] memcg: use ID instead of pointer
  2010-09-24  9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
  2010-09-24  9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
@ 2010-09-24  9:16 ` KAMEZAWA Hiroyuki
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  2 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-24  9:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Replace page_cgroup->mem_cgroup with an unsigned short ID,
and add an ID for the blkio cgroup.

More work will be required to reduce the size of struct page_cgroup,
but this may be good as a 1st step.
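
To make the size remark concrete, the two layouts can be compared in a small
user-space sketch. The *_model types are hypothetical stand-ins for the kernel
types, and the struct names page_cgroup_ptr and page_cgroup_ids are made up
for this comparison; only the field order follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, just to get the layout. */
struct list_head_model { struct list_head_model *next, *prev; };
struct page_model;
struct mem_cgroup_model;

struct page_cgroup_ptr {		/* before: pointer to the memcg */
	unsigned long flags;
	struct mem_cgroup_model *mem_cgroup;
	struct page_model *page;
	struct list_head_model lru;
};

struct page_cgroup_ids {		/* after: two 16-bit IDs */
	unsigned long flags;
	unsigned short mem_cgroup;
	unsigned short blkio_cgroup;
	struct page_model *page;
	struct list_head_model lru;
};

/* Report the size of each layout on the ABI this is compiled for. */
static size_t page_cgroup_ptr_size(void)
{
	return sizeof(struct page_cgroup_ptr);
}

static size_t page_cgroup_ids_size(void)
{
	return sizeof(struct page_cgroup_ids);
}
```

On a typical LP64 ABI both layouts come out at 40 bytes, because the two
shorts are followed by 4 bytes of padding before the next pointer. So the
blkio ID fits without growing the struct, but the struct does not shrink yet.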

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    3 ++-
 mm/memcontrol.c             |   34 ++++++++++++++++++++--------------
 mm/page_cgroup.c            |    2 +-
 3 files changed, 23 insertions(+), 16 deletions(-)

Index: mmotm-0922/include/linux/page_cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/page_cgroup.h
+++ mmotm-0922/include/linux/page_cgroup.h
@@ -12,7 +12,8 @@
  */
 struct page_cgroup {
 	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
+	unsigned short mem_cgroup;
+	unsigned short blkio_cgroup;
 	struct page *page;
 	struct list_head lru;		/* per cgroup LRU list */
 };
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -427,7 +427,7 @@ struct cgroup_subsys_state *mem_cgroup_c
 static struct mem_cgroup_per_zone *
 page_cgroup_zoneinfo(struct page_cgroup *pc)
 {
-	struct mem_cgroup *mem = pc->mem_cgroup;
+	struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
 	int nid = page_cgroup_nid(pc);
 	int zid = page_cgroup_zid(pc);
 
@@ -838,6 +838,11 @@ static inline bool mem_cgroup_is_root(st
 	return (mem == root_mem_cgroup);
 }
 
+static inline bool mem_cgroup_id_is_root(unsigned short id)
+{
+	return (id == 1);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -870,7 +875,7 @@ void mem_cgroup_del_lru_list(struct page
 	 */
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
@@ -897,7 +902,7 @@ void mem_cgroup_rotate_lru_list(struct p
 	 */
 	smp_rmb();
 	/* unused or root page is not rotated. */
-	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+	if (!PageCgroupUsed(pc) || mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -923,7 +928,7 @@ void mem_cgroup_add_lru_list(struct page
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
 	SetPageCgroupAcctLRU(pc);
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
@@ -1663,7 +1668,7 @@ static void mem_cgroup_update_file_stat(
 		return;
 
 	rcu_read_lock();
-	mem = pc->mem_cgroup;
+	mem = memcg_lookup(pc->mem_cgroup);
 	if (unlikely(!mem || !PageCgroupUsed(pc)))
 		goto out;
 	/* pc->mem_cgroup is unstable ? */
@@ -1671,7 +1676,7 @@ static void mem_cgroup_update_file_stat(
 		/* take a lock against to access pc->mem_cgroup */
 		lock_page_cgroup(pc);
 		need_unlock = true;
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		if (!mem || !PageCgroupUsed(pc))
 			goto out;
 	}
@@ -2121,7 +2126,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		if (mem && !css_tryget(&mem->css))
 			mem = NULL;
 	} else if (PageSwapCache(page)) {
@@ -2158,7 +2163,7 @@ static void __mem_cgroup_commit_charge(s
 	}
 	memcg_lookup_set(mem);
 
-	pc->mem_cgroup = mem;
+	pc->mem_cgroup = css_id(&mem->css);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
 	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2216,7 +2221,7 @@ static void __mem_cgroup_move_account(st
 	VM_BUG_ON(PageLRU(pc->page));
 	VM_BUG_ON(!PageCgroupLocked(pc));
 	VM_BUG_ON(!PageCgroupUsed(pc));
-	VM_BUG_ON(pc->mem_cgroup != from);
+	VM_BUG_ON(pc->mem_cgroup != css_id(&from->css));
 
 	if (PageCgroupFileMapped(pc)) {
 		/* Update mapped_file data for mem_cgroup */
@@ -2231,7 +2236,7 @@ static void __mem_cgroup_move_account(st
 		mem_cgroup_cancel_charge(from);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc->mem_cgroup = css_id(&to->css);
 	mem_cgroup_charge_statistics(to, pc, true);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2251,7 +2256,7 @@ static int mem_cgroup_move_account(struc
 {
 	int ret = -EINVAL;
 	lock_page_cgroup(pc);
-	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+	if (PageCgroupUsed(pc) && pc->mem_cgroup == css_id(&from->css)) {
 		__mem_cgroup_move_account(pc, from, to, uncharge);
 		ret = 0;
 	}
@@ -2590,7 +2595,7 @@ __mem_cgroup_uncharge_common(struct page
 
 	lock_page_cgroup(pc);
 
-	mem = pc->mem_cgroup;
+	mem = memcg_lookup(pc->mem_cgroup);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -2835,7 +2840,7 @@ int mem_cgroup_prepare_migration(struct 
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		css_get(&mem->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -4652,7 +4657,8 @@ static int is_target_pte_for_mc(struct v
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) &&
+			pc->mem_cgroup == css_id(&mc.from->css)) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;
Index: mmotm-0922/mm/page_cgroup.c
===================================================================
--- mmotm-0922.orig/mm/page_cgroup.c
+++ mmotm-0922/mm/page_cgroup.c
@@ -15,7 +15,7 @@ static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
 {
 	pc->flags = 0;
-	pc->mem_cgroup = NULL;
+	pc->mem_cgroup = 0;
 	pc->page = pfn_to_page(pfn);
 	INIT_LIST_HEAD(&pc->lru);
 }


* [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2.
  2010-09-24  9:13 [RFC][PATCH 0/2] memcg: use ID instead of pointer in page_cgroup , retry KAMEZAWA Hiroyuki
  2010-09-24  9:15 ` [RFC][PATCH 1/2] memcg: special ID lookup routine KAMEZAWA Hiroyuki
  2010-09-24  9:16 ` [RFC][PATCH 2/2] memcg: use ID instead of pointer KAMEZAWA Hiroyuki
@ 2010-09-27  9:48 ` KAMEZAWA Hiroyuki
  2010-09-27  9:51   ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
                     ` (4 more replies)
  2 siblings, 5 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27  9:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

On Fri, 24 Sep 2010 18:13:02 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> 
> This is a reviced series of use ID.
> Restart from RFC.
> 

Then I changed my mind. This is a new set, with no new special lookups.
You may find something strange in it. I don't want to merge these patches
all at once; just think of this set as a dump of my stack. Any comments are welcome.

Thanks,
-Kame



* [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
@ 2010-09-27  9:51   ` KAMEZAWA Hiroyuki
  2010-09-27  9:52   ` [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time KAMEZAWA Hiroyuki
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27  9:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Replace page_cgroup->mem_cgroup with an unsigned short ID,
and add an ID for the blkio cgroup.

More work will be required to reduce the size of struct page_cgroup,
but this may be good as a 1st step. As far as I tested, css_lookup() is fast enough.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    3 +
 mm/memcontrol.c             |   86 +++++++++++++++++++++++++++-----------------
 mm/page_cgroup.c            |    2 -
 3 files changed, 57 insertions(+), 34 deletions(-)

Index: mmotm-0922/include/linux/page_cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/page_cgroup.h
+++ mmotm-0922/include/linux/page_cgroup.h
@@ -12,7 +12,8 @@
  */
 struct page_cgroup {
 	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
+	unsigned short mem_cgroup;
+	unsigned short blkio_cgroup;
 	struct page *page;
 	struct list_head lru;		/* per cgroup LRU list */
 };
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -352,6 +352,40 @@ static void mem_cgroup_put(struct mem_cg
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 static void drain_all_stock_async(void);
 
+/*
+ * A helper function to get mem_cgroup from ID. must be called under
+ * rcu_read_lock(). The caller must check css_is_removed() or some if
+ * it's concern. (dropping refcnt from swap can be called against removed
+ * memcg.)
+ */
+static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
+{
+	struct cgroup_subsys_state *css;
+
+	/* ID 0 is unused ID */
+	if (!id)
+		return NULL;
+	if (id == 1)
+		return root_mem_cgroup;
+	css = css_lookup(&mem_cgroup_subsys, id);
+	if (!css)
+		return NULL;
+	return container_of(css, struct mem_cgroup, css);
+}
+
+/*
+ * If the ID is from pc->mem_cgroup, the mem_cgroup referred to by the ID must
+ * exist as a "valid" cgroup. It's guaranteed. You can use this easy function.
+ */
+static struct mem_cgroup *memcg_lookup(unsigned short id)
+{
+	struct mem_cgroup *mem;
+
+	rcu_read_lock();
+	mem = mem_cgroup_lookup(id);
+	rcu_read_unlock();
+	return mem;
+}
 
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -367,7 +401,7 @@ struct cgroup_subsys_state *mem_cgroup_c
 static struct mem_cgroup_per_zone *
 page_cgroup_zoneinfo(struct page_cgroup *pc)
 {
-	struct mem_cgroup *mem = pc->mem_cgroup;
+	struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
 	int nid = page_cgroup_nid(pc);
 	int zid = page_cgroup_zid(pc);
 
@@ -778,6 +812,11 @@ static inline bool mem_cgroup_is_root(st
 	return (mem == root_mem_cgroup);
 }
 
+static inline bool mem_cgroup_id_is_root(unsigned short id)
+{
+	return (id == 1);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -810,7 +849,7 @@ void mem_cgroup_del_lru_list(struct page
 	 */
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
@@ -837,7 +876,7 @@ void mem_cgroup_rotate_lru_list(struct p
 	 */
 	smp_rmb();
 	/* unused or root page is not rotated. */
-	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+	if (!PageCgroupUsed(pc) || mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -863,7 +902,7 @@ void mem_cgroup_add_lru_list(struct page
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
 	SetPageCgroupAcctLRU(pc);
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_id_is_root(pc->mem_cgroup))
 		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
@@ -1603,7 +1642,7 @@ static void mem_cgroup_update_file_stat(
 		return;
 
 	rcu_read_lock();
-	mem = pc->mem_cgroup;
+	mem = memcg_lookup(pc->mem_cgroup);
 	if (unlikely(!mem || !PageCgroupUsed(pc)))
 		goto out;
 	/* pc->mem_cgroup is unstable ? */
@@ -1611,7 +1650,7 @@ static void mem_cgroup_update_file_stat(
 		/* take a lock against to access pc->mem_cgroup */
 		lock_page_cgroup(pc);
 		need_unlock = true;
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		if (!mem || !PageCgroupUsed(pc))
 			goto out;
 	}
@@ -2030,24 +2069,6 @@ static void mem_cgroup_cancel_charge(str
 	__mem_cgroup_cancel_charge(mem, 1);
 }
 
-/*
- * A helper function to get mem_cgroup from ID. must be called under
- * rcu_read_lock(). The caller must check css_is_removed() or some if
- * it's concern. (dropping refcnt from swap can be called against removed
- * memcg.)
- */
-static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
-{
-	struct cgroup_subsys_state *css;
-
-	/* ID 0 is unused ID */
-	if (!id)
-		return NULL;
-	css = css_lookup(&mem_cgroup_subsys, id);
-	if (!css)
-		return NULL;
-	return container_of(css, struct mem_cgroup, css);
-}
 
 struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
@@ -2061,7 +2082,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		if (mem && !css_tryget(&mem->css))
 			mem = NULL;
 	} else if (PageSwapCache(page)) {
@@ -2097,7 +2118,7 @@ static void __mem_cgroup_commit_charge(s
 		return;
 	}
 
-	pc->mem_cgroup = mem;
+	pc->mem_cgroup = css_id(&mem->css);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
 	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2155,7 +2176,7 @@ static void __mem_cgroup_move_account(st
 	VM_BUG_ON(PageLRU(pc->page));
 	VM_BUG_ON(!PageCgroupLocked(pc));
 	VM_BUG_ON(!PageCgroupUsed(pc));
-	VM_BUG_ON(pc->mem_cgroup != from);
+	VM_BUG_ON(pc->mem_cgroup != css_id(&from->css));
 
 	if (PageCgroupFileMapped(pc)) {
 		/* Update mapped_file data for mem_cgroup */
@@ -2170,7 +2191,7 @@ static void __mem_cgroup_move_account(st
 		mem_cgroup_cancel_charge(from);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc->mem_cgroup = css_id(&to->css);
 	mem_cgroup_charge_statistics(to, pc, true);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2190,7 +2211,7 @@ static int mem_cgroup_move_account(struc
 {
 	int ret = -EINVAL;
 	lock_page_cgroup(pc);
-	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+	if (PageCgroupUsed(pc) && pc->mem_cgroup == css_id(&from->css)) {
 		__mem_cgroup_move_account(pc, from, to, uncharge);
 		ret = 0;
 	}
@@ -2529,7 +2550,7 @@ __mem_cgroup_uncharge_common(struct page
 
 	lock_page_cgroup(pc);
 
-	mem = pc->mem_cgroup;
+	mem = memcg_lookup(pc->mem_cgroup);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -2774,7 +2795,7 @@ int mem_cgroup_prepare_migration(struct 
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = memcg_lookup(pc->mem_cgroup);
 		css_get(&mem->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -4585,7 +4606,8 @@ static int is_target_pte_for_mc(struct v
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) &&
+			pc->mem_cgroup == css_id(&mc.from->css)) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;
Index: mmotm-0922/mm/page_cgroup.c
===================================================================
--- mmotm-0922.orig/mm/page_cgroup.c
+++ mmotm-0922/mm/page_cgroup.c
@@ -15,7 +15,7 @@ static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
 {
 	pc->flags = 0;
-	pc->mem_cgroup = NULL;
+	pc->mem_cgroup = 0;
 	pc->page = pfn_to_page(pfn);
 	INIT_LIST_HEAD(&pc->lru);
 }


* [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  2010-09-27  9:51   ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
@ 2010-09-27  9:52   ` KAMEZAWA Hiroyuki
  2010-09-27  9:54   ` [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array KAMEZAWA Hiroyuki
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27  9:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, the css ID is allocated after ->create() is called. But to make use of
the ID in ->create(), it has to be available by then.

Looked at another way, since the ID is tightly coupled with the "css",
it should be allocated when the "css" is allocated.
This patch moves alloc_css_id() to the css allocation routine. Currently only
2 subsystems, memory and blkio, use IDs (to support complicated hierarchy walks).

The ID will be used in the mem cgroup's ->create() later.

This patch also adds css ID documentation, which was missing.

Note:
If someone changes the rules of css allocation, ID allocation should be changed too.
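
The ordering described above (allocate the object, attach its ID, then use the
ID inside the create step, unwinding on failure) can be modelled in user space
as follows. This is only a sketch under stated assumptions: every toy_* name
is hypothetical, and the ID allocator is a trivial array rather than the
kernel's idr.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_ID 8

/* Hypothetical ID table; ID 0 is reserved as "unused". */
static unsigned char toy_id_used[TOY_MAX_ID];

struct toy_css {
	unsigned short id;	/* 0 until an ID is attached */
};

struct toy_group {
	struct toy_css css;
	unsigned short id_seen_in_create;
};

/* Attach the lowest free ID to css; returns 0, or -1 when none is left. */
static int toy_alloc_css_id(struct toy_css *css)
{
	for (unsigned short id = 1; id < TOY_MAX_ID; id++) {
		if (!toy_id_used[id]) {
			toy_id_used[id] = 1;
			css->id = id;
			return 0;
		}
	}
	return -1;
}

/* Release the ID so that a later create can reuse it. */
static void toy_free_css_id(struct toy_css *css)
{
	toy_id_used[css->id] = 0;
	css->id = 0;
}

/* The "create" step: allocate the object, then its ID, then use the ID. */
static struct toy_group *toy_create(void)
{
	struct toy_group *grp = calloc(1, sizeof(*grp));

	if (!grp)
		return NULL;
	if (toy_alloc_css_id(&grp->css)) {
		free(grp);	/* unwind on failure */
		return NULL;
	}
	grp->id_seen_in_create = grp->css.id;	/* ID is already valid here */
	return grp;
}

/* The "destroy" step frees the ID together with the object. */
static void toy_destroy(struct toy_group *grp)
{
	toy_free_css_id(&grp->css);
	free(grp);
}
```

The point of the sketch is only the ordering: because the ID is attached
inside the create step, code later in that step can already rely on it, and
the subsystem that allocated the ID is also the one that frees it.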

Changelog: 2010/09/01
 - modified cgroups.txt

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/cgroups.txt |   48 ++++++++++++++++++++++++++++++++++++
 block/blk-cgroup.c                |    9 ++++++
 include/linux/cgroup.h            |   16 ++++++------
 kernel/cgroup.c                   |   50 +++++++++++---------------------------
 mm/memcontrol.c                   |    5 +++
 5 files changed, 86 insertions(+), 42 deletions(-)

Index: mmotm-0922/kernel/cgroup.c
===================================================================
--- mmotm-0922.orig/kernel/cgroup.c
+++ mmotm-0922/kernel/cgroup.c
@@ -288,9 +288,6 @@ struct cg_cgroup_link {
 static struct css_set init_css_set;
 static struct cg_cgroup_link init_css_set_link;
 
-static int cgroup_init_idr(struct cgroup_subsys *ss,
-			   struct cgroup_subsys_state *css);
-
 /* css_set_lock protects the list of css_set objects, and the
  * chain of tasks off each css_set.  Nests outside task->alloc_lock
  * due to cgroup_iter_start() */
@@ -769,9 +766,6 @@ static struct backing_dev_info cgroup_ba
 	.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK,
 };
 
-static int alloc_css_id(struct cgroup_subsys *ss,
-			struct cgroup *parent, struct cgroup *child);
-
 static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
@@ -3254,7 +3248,8 @@ static void init_cgroup_css(struct cgrou
 	css->cgroup = cgrp;
 	atomic_set(&css->refcnt, 1);
 	css->flags = 0;
-	css->id = NULL;
+	if (!ss->use_id)
+		css->id = NULL;
 	if (cgrp == dummytop)
 		set_bit(CSS_ROOT, &css->flags);
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
@@ -3339,12 +3334,6 @@ static long cgroup_create(struct cgroup 
 			goto err_destroy;
 		}
 		init_cgroup_css(css, ss, cgrp);
-		if (ss->use_id) {
-			err = alloc_css_id(ss, parent, cgrp);
-			if (err)
-				goto err_destroy;
-		}
-		/* At error, ->destroy() callback has to free assigned ID. */
 	}
 
 	cgroup_lock_hierarchy(root);
@@ -3706,17 +3695,6 @@ int __init_or_module cgroup_load_subsys(
 
 	/* our new subsystem will be attached to the dummy hierarchy. */
 	init_cgroup_css(css, ss, dummytop);
-	/* init_idr must be after init_cgroup_css because it sets css->id. */
-	if (ss->use_id) {
-		int ret = cgroup_init_idr(ss, css);
-		if (ret) {
-			dummytop->subsys[ss->subsys_id] = NULL;
-			ss->destroy(ss, dummytop);
-			subsys[i] = NULL;
-			mutex_unlock(&cgroup_mutex);
-			return ret;
-		}
-	}
 
 	/*
 	 * Now we need to entangle the css into the existing css_sets. unlike
@@ -3885,8 +3863,6 @@ int __init cgroup_init(void)
 		struct cgroup_subsys *ss = subsys[i];
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
-		if (ss->use_id)
-			cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
 	}
 
 	/* Add init_css_set to the hash table */
@@ -4600,8 +4576,8 @@ err_out:
 
 }
 
-static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
-					    struct cgroup_subsys_state *rootcss)
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+			    struct cgroup_subsys_state *rootcss)
 {
 	struct css_id *newid;
 
@@ -4613,21 +4589,25 @@ static int __init_or_module cgroup_init_
 		return PTR_ERR(newid);
 
 	newid->stack[0] = newid->id;
-	newid->css = rootcss;
-	rootcss->id = newid;
+	rcu_assign_pointer(newid->css, rootcss);
+	rcu_assign_pointer(rootcss->id, newid);
 	return 0;
 }
 
-static int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *parent,
-			struct cgroup *child)
+int alloc_css_id(struct cgroup_subsys *ss,
+	struct cgroup *cgrp, struct cgroup_subsys_state *css)
 {
 	int subsys_id, i, depth = 0;
-	struct cgroup_subsys_state *parent_css, *child_css;
+	struct cgroup_subsys_state *parent_css;
+	struct cgroup *parent;
 	struct css_id *child_id, *parent_id;
 
+	if (cgrp == dummytop)
+		return cgroup_init_idr(ss, css);
+
+	parent = cgrp->parent;
 	subsys_id = ss->subsys_id;
 	parent_css = parent->subsys[subsys_id];
-	child_css = child->subsys[subsys_id];
 	parent_id = parent_css->id;
 	depth = parent_id->depth + 1;
 
@@ -4642,7 +4622,7 @@ static int alloc_css_id(struct cgroup_su
 	 * child_id->css pointer will be set after this cgroup is available
 	 * see cgroup_populate_dir()
 	 */
-	rcu_assign_pointer(child_css->id, child_id);
+	rcu_assign_pointer(css->id, child_id);
 
 	return 0;
 }
Index: mmotm-0922/include/linux/cgroup.h
===================================================================
--- mmotm-0922.orig/include/linux/cgroup.h
+++ mmotm-0922/include/linux/cgroup.h
@@ -588,9 +588,11 @@ static inline int cgroup_attach_task_cur
 /*
  * CSS ID is ID for cgroup_subsys_state structs under subsys. This only works
  * if cgroup_subsys.use_id == true. It can be used for looking up and scanning.
- * CSS ID is assigned at cgroup allocation (create) automatically
- * and removed when subsys calls free_css_id() function. This is because
- * the lifetime of cgroup_subsys_state is subsys's matter.
+ * CSS ID must be assigned by subsys itself at cgroup creation and deleted
+ * when subsys calls free_css_id() function. This is because the lifetime of
+ * cgroup_subsys_state is subsys's matter.
+ *
+ * ID->css look up is available after cgroup's directory is populated.
  *
  * Looking up and scanning function should be called under rcu_read_lock().
  * Taking cgroup_mutex()/hierarchy_mutex() is not necessary for following calls.
@@ -598,10 +600,10 @@ static inline int cgroup_attach_task_cur
  * destroyed". The caller should check css and cgroup's status.
  */
 
-/*
- * Typically Called at ->destroy(), or somewhere the subsys frees
- * cgroup_subsys_state.
- */
+/* Should be called in ->create() by subsys itself */
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *newgr,
+		struct cgroup_subsys_state *css);
+/* Typically Called at ->destroy(), or somewhere the subsys frees css */
 void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
 
 /* Find a cgroup_subsys_state which has given ID */
Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -4347,6 +4347,11 @@ mem_cgroup_create(struct cgroup_subsys *
 		if (alloc_mem_cgroup_per_zone_info(mem, node))
 			goto free_out;
 
+	error = alloc_css_id(ss, cont, &mem->css);
+	if (error)
+		goto free_out;
+	/* Here, css_id(&mem->css) works. but css_lookup(id)->mem doesn't */
+
 	/* root ? */
 	if (cont->parent == NULL) {
 		int cpu;
Index: mmotm-0922/block/blk-cgroup.c
===================================================================
--- mmotm-0922.orig/block/blk-cgroup.c
+++ mmotm-0922/block/blk-cgroup.c
@@ -1434,9 +1434,13 @@ blkiocg_create(struct cgroup_subsys *sub
 {
 	struct blkio_cgroup *blkcg;
 	struct cgroup *parent = cgroup->parent;
+	int ret;
 
 	if (!parent) {
 		blkcg = &blkio_root_cgroup;
+		ret = alloc_css_id(subsys, cgroup, &blkcg->css);
+		if (ret)
+			return ERR_PTR(ret);
 		goto done;
 	}
 
@@ -1447,6 +1451,11 @@ blkiocg_create(struct cgroup_subsys *sub
 	blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
 	if (!blkcg)
 		return ERR_PTR(-ENOMEM);
+	ret = alloc_css_id(subsys, cgroup, &blkcg->css);
+	if (ret) {
+		kfree(blkcg);
+		return ERR_PTR(ret);
+	}
 
 	blkcg->weight = BLKIO_WEIGHT_DEFAULT;
 done:
Index: mmotm-0922/Documentation/cgroups/cgroups.txt
===================================================================
--- mmotm-0922.orig/Documentation/cgroups/cgroups.txt
+++ mmotm-0922/Documentation/cgroups/cgroups.txt
@@ -621,6 +621,54 @@ and root cgroup. Currently this will onl
 the default hierarchy (which never has sub-cgroups) and a hierarchy
 that is being created/destroyed (and hence has no sub-cgroups).
 
+3.4 cgroup subsys state IDs.
+------------
+When subsystem sets use_id == true, an ID per [cgroup, subsys] is added
+and it will be tied to cgroup_subsys_state object.
+
+When use_id == true, the following interfaces can be used. Please note that
+allocating and freeing an ID is the subsystem's job because the
+cgroup_subsys_state object's lifetime is the subsystem's matter.
+
+unsigned short css_id(struct cgroup_subsys_state *css)
+
+Returns ID of cgroup_subsys_state
+
+unsigned short css_depth(struct cgroup_subsys_state *css)
+
+Returns the level at which "css" exists in the hierarchy tree.
+The root cgroup's depth is 0, its children are at 1, their children at
+2....
+
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *newgr,
+                struct cgroup_subsys_state *css);
+
+Attaches a new ID to the given css under the subsystem ([ss, cgroup]).
+Should be called in the ->create() callback.
+
+void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
+
+Free ID attached to "css" under subsystem. Should be called before
+"css" is freed.
+
+struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id);
+
+Look up cgroup_subsys_state via ID. Should be called under rcu_read_lock().
+
+struct cgroup_subsys_state *css_get_next(struct cgroup_subsys *ss, int id,
+                struct cgroup_subsys_state *root, int *foundid);
+
+Returns the next css under "root", i.e. one in a sub-directory of the "root"
+cgroup's directory in the cgroup hierarchy. The order of IDs
+returned by this function is not sorted. Please be careful.
+
+bool css_is_ancestor(struct cgroup_subsys_state *cg,
+                     const struct cgroup_subsys_state *root);
+
+Returns true if "root" and "cg" are under the same hierarchy and
+"root" can be found when you follow every ->parent from "cg" up to
+the root cgroup.
+
 4. Questions
 ============
 

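The css_depth() and css_is_ancestor() semantics described in the cgroups.txt hunk above can be modeled in userspace as a walk over ->parent pointers. This is only a sketch of the documented behavior, not the kernel implementation: struct sketch_css and both helper names are hypothetical, and treating a css as its own ancestor is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cgroup_subsys_state: only the parent link. */
struct sketch_css {
	struct sketch_css *parent;	/* NULL for the root cgroup */
};

/* Depth as documented: root is 0, its children 1, and so on. */
static unsigned short sketch_depth(const struct sketch_css *css)
{
	unsigned short depth = 0;

	assert(css != NULL);
	while (css->parent) {
		css = css->parent;
		depth++;
	}
	return depth;
}

/* True if "root" is met while following ->parent from "cg" upward. */
static bool sketch_is_ancestor(const struct sketch_css *cg,
			       const struct sketch_css *root)
{
	for (; cg; cg = cg->parent)
		if (cg == root)
			return true;
	return false;
}
```

Two siblings under the same parent are not ancestors of each other in this model, even though they share a hierarchy.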

* [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  2010-09-27  9:51   ` [RFC][PATCH 1/4] memcg: replace page_cgroup->mem_cgroup to be unsigned short KAMEZAWA Hiroyuki
  2010-09-27  9:52   ` [RFC][PATCH 2/4] memcg: make css ID visible at cgroup creation time KAMEZAWA Hiroyuki
@ 2010-09-27  9:54   ` KAMEZAWA Hiroyuki
  2010-09-27  9:54   ` [RFC][PATCH 4/4] memcg: per node info node hotplug support KAMEZAWA Hiroyuki
  2010-09-30  5:31   ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27  9:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Currently, memcg's per-zone structure is looked up as

	mem->info.nodeinfo[nid]->zoneinfo[zid]

1st. This nodeinfo is an array of pointers of MAX_NUMNODES size. This makes
sizeof(struct mem_cgroup) very large, and struct mem_cgroup is allocated in the
vmalloc() area because its size is larger than PAGE_SIZE.
(This will never be fixed even when node hotplug is supported.)

2nd. Now, page_cgroup->mem_cgroup is an ID. So we need a 2-level lookup
to access the per-zone structure:

	mem = css_lookup(pc->mem_cgroup);
	mz = mem->info.nodeinfo[nid]->zoneinfo[zid]

This lookup seems wasteful. This patch removes mem->info and moves all per-zone
memcg structures into a radix tree. A mem_cgroup_per_zone structure can be found by

	radix_tree_lookup(&memcg_lrus, id_func(memcg, nid, zid)).

This makes memcg small (4440 bytes => 344 bytes) and combines the 2 lookups into one.
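A minimal userspace sketch of how such a key can be packed, mirroring the layout that node_zone_idx() builds in the patch below: node and zone in the upper bits, the 16-bit memcg ID in the low bits. The SKETCH_* constants and all function names are hypothetical; in particular ZONES_SHIFT is assumed to be 2 here, while the kernel derives it from its configuration.

```c
#include <assert.h>

/* Assumed widths for illustration only. */
#define SKETCH_ZONES_SHIFT	2
#define SKETCH_MEMCG_ID_BITS	16

/* Pack (memcg ID, node, zone) into one key: [ node | zone | memcg ID ]. */
static unsigned long sketch_lru_key(unsigned int memcg_id,
				    unsigned int node, unsigned int zone)
{
	unsigned long key;

	assert(memcg_id < (1u << SKETCH_MEMCG_ID_BITS));
	assert(zone < (1u << SKETCH_ZONES_SHIFT));

	key = ((unsigned long)node << SKETCH_ZONES_SHIFT) | zone;
	return (key << SKETCH_MEMCG_ID_BITS) | memcg_id;
}

/* Unpack helpers, showing that the three fields do not overlap. */
static unsigned int sketch_key_memcg(unsigned long key)
{
	return key & ((1ul << SKETCH_MEMCG_ID_BITS) - 1);
}

static unsigned int sketch_key_zone(unsigned long key)
{
	return (key >> SKETCH_MEMCG_ID_BITS) & ((1ul << SKETCH_ZONES_SHIFT) - 1);
}

static unsigned int sketch_key_node(unsigned long key)
{
	return key >> (SKETCH_MEMCG_ID_BITS + SKETCH_ZONES_SHIFT);
}
```

With these assumed widths, memcg ID 5 on node 1, zone 2 packs to (6 << 16) | 5, so a single radix-tree lookup on that key replaces the ID lookup followed by the nodeinfo/zoneinfo indexing.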

The following patch adds memory hotplug support.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   86 +++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 29 deletions(-)

Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -122,13 +122,16 @@ struct mem_cgroup_per_zone {
 /* Macro for accessing counter */
 #define MEM_CGROUP_ZSTAT(mz, idx)	((mz)->count[(idx)])
 
-struct mem_cgroup_per_node {
-	struct mem_cgroup_per_zone zoneinfo[MAX_NR_ZONES];
-};
+RADIX_TREE(memcg_lrus, GFP_KERNEL);
+DEFINE_SPINLOCK(memcg_lrutable_lock);
 
-struct mem_cgroup_lru_info {
-	struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
-};
+static inline long node_zone_idx(int memcg, int node, int zone) {
+	unsigned long id;
+
+	id = ((node) << ZONES_SHIFT | (zone)) << 16;
+	id |= memcg;
+	return id;
+}
 
 /*
  * Cgroups above their limits are maintained in a RB-Tree, independent of
@@ -206,11 +209,6 @@ struct mem_cgroup {
 	 * the counter to account for mem+swap usage.
 	 */
 	struct res_counter memsw;
-	/*
-	 * Per cgroup active and inactive list, similar to the
-	 * per zone LRU lists.
-	 */
-	struct mem_cgroup_lru_info info;
 
 	/*
 	  protect against reclaim related member.
@@ -388,9 +386,14 @@ static struct mem_cgroup *memcg_lookup(u
 }
 
 static struct mem_cgroup_per_zone *
-mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
+mem_cgroup_zoneinfo(int memcgid, int nid, int zid)
 {
-	return &mem->info.nodeinfo[nid]->zoneinfo[zid];
+	struct mem_cgroup_per_zone *mz;
+
+	rcu_read_lock();
+	mz = radix_tree_lookup(&memcg_lrus, node_zone_idx(memcgid, nid, zid));
+	rcu_read_unlock();
+	return mz;
 }
 
 struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem)
@@ -401,14 +404,13 @@ struct cgroup_subsys_state *mem_cgroup_c
 static struct mem_cgroup_per_zone *
 page_cgroup_zoneinfo(struct page_cgroup *pc)
 {
-	struct mem_cgroup *mem = memcg_lookup(pc->mem_cgroup);
 	int nid = page_cgroup_nid(pc);
 	int zid = page_cgroup_zid(pc);
 
-	if (!mem)
+	if (!pc->mem_cgroup)
 		return NULL;
 
-	return mem_cgroup_zoneinfo(mem, nid, zid);
+	return mem_cgroup_zoneinfo(pc->mem_cgroup, nid, zid);
 }
 
 static struct mem_cgroup_tree_per_zone *
@@ -496,7 +498,7 @@ static void mem_cgroup_update_tree(struc
 	 * because their event counter is not touched.
 	 */
 	for (; mem; mem = parent_mem_cgroup(mem)) {
-		mz = mem_cgroup_zoneinfo(mem, nid, zid);
+		mz = mem_cgroup_zoneinfo(css_id(&mem->css), nid, zid);
 		excess = res_counter_soft_limit_excess(&mem->res);
 		/*
 		 * We have to update the tree if mz is on RB-tree or
@@ -525,7 +527,7 @@ static void mem_cgroup_remove_from_trees
 
 	for_each_node_state(node, N_POSSIBLE) {
 		for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-			mz = mem_cgroup_zoneinfo(mem, node, zone);
+			mz = mem_cgroup_zoneinfo(css_id(&mem->css), node, zone);
 			mctz = soft_limit_tree_node_zone(node, zone);
 			mem_cgroup_remove_exceeded(mem, mz, mctz);
 		}
@@ -658,7 +660,7 @@ static unsigned long mem_cgroup_get_loca
 
 	for_each_online_node(nid)
 		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
-			mz = mem_cgroup_zoneinfo(mem, nid, zid);
+			mz = mem_cgroup_zoneinfo(css_id(&mem->css), nid, zid);
 			total += MEM_CGROUP_ZSTAT(mz, idx);
 		}
 	return total;
@@ -1039,7 +1041,9 @@ unsigned long mem_cgroup_zone_nr_pages(s
 {
 	int nid = zone_to_nid(zone);
 	int zid = zone_idx(zone);
-	struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+	struct mem_cgroup_per_zone *mz;
+
+	mz = mem_cgroup_zoneinfo(css_id(&memcg->css), nid, zid);
 
 	return MEM_CGROUP_ZSTAT(mz, lru);
 }
@@ -1049,7 +1053,9 @@ struct zone_reclaim_stat *mem_cgroup_get
 {
 	int nid = zone_to_nid(zone);
 	int zid = zone_idx(zone);
-	struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+	struct mem_cgroup_per_zone *mz;
+
+	mz = mem_cgroup_zoneinfo(css_id(&memcg->css), nid, zid);
 
 	return &mz->reclaim_stat;
 }
@@ -1099,7 +1105,7 @@ unsigned long mem_cgroup_isolate_pages(u
 	int ret;
 
 	BUG_ON(!mem_cont);
-	mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
+	mz = mem_cgroup_zoneinfo(css_id(&mem_cont->css), nid, zid);
 	src = &mz->lists[lru];
 
 	scan = 0;
@@ -3179,7 +3185,7 @@ static int mem_cgroup_force_empty_list(s
 	int ret = 0;
 
 	zone = &NODE_DATA(node)->node_zones[zid];
-	mz = mem_cgroup_zoneinfo(mem, node, zid);
+	mz = mem_cgroup_zoneinfo(css_id(&mem->css), node, zid);
 	list = &mz->lists[lru];
 
 	loop = MEM_CGROUP_ZSTAT(mz, lru);
@@ -3676,7 +3682,8 @@ static int mem_control_stat_show(struct 
 
 		for_each_online_node(nid)
 			for (zid = 0; zid < MAX_NR_ZONES; zid++) {
-				mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
+				mz = mem_cgroup_zoneinfo(
+					css_id(&mem_cont->css), nid, zid);
 
 				recent_rotated[0] +=
 					mz->reclaim_stat.recent_rotated[0];
@@ -4173,10 +4180,9 @@ static int register_memsw_files(struct c
 
 static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
 {
-	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup_per_zone *mz;
 	enum lru_list l;
-	int zone, tmp = node;
+	int id, zone, ret, tmp = node;
 	/*
 	 * This routine is called against possible nodes.
 	 * But it's BUG to call kmalloc() against offline node.
@@ -4187,27 +4193,51 @@ static int alloc_mem_cgroup_per_zone_inf
 	 */
 	if (!node_state(node, N_NORMAL_MEMORY))
 		tmp = -1;
-	pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
-	if (!pn)
-		return 1;
-
-	mem->info.nodeinfo[node] = pn;
-	memset(pn, 0, sizeof(*pn));
-
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-		mz = &pn->zoneinfo[zone];
+		mz = kzalloc_node(sizeof(struct mem_cgroup_per_zone),
+					GFP_KERNEL, tmp);
+		if (!mz)
+			break;
+		radix_tree_preload(GFP_KERNEL);
+		spin_lock_irq(&memcg_lrutable_lock);
+		id = node_zone_idx(css_id(&mem->css), node, zone);
+		ret = radix_tree_insert(&memcg_lrus, id, mz);
+		spin_unlock_irq(&memcg_lrutable_lock);
+		if (ret)
+			break;
 		for_each_lru(l)
 			INIT_LIST_HEAD(&mz->lists[l]);
-		mz->usage_in_excess = 0;
 		mz->on_tree = false;
 		mz->mem = mem;
 	}
-	return 0;
+	
+	if (zone == MAX_NR_ZONES)
+		return 0;
+
+	for (; zone >= 0; zone--) {
+		id = node_zone_idx(css_id(&mem->css), node, zone);
+		spin_lock_irq(&memcg_lrutable_lock);
+		mz = radix_tree_delete(&memcg_lrus, id);
+		spin_unlock_irq(&memcg_lrutable_lock);
+		kfree(mz);
+	}
+
+	return 1;
 }
 
 static void free_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
 {
-	kfree(mem->info.nodeinfo[node]);
+	int id, zone;
+	struct mem_cgroup_per_zone *mz;
+	unsigned long flags;
+
+	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
+		id = node_zone_idx(css_id(&mem->css), node, zone);
+		spin_lock_irqsave(&memcg_lrutable_lock, flags);
+		mz = radix_tree_delete(&memcg_lrus, id);
+		spin_unlock_irqrestore(&memcg_lrutable_lock, flags);
+		kfree(mz);
+	}
 }
 
 static struct mem_cgroup *mem_cgroup_alloc(void)
@@ -4234,6 +4264,7 @@ static struct mem_cgroup *mem_cgroup_all
 		mem = NULL;
 	}
 	spin_lock_init(&mem->pcp_counter_lock);
+
 	return mem;
 }
 
@@ -4343,13 +4374,14 @@ mem_cgroup_create(struct cgroup_subsys *
 	if (!mem)
 		return ERR_PTR(error);
 
+	error = alloc_css_id(ss, cont, &mem->css);
+	if (error)
+		goto free_out;
+
 	for_each_node_state(node, N_POSSIBLE)
 		if (alloc_mem_cgroup_per_zone_info(mem, node))
 			goto free_out;
 
-	error = alloc_css_id(ss, cont, &mem->css);
-	if (error)
-		goto free_out;
 	/* Here, css_id(&mem->css) works. but css_lookup(id)->mem doesn't */
 
 	/* root ? */


* [RFC][PATCH 4/4] memcg: per node info node hotplug support
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
                     ` (2 preceding siblings ...)
  2010-09-27  9:54   ` [RFC][PATCH 3/4] memcg: reduce size of mem_cgroup by removing per-node info array KAMEZAWA Hiroyuki
@ 2010-09-27  9:54   ` KAMEZAWA Hiroyuki
  2010-09-30  5:31   ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
  4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-27  9:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Support node hot plug (experimental).
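The patch below tolerates -EEXIST from radix_tree_insert() so that onlining a node whose per-zone entries already exist keeps the old entries. A minimal userspace sketch of that idea follows. The fixed-size table standing in for the radix tree, the 64-byte allocation, and every sketch_* name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_ZONES 4	/* assumed zone count for one node */

/* Stand-in for the radix tree: one slot per zone of a single node. */
struct sketch_table {
	void *slot[SKETCH_NR_ZONES];
};

/* Like radix_tree_insert(), refuses to overwrite an occupied slot. */
static int sketch_insert(struct sketch_table *t, int zone, void *item)
{
	assert(zone >= 0 && zone < SKETCH_NR_ZONES);
	if (t->slot[zone])
		return -EEXIST;
	t->slot[zone] = item;
	return 0;
}

/* Allocate per-zone entries, keeping any entry left from an earlier online. */
static int sketch_alloc_zone_info(struct sketch_table *t)
{
	for (int zone = 0; zone < SKETCH_NR_ZONES; zone++) {
		void *mz = calloc(1, 64);
		int ret;

		if (!mz)
			return -ENOMEM;
		ret = sketch_insert(t, zone, mz);
		if (ret == -EEXIST) {
			free(mz);	/* already populated: drop the duplicate */
			continue;
		}
		if (ret) {
			free(mz);
			return ret;
		}
	}
	return 0;
}
```

Calling sketch_alloc_zone_info() twice on the same table succeeds both times and leaves the first set of entries in place.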

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

Index: mmotm-0922/mm/memcontrol.c
===================================================================
--- mmotm-0922.orig/mm/memcontrol.c
+++ mmotm-0922/mm/memcontrol.c
@@ -48,6 +48,7 @@
 #include <linux/page_cgroup.h>
 #include <linux/cpu.h>
 #include <linux/oom.h>
+#include <linux/memory.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -4212,8 +4213,12 @@ static int alloc_mem_cgroup_per_zone_inf
 		id = node_zone_idx(css_id(&mem->css), node, zone);
 		ret = radix_tree_insert(&memcg_lrus, id, mz);
 		spin_unlock_irq(&memcg_lrutable_lock);
-		if (ret)
-			break;
+		if (ret) {
+			if (ret != -EEXIST)
+				break;
+			kfree(mz);
+			continue;
+		}
 		for_each_lru(l)
 			INIT_LIST_HEAD(&mz->lists[l]);
 		mz->on_tree = false;
@@ -4372,6 +4377,40 @@ static int mem_cgroup_soft_limit_tree_in
 	return 0;
 }
 
+static int __meminit memcg_memory_hotplug_callback(struct notifier_block *self,
+		unsigned long action, void *arg)
+{
+	struct memory_notify *mn = arg;
+	struct mem_cgroup *mem;
+	int nid = mn->status_change_nid;
+	int ret = 0;
+
+	/* We just take care of node hotplug */
+	if (nid == -1)
+		return NOTIFY_OK;
+	switch(action) {
+	case MEM_GOING_ONLINE:
+		for_each_mem_cgroup_all(mem)
+			ret = alloc_mem_cgroup_per_zone_info(mem, nid);
+		break;
+	case MEM_OFFLINE:
+		for_each_mem_cgroup_all(mem)
+			free_mem_cgroup_per_zone_info(mem, nid);
+		break;
+	default:
+		break;
+	}
+
+	if (ret)
+		ret = notifier_from_errno(ret);
+	else
+		ret = NOTIFY_OK;
+
+	return ret;
+}
+
+
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
@@ -4387,7 +4426,7 @@ mem_cgroup_create(struct cgroup_subsys *
 	if (error)
 		goto free_out;
 
-	for_each_node_state(node, N_POSSIBLE)
+	for_each_node_state(node, N_HIGH_MEMORY)
 		if (alloc_mem_cgroup_per_zone_info(mem, node))
 			goto free_out;
 
@@ -4407,6 +4446,7 @@ mem_cgroup_create(struct cgroup_subsys *
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
 		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
+		hotplug_memory_notifier(memcg_memory_hotplug_callback, 0);
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;


* Re: [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2.
  2010-09-27  9:48 ` [RFC][PATCH 0/4] memcg: use ID instead of pointer in page_cgroup , retry v2 KAMEZAWA Hiroyuki
                     ` (3 preceding siblings ...)
  2010-09-27  9:54   ` [RFC][PATCH 4/4] memcg: per node info node hotplug support KAMEZAWA Hiroyuki
@ 2010-09-30  5:31   ` KAMEZAWA Hiroyuki
  4 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-30  5:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, balbir, nishimura, akpm

On Mon, 27 Sep 2010 18:48:21 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 24 Sep 2010 18:13:02 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > 
> > This is a revised series of use ID.
> > Restart from RFC.
> > 
> 
> Then, I changed my mind. This is a new set, with no new special lookups.
> But you may find something strange. I don't want to merge these patches
> all at once. Just think of this set as a dump of my stack. Any comments are welcome.
> 

At LinuxCon Japan, I talked with Nishimura, and we concluded that sending only
patch 1/4 would be best (go step by step). I also know Greg Thelen is now
rewriting his dirty page accounting for memcg patch onto the latest mmotm. I
think its current priority is higher than this. So I'll wait for a while and
then post only patch 1/4.

Thanks,
-Kame

