linux-mm.kvack.org archive mirror
* [RFC][PATCH] memcg: performance improvement v2 [0/8]
@ 2008-04-28 11:19 KAMEZAWA Hiroyuki
  2008-04-28 11:22 ` [RFC][PATCH 1/8] memcg: migration handling KAMEZAWA Hiroyuki
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:19 UTC (permalink / raw)
  To: linux-mm; +Cc: balbir, xemul, hugh, yamamoto

Hi, this is a set of patches for improving memcg performance.
This is a kind of status report on my work, but I'd be glad if someone could test
and review it. Performance results on a micro benchmark are below.

This is an updated version against 2.6.25-mm1. It is still an RFC, but I'd like to
post some of the set after the merge window if there are no objections.


Patches 1-3 are well tested. Patches 4-8 are new.
==
1/8 ... migration handling update.
2/8 ... remove refcnt
3/8 ... swap cache handling again
4/8 ... mark as read_mostly.
5/8 ... optimization
6/8 ... remove redundant initialization.
7/8 ... remove redundant checks on charge.
8/8 ... remove excessive function.


The following are results of unixbench on an x86-64 2-core system.

Invoked by: # ./Run execl C shell fstime fsbuffer fsdisk dc

disabled --- disabled by boot option (but configured)
enabled  --- 2.6.25-mm1 with cgroup_enable=memory
(*1)     --- patches 1-3 applied.
(*2)     --- patches 1-8 applied. (Patches 4-8 are young; I need more checks.)

                                           disabled       enabled      (*1)      (*2)
Execl Throughput                           3111.8 lps      2896.8    3005.6    3003.8
C Compiler Throughput                      1073.3 lpm       982.5     961.6    1034.3
Shell Scripts (1 concurrent)               5741.0 lpm      5417.7    5682.0    5840.6
Shell Scripts (8 concurrent)               1168.3 lpm      1108.7    1132.3    1139.3
Shell Scripts (16 concurrent)               602.3 lpm       570.7     582.3     586.3
File Read 1024 bufsize 2000 maxblocks    1025248.0 KBps 1016883.0 1027897.0 1017299.0
File Write 1024 bufsize 2000 maxblocks   551012.0 KBps   554619.0  554656.0  548747.0
File Copy 1024 bufsize 2000 maxblocks    346886.0 KBps   351423.0  348238.0  344135.0
File Read 256 bufsize 500 maxblocks      323261.0 KBps   324092.0  320753.0  323042.0
File Write 256 bufsize 500 maxblocks     151046.0 KBps   151319.0  152143.0  151431.0
File Copy 256 bufsize 500 maxblocks      100806.0 KBps   101166.0  100270.0  100947.0
File Read 4096 bufsize 8000 maxblocks    2055692.0 KBps 2050954.0 2055142.0 2047008.0
File Write 4096 bufsize 8000 maxblocks   1619457.0 KBps 1627458.0 1621503.0 1615020.0
File Copy 4096 bufsize 8000 maxblocks    865003.0 KBps   862464.0  861305.0  856702.0
Dc: sqrt(2) to 99 decimal places         133621.2 lpm    125084.7  128716.2  128877.8
 
 - Execl/C/Shell/Dc show the overhead, which comes from mapping/unmapping pages.
 - I don't think the file benchmarks show the overhead of the memory resource controller.
 - I don't have a bigger x86-64 system, sorry.

I'm sorry, but I'll be completely offline from May 1st to May 6th, so my replies
may be delayed.

Thanks,
-Kame








* [RFC][PATCH 1/8] memcg: migration handling
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
@ 2008-04-28 11:22 ` KAMEZAWA Hiroyuki
  2008-04-29  1:36   ` Li Zefan
  2008-04-28 11:23 ` [RFC][PATCH 2/8] memcg: remove refcnt KAMEZAWA Hiroyuki
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

This patch changes the migration path to assign the page_cgroup of the new page
at allocation time.

Pros:
 - We can avoid complicated lock dependencies.
Cons:
 - A new parameter is added to mem_cgroup_charge_common().

Changelog:
 - adjusted to 2.6.25-mm1.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -515,7 +515,8 @@ unsigned long mem_cgroup_isolate_pages(u
  * < 0 if the cgroup is over its limit
  */
 static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
-				gfp_t gfp_mask, enum charge_type ctype)
+				gfp_t gfp_mask, enum charge_type ctype,
+				struct mem_cgroup *memcg)
 {
 	struct mem_cgroup *mem;
 	struct page_cgroup *pc;
@@ -560,16 +561,21 @@ retry:
 	 * thread group leader migrates. It's possible that mm is not
 	 * set, if so charge the init_mm (happens for pagecache usage).
 	 */
-	if (!mm)
-		mm = &init_mm;
+	if (!memcg) {
+		if (!mm)
+			mm = &init_mm;
 
-	rcu_read_lock();
-	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
-	/*
-	 * For every charge from the cgroup, increment reference count
-	 */
-	css_get(&mem->css);
-	rcu_read_unlock();
+		rcu_read_lock();
+		mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
+		/*
+	 	 * For every charge from the cgroup, increment reference count
+	 	 */
+		css_get(&mem->css);
+		rcu_read_unlock();
+	} else {
+		mem = memcg;
+		css_get(&memcg->css);
+	}
 
 	while (res_counter_charge(&mem->res, PAGE_SIZE)) {
 		if (!(gfp_mask & __GFP_WAIT))
@@ -634,7 +640,7 @@ err:
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
-				MEM_CGROUP_CHARGE_TYPE_MAPPED);
+				MEM_CGROUP_CHARGE_TYPE_MAPPED, NULL);
 }
 
 int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
@@ -643,7 +649,22 @@ int mem_cgroup_cache_charge(struct page 
 	if (!mm)
 		mm = &init_mm;
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
-				MEM_CGROUP_CHARGE_TYPE_CACHE);
+				MEM_CGROUP_CHARGE_TYPE_CACHE, NULL);
+}
+
+int mem_cgroup_getref(struct page *page)
+{
+	struct page_cgroup *pc;
+
+	if (mem_cgroup_subsys.disabled)
+		return 0;
+
+	lock_page_cgroup(page);
+	pc = page_get_page_cgroup(page);
+	VM_BUG_ON(!pc);
+	pc->ref_cnt++;
+	unlock_page_cgroup(page);
+	return 0;
 }
 
 /*
@@ -693,65 +714,39 @@ unlock:
 }
 
 /*
- * Returns non-zero if a page (under migration) has valid page_cgroup member.
- * Refcnt of page_cgroup is incremented.
+ * Before starting migration, account against new page.
  */
-int mem_cgroup_prepare_migration(struct page *page)
+int mem_cgroup_prepare_migration(struct page *page, struct page *newpage)
 {
 	struct page_cgroup *pc;
+	struct mem_cgroup *mem = NULL;
+	enum charge_type ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+	int ret = 0;
 
 	if (mem_cgroup_subsys.disabled)
 		return 0;
 
 	lock_page_cgroup(page);
 	pc = page_get_page_cgroup(page);
-	if (pc)
-		pc->ref_cnt++;
+	if (pc) {
+		mem = pc->mem_cgroup;
+		css_get(&mem->css);
+		if (pc->flags & PAGE_CGROUP_FLAG_CACHE)
+			ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
+	}
 	unlock_page_cgroup(page);
-	return pc != NULL;
-}
-
-void mem_cgroup_end_migration(struct page *page)
-{
-	mem_cgroup_uncharge_page(page);
+	if (mem) {
+		ret = mem_cgroup_charge_common(newpage, NULL, GFP_KERNEL,
+			ctype, mem);
+		css_put(&mem->css);
+	}
+	return ret;
 }
 
-/*
- * We know both *page* and *newpage* are now not-on-LRU and PG_locked.
- * And no race with uncharge() routines because page_cgroup for *page*
- * has extra one reference by mem_cgroup_prepare_migration.
- */
-void mem_cgroup_page_migration(struct page *page, struct page *newpage)
+/* remove redundant charge */
+void mem_cgroup_end_migration(struct page *newpage)
 {
-	struct page_cgroup *pc;
-	struct mem_cgroup_per_zone *mz;
-	unsigned long flags;
-
-	lock_page_cgroup(page);
-	pc = page_get_page_cgroup(page);
-	if (!pc) {
-		unlock_page_cgroup(page);
-		return;
-	}
-
-	mz = page_cgroup_zoneinfo(pc);
-	spin_lock_irqsave(&mz->lru_lock, flags);
-	__mem_cgroup_remove_list(mz, pc);
-	spin_unlock_irqrestore(&mz->lru_lock, flags);
-
-	page_assign_page_cgroup(page, NULL);
-	unlock_page_cgroup(page);
-
-	pc->page = newpage;
-	lock_page_cgroup(newpage);
-	page_assign_page_cgroup(newpage, pc);
-
-	mz = page_cgroup_zoneinfo(pc);
-	spin_lock_irqsave(&mz->lru_lock, flags);
-	__mem_cgroup_add_list(mz, pc);
-	spin_unlock_irqrestore(&mz->lru_lock, flags);
-
-	unlock_page_cgroup(newpage);
+	mem_cgroup_uncharge_page(newpage);
 }
 
 /*
@@ -781,12 +776,19 @@ static void mem_cgroup_force_empty_list(
 		page = pc->page;
 		get_page(page);
 		spin_unlock_irqrestore(&mz->lru_lock, flags);
-		mem_cgroup_uncharge_page(page);
-		put_page(page);
-		if (--count <= 0) {
-			count = FORCE_UNCHARGE_BATCH;
+		/*
+		 * Check if this page is on LRU. !LRU page can be found
+		 * if it's under page migration.
+		 */
+		if (PageLRU(page)) {
+			mem_cgroup_uncharge_page(page);
+			put_page(page);
+			if (--count <= 0) {
+				count = FORCE_UNCHARGE_BATCH;
+				cond_resched();
+			}
+		} else
 			cond_resched();
-		}
 		spin_lock_irqsave(&mz->lru_lock, flags);
 	}
 	spin_unlock_irqrestore(&mz->lru_lock, flags);
Index: mm-2.6.25-mm1/include/linux/memcontrol.h
===================================================================
--- mm-2.6.25-mm1.orig/include/linux/memcontrol.h
+++ mm-2.6.25-mm1/include/linux/memcontrol.h
@@ -50,9 +50,10 @@ extern struct mem_cgroup *mem_cgroup_fro
 #define mm_match_cgroup(mm, cgroup)	\
 	((cgroup) == mem_cgroup_from_task((mm)->owner))
 
-extern int mem_cgroup_prepare_migration(struct page *page);
+extern int
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage);
 extern void mem_cgroup_end_migration(struct page *page);
-extern void mem_cgroup_page_migration(struct page *page, struct page *newpage);
+extern int mem_cgroup_getref(struct page *page);
 
 /*
  * For memory reclaim.
@@ -112,7 +113,8 @@ static inline int task_in_mem_cgroup(str
 	return 1;
 }
 
-static inline int mem_cgroup_prepare_migration(struct page *page)
+static inline int
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage)
 {
 	return 0;
 }
@@ -121,8 +123,7 @@ static inline void mem_cgroup_end_migrat
 {
 }
 
-static inline void
-mem_cgroup_page_migration(struct page *page, struct page *newpage)
+static inline void mem_cgroup_getref(struct page *page)
 {
 }
 
Index: mm-2.6.25-mm1/mm/migrate.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/migrate.c
+++ mm-2.6.25-mm1/mm/migrate.c
@@ -357,6 +357,10 @@ static int migrate_page_move_mapping(str
 	__inc_zone_page_state(newpage, NR_FILE_PAGES);
 
 	write_unlock_irq(&mapping->tree_lock);
+	if (!PageSwapCache(newpage)) {
+		mem_cgroup_uncharge_page(page);
+		mem_cgroup_getref(newpage);
+	}
 
 	return 0;
 }
@@ -603,7 +607,6 @@ static int move_to_new_page(struct page 
 		rc = fallback_migrate_page(mapping, newpage, page);
 
 	if (!rc) {
-		mem_cgroup_page_migration(page, newpage);
 		remove_migration_ptes(page, newpage);
 	} else
 		newpage->mapping = NULL;
@@ -633,6 +636,12 @@ static int unmap_and_move(new_page_t get
 		/* page was freed from under us. So we are done. */
 		goto move_newpage;
 
+	charge = mem_cgroup_prepare_migration(page, newpage);
+	if (charge == -ENOMEM) {
+		rc = -ENOMEM;
+		goto move_newpage;
+	}
+
 	rc = -EAGAIN;
 	if (TestSetPageLocked(page)) {
 		if (!force)
@@ -684,19 +693,14 @@ static int unmap_and_move(new_page_t get
 		goto rcu_unlock;
 	}
 
-	charge = mem_cgroup_prepare_migration(page);
 	/* Establish migration ptes or remove ptes */
 	try_to_unmap(page, 1);
 
 	if (!page_mapped(page))
 		rc = move_to_new_page(newpage, page);
 
-	if (rc) {
+	if (rc)
 		remove_migration_ptes(page, page);
-		if (charge)
-			mem_cgroup_end_migration(page);
-	} else if (charge)
- 		mem_cgroup_end_migration(newpage);
 rcu_unlock:
 	if (rcu_locked)
 		rcu_read_unlock();
@@ -717,6 +721,8 @@ unlock:
 	}
 
 move_newpage:
+	if (!charge)
+		mem_cgroup_end_migration(newpage);
 	/*
 	 * Move the new page to the LRU. If migration was not successful
 	 * then this will free the page.


* [RFC][PATCH 2/8] memcg: remove refcnt
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
  2008-04-28 11:22 ` [RFC][PATCH 1/8] memcg: migration handling KAMEZAWA Hiroyuki
@ 2008-04-28 11:23 ` KAMEZAWA Hiroyuki
  2008-04-28 11:24 ` [RFC][PATCH 3/8] memcg: swapcache handling retry KAMEZAWA Hiroyuki
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

This patch removes refcnt from struct page_cgroup.

After this,

 * A page is charged only when !page_mapped() && no page_cgroup is assigned.
	* Anon page is newly mapped.
	* File page is added to mapping->tree.

 * A page is uncharged only when
	* Anon page is fully unmapped.
	* File page is removed from LRU.

There is no change in behavior from the user's point of view.

This patch also removes unnecessary calls in rmap.c which were used only for
refcnt management.


Changelog:
  adjusted to 2.6.25-mm1.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>




---
 include/linux/memcontrol.h |    9 +--
 mm/filemap.c               |    6 +-
 mm/memcontrol.c            |  103 +++++++++++++++++++++++----------------------
 mm/migrate.c               |    3 -
 mm/rmap.c                  |   14 ------
 mm/shmem.c                 |    8 +--
 6 files changed, 66 insertions(+), 77 deletions(-)

Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -164,7 +164,6 @@ struct page_cgroup {
 	struct list_head lru;		/* per cgroup LRU list */
 	struct page *page;
 	struct mem_cgroup *mem_cgroup;
-	int ref_cnt;			/* cached, mapped, migrating */
 	int flags;
 };
 #define PAGE_CGROUP_FLAG_CACHE	(0x1)	/* charged as cache */
@@ -183,6 +182,7 @@ static enum zone_type page_cgroup_zid(st
 enum charge_type {
 	MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
 	MEM_CGROUP_CHARGE_TYPE_MAPPED,
+	MEM_CGROUP_CHARGE_TYPE_FORCE,	/* used by force_empty */
 };
 
 /*
@@ -543,9 +543,7 @@ retry:
 	 */
 	if (pc) {
 		VM_BUG_ON(pc->page != page);
-		VM_BUG_ON(pc->ref_cnt <= 0);
-
-		pc->ref_cnt++;
+		VM_BUG_ON(!pc->mem_cgroup);
 		unlock_page_cgroup(page);
 		goto done;
 	}
@@ -561,10 +559,7 @@ retry:
 	 * thread group leader migrates. It's possible that mm is not
 	 * set, if so charge the init_mm (happens for pagecache usage).
 	 */
-	if (!memcg) {
-		if (!mm)
-			mm = &init_mm;
-
+	if (likely(!memcg)) {
 		rcu_read_lock();
 		mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
 		/*
@@ -600,7 +595,6 @@ retry:
 		}
 	}
 
-	pc->ref_cnt = 1;
 	pc->mem_cgroup = mem;
 	pc->page = page;
 	pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
@@ -639,6 +633,10 @@ err:
 
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
+	if (page_mapped(page))
+		return 0;
+	if (unlikely(!mm))
+		mm = &init_mm;
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
 				MEM_CGROUP_CHARGE_TYPE_MAPPED, NULL);
 }
@@ -646,32 +644,16 @@ int mem_cgroup_charge(struct page *page,
 int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask)
 {
-	if (!mm)
+	if (unlikely(!mm))
 		mm = &init_mm;
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
 				MEM_CGROUP_CHARGE_TYPE_CACHE, NULL);
 }
 
-int mem_cgroup_getref(struct page *page)
-{
-	struct page_cgroup *pc;
-
-	if (mem_cgroup_subsys.disabled)
-		return 0;
-
-	lock_page_cgroup(page);
-	pc = page_get_page_cgroup(page);
-	VM_BUG_ON(!pc);
-	pc->ref_cnt++;
-	unlock_page_cgroup(page);
-	return 0;
-}
-
 /*
- * Uncharging is always a welcome operation, we never complain, simply
- * uncharge.
+ * uncharge if !page_mapped(page)
  */
-void mem_cgroup_uncharge_page(struct page *page)
+void __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 {
 	struct page_cgroup *pc;
 	struct mem_cgroup *mem;
@@ -690,29 +672,41 @@ void mem_cgroup_uncharge_page(struct pag
 		goto unlock;
 
 	VM_BUG_ON(pc->page != page);
-	VM_BUG_ON(pc->ref_cnt <= 0);
 
-	if (--(pc->ref_cnt) == 0) {
-		mz = page_cgroup_zoneinfo(pc);
-		spin_lock_irqsave(&mz->lru_lock, flags);
-		__mem_cgroup_remove_list(mz, pc);
-		spin_unlock_irqrestore(&mz->lru_lock, flags);
+	if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
+	    && ((pc->flags & PAGE_CGROUP_FLAG_CACHE)
+		|| page_mapped(page)))
+		goto unlock;
 
-		page_assign_page_cgroup(page, NULL);
-		unlock_page_cgroup(page);
+	mz = page_cgroup_zoneinfo(pc);
+	spin_lock_irqsave(&mz->lru_lock, flags);
+	__mem_cgroup_remove_list(mz, pc);
+	spin_unlock_irqrestore(&mz->lru_lock, flags);
 
-		mem = pc->mem_cgroup;
-		res_counter_uncharge(&mem->res, PAGE_SIZE);
-		css_put(&mem->css);
+	page_assign_page_cgroup(page, NULL);
+	unlock_page_cgroup(page);
 
-		kmem_cache_free(page_cgroup_cache, pc);
-		return;
-	}
+	mem = pc->mem_cgroup;
+	res_counter_uncharge(&mem->res, PAGE_SIZE);
+	css_put(&mem->css);
 
+	kmem_cache_free(page_cgroup_cache, pc);
+	return;
 unlock:
 	unlock_page_cgroup(page);
 }
 
+void mem_cgroup_uncharge_page(struct page *page)
+{
+	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
+}
+
+void mem_cgroup_uncharge_cache_page(struct page *page)
+{
+	VM_BUG_ON(page_mapped(page));
+	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
+}
+
 /*
  * Before starting migration, account against new page.
  */
@@ -743,15 +737,21 @@ int mem_cgroup_prepare_migration(struct 
 	return ret;
 }
 
-/* remove redundant charge */
+/* remove redundant charge if migration failed.*/
 void mem_cgroup_end_migration(struct page *newpage)
 {
-	mem_cgroup_uncharge_page(newpage);
+	/*
+	 * If migration has succeeded, newpage->mapping has some value
+	 * i.e. pointer to address_space or anon_vma. We have nothing to do
+	 * at success.
+	 */
+	if (!newpage->mapping)
+		__mem_cgroup_uncharge_common(newpage,
+				MEM_CGROUP_CHARGE_TYPE_FORCE);
 }
 
 /*
  * This routine traverse page_cgroup in given list and drop them all.
- * This routine ignores page_cgroup->ref_cnt.
  * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
  */
 #define FORCE_UNCHARGE_BATCH	(128)
@@ -781,7 +781,8 @@ static void mem_cgroup_force_empty_list(
 		 * if it's under page migration.
 		 */
 		if (PageLRU(page)) {
-			mem_cgroup_uncharge_page(page);
+			__mem_cgroup_uncharge_common(page,
+					MEM_CGROUP_CHARGE_TYPE_FORCE);
 			put_page(page);
 			if (--count <= 0) {
 				count = FORCE_UNCHARGE_BATCH;
Index: mm-2.6.25-mm1/mm/filemap.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/filemap.c
+++ mm-2.6.25-mm1/mm/filemap.c
@@ -118,7 +118,7 @@ void __remove_from_page_cache(struct pag
 {
 	struct address_space *mapping = page->mapping;
 
-	mem_cgroup_uncharge_page(page);
+	mem_cgroup_uncharge_cache_page(page);
 	radix_tree_delete(&mapping->page_tree, page->index);
 	page->mapping = NULL;
 	mapping->nrpages--;
@@ -477,12 +477,12 @@ int add_to_page_cache(struct page *page,
 			mapping->nrpages++;
 			__inc_zone_page_state(page, NR_FILE_PAGES);
 		} else
-			mem_cgroup_uncharge_page(page);
+			mem_cgroup_uncharge_cache_page(page);
 
 		write_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();
 	} else
-		mem_cgroup_uncharge_page(page);
+		mem_cgroup_uncharge_cache_page(page);
 out:
 	return error;
 }
Index: mm-2.6.25-mm1/mm/migrate.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/migrate.c
+++ mm-2.6.25-mm1/mm/migrate.c
@@ -358,8 +358,7 @@ static int migrate_page_move_mapping(str
 
 	write_unlock_irq(&mapping->tree_lock);
 	if (!PageSwapCache(newpage)) {
-		mem_cgroup_uncharge_page(page);
-		mem_cgroup_getref(newpage);
+		mem_cgroup_uncharge_cache_page(page);
 	}
 
 	return 0;
Index: mm-2.6.25-mm1/include/linux/memcontrol.h
===================================================================
--- mm-2.6.25-mm1.orig/include/linux/memcontrol.h
+++ mm-2.6.25-mm1/include/linux/memcontrol.h
@@ -35,6 +35,7 @@ extern int mem_cgroup_charge(struct page
 extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask);
 extern void mem_cgroup_uncharge_page(struct page *page);
+extern void mem_cgroup_uncharge_cache_page(struct page *page);
 extern void mem_cgroup_move_lists(struct page *page, bool active);
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 					struct list_head *dst,
@@ -53,7 +54,6 @@ extern struct mem_cgroup *mem_cgroup_fro
 extern int
 mem_cgroup_prepare_migration(struct page *page, struct page *newpage);
 extern void mem_cgroup_end_migration(struct page *page);
-extern int mem_cgroup_getref(struct page *page);
 
 /*
  * For memory reclaim.
@@ -97,6 +97,9 @@ static inline int mem_cgroup_cache_charg
 static inline void mem_cgroup_uncharge_page(struct page *page)
 {
 }
+static inline void mem_cgroup_uncharge_cache_page(struct page *page)
+{
+}
 
 static inline void mem_cgroup_move_lists(struct page *page, bool active)
 {
@@ -123,10 +126,6 @@ static inline void mem_cgroup_end_migrat
 {
 }
 
-static inline void mem_cgroup_getref(struct page *page)
-{
-}
-
 static inline int mem_cgroup_calc_mapped_ratio(struct mem_cgroup *mem)
 {
 	return 0;
Index: mm-2.6.25-mm1/mm/rmap.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/rmap.c
+++ mm-2.6.25-mm1/mm/rmap.c
@@ -575,14 +575,8 @@ void page_add_anon_rmap(struct page *pag
 	VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	if (atomic_inc_and_test(&page->_mapcount))
 		__page_set_anon_rmap(page, vma, address);
-	else {
+	else
 		__page_check_anon_rmap(page, vma, address);
-		/*
-		 * We unconditionally charged during prepare, we uncharge here
-		 * This takes care of balancing the reference counts
-		 */
-		mem_cgroup_uncharge_page(page);
-	}
 }
 
 /**
@@ -613,12 +607,6 @@ void page_add_file_rmap(struct page *pag
 {
 	if (atomic_inc_and_test(&page->_mapcount))
 		__inc_zone_page_state(page, NR_FILE_MAPPED);
-	else
-		/*
-		 * We unconditionally charged during prepare, we uncharge here
-		 * This takes care of balancing the reference counts
-		 */
-		mem_cgroup_uncharge_page(page);
 }
 
 #ifdef CONFIG_DEBUG_VM
Index: mm-2.6.25-mm1/mm/shmem.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/shmem.c
+++ mm-2.6.25-mm1/mm/shmem.c
@@ -962,7 +962,7 @@ found:
 	spin_unlock(&info->lock);
 	radix_tree_preload_end();
 uncharge:
-	mem_cgroup_uncharge_page(page);
+	mem_cgroup_uncharge_cache_page(page);
 out:
 	unlock_page(page);
 	page_cache_release(page);
@@ -1319,7 +1319,7 @@ repeat:
 					page_cache_release(swappage);
 					goto failed;
 				}
-				mem_cgroup_uncharge_page(swappage);
+				mem_cgroup_uncharge_cache_page(swappage);
 			}
 			page_cache_release(swappage);
 			goto repeat;
@@ -1389,7 +1389,7 @@ repeat:
 			if (error || swap.val || 0 != add_to_page_cache_lru(
 					filepage, mapping, idx, GFP_NOWAIT)) {
 				spin_unlock(&info->lock);
-				mem_cgroup_uncharge_page(filepage);
+				mem_cgroup_uncharge_cache_page(filepage);
 				page_cache_release(filepage);
 				shmem_unacct_blocks(info->flags, 1);
 				shmem_free_blocks(inode, 1);
@@ -1398,7 +1398,7 @@ repeat:
 					goto failed;
 				goto repeat;
 			}
-			mem_cgroup_uncharge_page(filepage);
+			mem_cgroup_uncharge_cache_page(filepage);
 			info->flags |= SHMEM_PAGEIN;
 		}
 


* [RFC][PATCH 3/8] memcg: swapcache handling retry
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
  2008-04-28 11:22 ` [RFC][PATCH 1/8] memcg: migration handling KAMEZAWA Hiroyuki
  2008-04-28 11:23 ` [RFC][PATCH 2/8] memcg: remove refcnt KAMEZAWA Hiroyuki
@ 2008-04-28 11:24 ` KAMEZAWA Hiroyuki
  2008-04-28 11:26 ` [RFC][PATCH 4/8] memcg: read_mostly KAMEZAWA Hiroyuki
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

Now, swap cache is not accounted (because it had some troubles).

This retries accounting of swap cache, based on the remove-refcnt patch.

This does the following:
 * When a page is a swap-cache page, mem_cgroup_uncharge_page() will *not*
   uncharge the page even if page->mapcount == 0.
 * When a page is removed from the swap cache, mem_cgroup_uncharge_page()
   is called again.
 * A swapcache page is newly charged only when it's mapped.

Changelog:
 - adjusted to 2.6.25-mm1.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/migrate.c    |    3 ++-
 mm/swap_state.c |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: mm-2.6.25-mm1/mm/migrate.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/migrate.c
+++ mm-2.6.25-mm1/mm/migrate.c
@@ -359,7 +359,8 @@ static int migrate_page_move_mapping(str
 	write_unlock_irq(&mapping->tree_lock);
 	if (!PageSwapCache(newpage)) {
 		mem_cgroup_uncharge_cache_page(page);
-	}
+	} else
+		mem_cgroup_uncharge_page(page);
 
 	return 0;
 }
Index: mm-2.6.25-mm1/mm/swap_state.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/swap_state.c
+++ mm-2.6.25-mm1/mm/swap_state.c
@@ -110,6 +110,7 @@ void __delete_from_swap_cache(struct pag
 	total_swapcache_pages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	INC_CACHE_INFO(del_total);
+	mem_cgroup_uncharge_page(page);
 }
 
 /**
Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -675,7 +675,8 @@ void __mem_cgroup_uncharge_common(struct
 
 	if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
 	    && ((pc->flags & PAGE_CGROUP_FLAG_CACHE)
-		|| page_mapped(page)))
+		|| page_mapped(page)
+		|| PageSwapCache(page)))
 		goto unlock;
 
 	mz = page_cgroup_zoneinfo(pc);


* [RFC][PATCH 4/8] memcg: read_mostly
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2008-04-28 11:24 ` [RFC][PATCH 3/8] memcg: swapcache handling retry KAMEZAWA Hiroyuki
@ 2008-04-28 11:26 ` KAMEZAWA Hiroyuki
  2008-04-29  1:34   ` Li Zefan
  2008-04-28 11:28 ` [RFC][PATCH 5/8] memcg: optimize branches KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

These 3 variables are marked __read_mostly.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -35,9 +35,9 @@
 
 #include <asm/uaccess.h>
 
-struct cgroup_subsys mem_cgroup_subsys;
-static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
-static struct kmem_cache *page_cgroup_cache;
+struct cgroup_subsys mem_cgroup_subsys __read_mostly;
+static const int MEM_CGROUP_RECLAIM_RETRIES __read_mostly = 5;
+static struct kmem_cache *page_cgroup_cache __read_mostly;
 
 /*
  * Statistics for memory cgroup.


* [RFC][PATCH 5/8] memcg: optimize branches
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2008-04-28 11:26 ` [RFC][PATCH 4/8] memcg: read_mostly KAMEZAWA Hiroyuki
@ 2008-04-28 11:28 ` KAMEZAWA Hiroyuki
  2008-04-29  2:04   ` Li Zefan
  2008-04-28 11:30 ` [RFC][PATCH 6/8] memcg: remove redundant initilization KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

Showing branch direction for obvious conditions.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -541,7 +541,7 @@ retry:
 	 * The page_cgroup exists and
 	 * the page has already been accounted.
 	 */
-	if (pc) {
+	if (unlikely(pc)) {
 		VM_BUG_ON(pc->page != page);
 		VM_BUG_ON(!pc->mem_cgroup);
 		unlock_page_cgroup(page);
@@ -550,7 +550,7 @@ retry:
 	unlock_page_cgroup(page);
 
 	pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
-	if (pc == NULL)
+	if (unlikely(!pc))
 		goto err;
 
 	/*
@@ -602,7 +602,7 @@ retry:
 		pc->flags = PAGE_CGROUP_FLAG_CACHE;
 
 	lock_page_cgroup(page);
-	if (page_get_page_cgroup(page)) {
+	if (unlikely(page_get_page_cgroup(page))) {
 		unlock_page_cgroup(page);
 		/*
 		 * Another charge has been added to this page already.
@@ -668,7 +668,7 @@ void __mem_cgroup_uncharge_common(struct
 	 */
 	lock_page_cgroup(page);
 	pc = page_get_page_cgroup(page);
-	if (!pc)
+	if (unlikely(!pc))
 		goto unlock;
 
 	VM_BUG_ON(pc->page != page);


* [RFC][PATCH 6/8] memcg: remove redundant initilization
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2008-04-28 11:28 ` [RFC][PATCH 5/8] memcg: optimize branches KAMEZAWA Hiroyuki
@ 2008-04-28 11:30 ` KAMEZAWA Hiroyuki
  2008-04-28 11:31 ` [RFC][PATCH 7/8] memcg: remove redundant checks KAMEZAWA Hiroyuki
  2008-04-28 11:32 ` [RFC][PATCH 8/8] memcg: inlining mem_cgroup_chage_statistics() KAMEZAWA Hiroyuki
  7 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

1. Remove overkill initializations.
2. Make flag initialization clearer.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -287,7 +287,7 @@ static void __mem_cgroup_remove_list(str
 		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
 
 	mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
-	list_del_init(&pc->lru);
+	list_del(&pc->lru);
 }
 
 static void __mem_cgroup_add_list(struct mem_cgroup_per_zone *mz,
@@ -549,10 +549,9 @@ retry:
 	}
 	unlock_page_cgroup(page);
 
-	pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
+	pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
 	if (unlikely(!pc))
 		goto err;
-
 	/*
 	 * We always charge the cgroup the mm_struct belongs to.
 	 * The mm_struct's mem_cgroup changes on task migration if the
@@ -597,9 +596,10 @@ retry:
 
 	pc->mem_cgroup = mem;
 	pc->page = page;
-	pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
 	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
 		pc->flags = PAGE_CGROUP_FLAG_CACHE;
+	else
+		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
 
 	lock_page_cgroup(page);
 	if (unlikely(page_get_page_cgroup(page))) {


* [RFC][PATCH 7/8] memcg: remove redundant checks
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
                   ` (5 preceding siblings ...)
  2008-04-28 11:30 ` [RFC][PATCH 6/8] memcg: remove redundant initilization KAMEZAWA Hiroyuki
@ 2008-04-28 11:31 ` KAMEZAWA Hiroyuki
  2008-04-28 11:32 ` [RFC][PATCH 8/8] memcg: inlining mem_cgroup_chage_statistics() KAMEZAWA Hiroyuki
  7 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

This patch needs review ;)

Because of the remove-refcnt patch, it is now a very rare case that
mem_cgroup_charge_common() is called against a page which is already accounted.

mem_cgroup_charge_common() is called when:
 1. a page is added into the file cache.
 2. an anon page is newly mapped.
 3. a file-cache page is newly mapped.

To raise a racy condition, the above actions against a single page would have to
occur at once.

And, for case 3, we don't account it because it's already file cache.
To avoid accounting a "file cache but not mapped" page, checking
page->mapping is better than checking page_mapped().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   39 ++++++++++-----------------------------
 1 file changed, 10 insertions(+), 29 deletions(-)

Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -527,28 +527,6 @@ static int mem_cgroup_charge_common(stru
 	if (mem_cgroup_subsys.disabled)
 		return 0;
 
-	/*
-	 * Should page_cgroup's go to their own slab?
-	 * One could optimize the performance of the charging routine
-	 * by saving a bit in the page_flags and using it as a lock
-	 * to see if the cgroup page already has a page_cgroup associated
-	 * with it
-	 */
-retry:
-	lock_page_cgroup(page);
-	pc = page_get_page_cgroup(page);
-	/*
-	 * The page_cgroup exists and
-	 * the page has already been accounted.
-	 */
-	if (unlikely(pc)) {
-		VM_BUG_ON(pc->page != page);
-		VM_BUG_ON(!pc->mem_cgroup);
-		unlock_page_cgroup(page);
-		goto done;
-	}
-	unlock_page_cgroup(page);
-
 	pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
 	if (unlikely(!pc))
 		goto err;
@@ -604,15 +582,10 @@ retry:
 	lock_page_cgroup(page);
 	if (unlikely(page_get_page_cgroup(page))) {
 		unlock_page_cgroup(page);
-		/*
-		 * Another charge has been added to this page already.
-		 * We take lock_page_cgroup(page) again and read
-		 * page->cgroup, increment refcnt.... just retry is OK.
-		 */
 		res_counter_uncharge(&mem->res, PAGE_SIZE);
 		css_put(&mem->css);
 		kmem_cache_free(page_cgroup_cache, pc);
-		goto retry;
+		goto done;
 	}
 	page_assign_page_cgroup(page, pc);
 
@@ -633,7 +606,15 @@ err:
 
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
-	if (page_mapped(page))
+	/*
+	 * If a page is file cache, it's already charged. If a page is anon
+	 * and mapped, it's already charged. We accounts only *new* anon
+	 * page here. We cannot use !PageAnon() here because a new page for
+	 * anon is not marked as an anon page. we just checks a page has some
+	 * objrmap context or not.
+	 * Note that swap-cache's page->mapping is NULL. So we have no problem.
+	 */
+	if (page->mapping)
 		return 0;
 	if (unlikely(!mm))
 		mm = &init_mm;


* [RFC][PATCH 8/8] memcg: inlining mem_cgroup_chage_statistics()
  2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
                   ` (6 preceding siblings ...)
  2008-04-28 11:31 ` [RFC][PATCH 7/8] memcg: remove redundant checks KAMEZAWA Hiroyuki
@ 2008-04-28 11:32 ` KAMEZAWA Hiroyuki
  7 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-28 11:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

I added mem_cgroup_charge_statistics(), but this seems to add more
function calls (the compiler doesn't inline it ;). Maybe removing
mem_cgroup_charge_statistics() is more straightforward and easier to read.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

Index: mm-2.6.25-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.25-mm1.orig/mm/memcontrol.c
+++ mm-2.6.25-mm1/mm/memcontrol.c
@@ -185,22 +185,6 @@ enum charge_type {
 	MEM_CGROUP_CHARGE_TYPE_FORCE,	/* used by force_empty */
 };
 
-/*
- * Always modified under lru lock. Then, not necessary to preempt_disable()
- */
-static void mem_cgroup_charge_statistics(struct mem_cgroup *mem, int flags,
-					bool charge)
-{
-	int val = (charge)? 1 : -1;
-	struct mem_cgroup_stat *stat = &mem->stat;
-
-	VM_BUG_ON(!irqs_disabled());
-	if (flags & PAGE_CGROUP_FLAG_CACHE)
-		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, val);
-	else
-		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
-}
-
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
 {
@@ -279,14 +263,20 @@ static void unlock_page_cgroup(struct pa
 static void __mem_cgroup_remove_list(struct mem_cgroup_per_zone *mz,
 			struct page_cgroup *pc)
 {
+	struct mem_cgroup_stat *stat = &pc->mem_cgroup->stat;
 	int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+	int cache = pc->flags & PAGE_CGROUP_FLAG_CACHE;
 
 	if (from)
 		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
 	else
 		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
 
-	mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
+	if (cache)
+		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, -1);
+	else
+		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, -1);
+
 	list_del(&pc->lru);
 }
 
@@ -294,6 +284,8 @@ static void __mem_cgroup_add_list(struct
 				struct page_cgroup *pc)
 {
 	int to = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+	int cache =  pc->flags & PAGE_CGROUP_FLAG_CACHE;
+	struct mem_cgroup_stat *stat = &pc->mem_cgroup->stat;
 
 	if (!to) {
 		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
@@ -302,7 +294,11 @@ static void __mem_cgroup_add_list(struct
 		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) += 1;
 		list_add(&pc->lru, &mz->active_list);
 	}
-	mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, true);
+
+	if (cache)
+		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, 1);
+	else
+		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, 1);
 }
 
 static void __mem_cgroup_move_lists(struct page_cgroup *pc, bool active)


* Re: [RFC][PATCH 4/8] memcg: read_mostly
  2008-04-28 11:26 ` [RFC][PATCH 4/8] memcg: read_mostly KAMEZAWA Hiroyuki
@ 2008-04-29  1:34   ` Li Zefan
  2008-04-29  1:43     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2008-04-29  1:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

KAMEZAWA Hiroyuki wrote:
> These 3 variables are marked __read_mostly.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> 
> Index: mm-2.6.25-mm1/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.25-mm1.orig/mm/memcontrol.c
> +++ mm-2.6.25-mm1/mm/memcontrol.c
> @@ -35,9 +35,9 @@
>  
>  #include <asm/uaccess.h>
>  
> -struct cgroup_subsys mem_cgroup_subsys;
> -static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
> -static struct kmem_cache *page_cgroup_cache;
> +struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> +static const int MEM_CGROUP_RECLAIM_RETRIES __read_mostly = 5;

it's not __read_mostly, it's __read_always. ;)
so why not make it a macro:
	#define MEM_CGROUP_RECLAIM_RETRIES	5

> +static struct kmem_cache *page_cgroup_cache __read_mostly;
>  
>  /*
>   * Statistics for memory cgroup.
> 
> --


* Re: [RFC][PATCH 1/8] memcg: migration handling
  2008-04-28 11:22 ` [RFC][PATCH 1/8] memcg: migration handling KAMEZAWA Hiroyuki
@ 2008-04-29  1:36   ` Li Zefan
  2008-04-29  1:48     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2008-04-29  1:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

KAMEZAWA Hiroyuki wrote:
[..snip..]
> Index: mm-2.6.25-mm1/mm/migrate.c
> ===================================================================
> --- mm-2.6.25-mm1.orig/mm/migrate.c
> +++ mm-2.6.25-mm1/mm/migrate.c
> @@ -357,6 +357,10 @@ static int migrate_page_move_mapping(str
>  	__inc_zone_page_state(newpage, NR_FILE_PAGES);
>  
>  	write_unlock_irq(&mapping->tree_lock);
> +	if (!PageSwapCache(newpage)) {
> +		mem_cgroup_uncharge_page(page);
> +		mem_cgroup_getref(newpage);
> +	}
>  
>  	return 0;
>  }
> @@ -603,7 +607,6 @@ static int move_to_new_page(struct page 
>  		rc = fallback_migrate_page(mapping, newpage, page);
>  
>  	if (!rc) {
> -		mem_cgroup_page_migration(page, newpage);
>  		remove_migration_ptes(page, newpage);
>  	} else
>  		newpage->mapping = NULL;
> @@ -633,6 +636,12 @@ static int unmap_and_move(new_page_t get
>  		/* page was freed from under us. So we are done. */
>  		goto move_newpage;
>  
> +	charge = mem_cgroup_prepare_migration(page, newpage);
> +	if (charge == -ENOMEM) {
> +		rc = -ENOMEM;
> +		goto move_newpage;
> +	}
> +

A BUG_ON(charge) is needed to ensure the only error code from
mem_cgroup_prepare_migration() is -ENOMEM?

>  	rc = -EAGAIN;
>  	if (TestSetPageLocked(page)) {
>  		if (!force)
> @@ -684,19 +693,14 @@ static int unmap_and_move(new_page_t get
>  		goto rcu_unlock;
>  	}
>  
> -	charge = mem_cgroup_prepare_migration(page);
>  	/* Establish migration ptes or remove ptes */
>  	try_to_unmap(page, 1);
>  
>  	if (!page_mapped(page))
>  		rc = move_to_new_page(newpage, page);
>  
> -	if (rc) {
> +	if (rc)
>  		remove_migration_ptes(page, page);
> -		if (charge)
> -			mem_cgroup_end_migration(page);
> -	} else if (charge)
> - 		mem_cgroup_end_migration(newpage);
>  rcu_unlock:
>  	if (rcu_locked)
>  		rcu_read_unlock();
> @@ -717,6 +721,8 @@ unlock:
>  	}
>  
>  move_newpage:
> +	if (!charge)
> +		mem_cgroup_end_migration(newpage);
>  	/*
>  	 * Move the new page to the LRU. If migration was not successful
>  	 * then this will free the page.
> 
> --


* Re: [RFC][PATCH 4/8] memcg: read_mostly
  2008-04-29  1:34   ` Li Zefan
@ 2008-04-29  1:43     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-29  1:43 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

On Tue, 29 Apr 2008 09:34:26 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> > Index: mm-2.6.25-mm1/mm/memcontrol.c
> > ===================================================================
> > --- mm-2.6.25-mm1.orig/mm/memcontrol.c
> > +++ mm-2.6.25-mm1/mm/memcontrol.c
> > @@ -35,9 +35,9 @@
> >  
> >  #include <asm/uaccess.h>
> >  
> > -struct cgroup_subsys mem_cgroup_subsys;
> > -static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
> > -static struct kmem_cache *page_cgroup_cache;
> > +struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> > +static const int MEM_CGROUP_RECLAIM_RETRIES __read_mostly = 5;
> 
> it's not __read_mostly, it's __read_always. ;)
> so why not make it a macro:
> 	#define MEM_CGROUP_RECLAIM_RETRIES	5
> 
I'm not sure why this is not a macro ;)
I'll change it to a macro in the next version if there are no objections.
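
I.e., something like this in the next version (just a sketch, untested):

-static const int MEM_CGROUP_RECLAIM_RETRIES __read_mostly = 5;
+#define MEM_CGROUP_RECLAIM_RETRIES	5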

Thank you for the review.

-Kame


* Re: [RFC][PATCH 1/8] memcg: migration handling
  2008-04-29  1:36   ` Li Zefan
@ 2008-04-29  1:48     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-29  1:48 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

On Tue, 29 Apr 2008 09:36:40 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> [..snip..]
> > Index: mm-2.6.25-mm1/mm/migrate.c
> > ===================================================================
> > --- mm-2.6.25-mm1.orig/mm/migrate.c
> > +++ mm-2.6.25-mm1/mm/migrate.c
> > @@ -357,6 +357,10 @@ static int migrate_page_move_mapping(str
> >  	__inc_zone_page_state(newpage, NR_FILE_PAGES);
> >  
> >  	write_unlock_irq(&mapping->tree_lock);
> > +	if (!PageSwapCache(newpage)) {
> > +		mem_cgroup_uncharge_page(page);
> > +		mem_cgroup_getref(newpage);
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -603,7 +607,6 @@ static int move_to_new_page(struct page 
> >  		rc = fallback_migrate_page(mapping, newpage, page);
> >  
> >  	if (!rc) {
> > -		mem_cgroup_page_migration(page, newpage);
> >  		remove_migration_ptes(page, newpage);
> >  	} else
> >  		newpage->mapping = NULL;
> > @@ -633,6 +636,12 @@ static int unmap_and_move(new_page_t get
> >  		/* page was freed from under us. So we are done. */
> >  		goto move_newpage;
> >  
> > +	charge = mem_cgroup_prepare_migration(page, newpage);
> > +	if (charge == -ENOMEM) {
> > +		rc = -ENOMEM;
> > +		goto move_newpage;
> > +	}
> > +
> 
> A BUG_ON(charge) is needed to insure the only error code from
> mem_cgroup_prepare_migration() is -ENOMEM ?
> 
Hmm, it just depends on what mem_cgroup_charge_common() returns.
And it returns 0 or -ENOMEM. But it seems good to add a BUG_ON(charge)
after the 'if' as a sanity check.
I will add it in the next version.
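
Something like this, against the hunk quoted above (just a sketch, not tested):

	charge = mem_cgroup_prepare_migration(page, newpage);
	if (charge == -ENOMEM) {
		rc = -ENOMEM;
		goto move_newpage;
	}
	/* mem_cgroup_charge_common() only returns 0 or -ENOMEM today */
	BUG_ON(charge);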

Thanks,
-Kame







* Re: [RFC][PATCH 5/8] memcg: optimize branches
  2008-04-28 11:28 ` [RFC][PATCH 5/8] memcg: optimize branches KAMEZAWA Hiroyuki
@ 2008-04-29  2:04   ` Li Zefan
  2008-04-29  2:48     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2008-04-29  2:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

KAMEZAWA Hiroyuki wrote:
> Showing branch direction for obvious conditions.
> 

Did you compare the compiled objects with and without this patch ?

It seems gcc will take (ptr == NULL) as unlikely without your explicit
annotation. And likely() and unlikely() should be used in some performance-
critical path only ?

> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  mm/memcontrol.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> Index: mm-2.6.25-mm1/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.25-mm1.orig/mm/memcontrol.c
> +++ mm-2.6.25-mm1/mm/memcontrol.c
> @@ -541,7 +541,7 @@ retry:
>  	 * The page_cgroup exists and
>  	 * the page has already been accounted.
>  	 */
> -	if (pc) {
> +	if (unlikely(pc)) {
>  		VM_BUG_ON(pc->page != page);
>  		VM_BUG_ON(!pc->mem_cgroup);
>  		unlock_page_cgroup(page);
> @@ -550,7 +550,7 @@ retry:
>  	unlock_page_cgroup(page);
>  
>  	pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
> -	if (pc == NULL)
> +	if (unlikely(!pc))
>  		goto err;
>  
>  	/*
> @@ -602,7 +602,7 @@ retry:
>  		pc->flags = PAGE_CGROUP_FLAG_CACHE;
>  
>  	lock_page_cgroup(page);
> -	if (page_get_page_cgroup(page)) {
> +	if (unlikely(page_get_page_cgroup(page))) {
>  		unlock_page_cgroup(page);
>  		/*
>  		 * Another charge has been added to this page already.
> @@ -668,7 +668,7 @@ void __mem_cgroup_uncharge_common(struct
>  	 */
>  	lock_page_cgroup(page);
>  	pc = page_get_page_cgroup(page);
> -	if (!pc)
> +	if (unlikely(!pc))
>  		goto unlock;
>  
>  	VM_BUG_ON(pc->page != page);
> 
> --


* Re: [RFC][PATCH 5/8] memcg: optimize branches
  2008-04-29  2:04   ` Li Zefan
@ 2008-04-29  2:48     ` KAMEZAWA Hiroyuki
  2008-04-29  3:11       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-29  2:48 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-mm, balbir, xemul, hugh, yamamoto

On Tue, 29 Apr 2008 10:04:45 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Showing branch direction for obvious conditions.
> > 
> 
> Did you compare the compiled objects with and without this patch ?
> 
Yes, on ia64. (I attached the x86-64 results just because it's widely used.)

> It seems gcc will take (ptr == NULL) as unlikely without your explicit
> annotation.
I didn't know that. But please try this; I don't see what you describe.
==
#include <stdio.h>

#define unlikely(x)     __builtin_expect(!!(x), 0)
//#define unlikely(x)   (x)

extern void *allocator(int size);
extern void call(int arg);
void *foo(void)
{
        char *ptr;
        ptr = allocator(1024);
        if (unlikely(!ptr)) {
                call(1);
                call(2);
        } else {
                call(3);
                call(4);
        }
        return ptr;
}
==
When I compiled above with -O2 on ia64,
 - if unlikely is used , call(3),call(4) are on fast path.
 - if unlikely is not used, call(1), call(3) are on fast path.

In more obvious cases, gcc will do some obvious optimizations, I agree.
But it's difficult for me to know how gcc will compile it at the C-language level.

> And likely() and unlikely() should be used in some performance-
> critical path only ?
> 
I think it can be used on busy and obvious paths. IMHO, it gives a hint
not only to the compiler, but also to a developer who reads it.
 
Thanks,
-Kame


* Re: [RFC][PATCH 5/8] memcg: optimize branches
  2008-04-29  2:48     ` KAMEZAWA Hiroyuki
@ 2008-04-29  3:11       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-04-29  3:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Li Zefan, linux-mm, balbir, xemul, hugh, yamamoto

On Tue, 29 Apr 2008 11:48:32 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> extern void *allocalter(int size);
> extern void call(int arg);
> void *foo(void)
> {
>         char *ptr;
>         ptr = allocator(1024);
>         if (unlikely(!ptr)) {
>                 call(1);
>                 call(2);
>         } else {
>                 call(3);
>                 call(4);
>         }
>         return ptr;
> }
> ==
> When I compiled above with -O2 on ia64,
>  - if unlikely is used , call(3),call(4) are on fast path.
>  - if unlikely is not used, call(1), call(3) are on fast path.
                                       call(2)
Sorry, too many typos these days ...

Thanks,
-Kame


end of thread, other threads:[~2008-04-29  3:11 UTC | newest]

Thread overview: 16+ messages
2008-04-28 11:19 [RFC][PATCH] memcg: performance improvement v2 [0/8] KAMEZAWA Hiroyuki
2008-04-28 11:22 ` [RFC][PATCH 1/8] memcg: migration handling KAMEZAWA Hiroyuki
2008-04-29  1:36   ` Li Zefan
2008-04-29  1:48     ` KAMEZAWA Hiroyuki
2008-04-28 11:23 ` [RFC][PATCH 2/8] memcg: remove refcnt KAMEZAWA Hiroyuki
2008-04-28 11:24 ` [RFC][PATCH 3/8] memcg: swapcache handling retry KAMEZAWA Hiroyuki
2008-04-28 11:26 ` [RFC][PATCH 4/8] memcg: read_mostly KAMEZAWA Hiroyuki
2008-04-29  1:34   ` Li Zefan
2008-04-29  1:43     ` KAMEZAWA Hiroyuki
2008-04-28 11:28 ` [RFC][PATCH 5/8] memcg: optimize branches KAMEZAWA Hiroyuki
2008-04-29  2:04   ` Li Zefan
2008-04-29  2:48     ` KAMEZAWA Hiroyuki
2008-04-29  3:11       ` KAMEZAWA Hiroyuki
2008-04-28 11:30 ` [RFC][PATCH 6/8] memcg: remove redundant initilization KAMEZAWA Hiroyuki
2008-04-28 11:31 ` [RFC][PATCH 7/8] memcg: remove redundant checks KAMEZAWA Hiroyuki
2008-04-28 11:32 ` [RFC][PATCH 8/8] memcg: inlining mem_cgroup_chage_statistics() KAMEZAWA Hiroyuki
