linux-mm.kvack.org archive mirror
* [RFC][-mm] [0/7] misc memcg patch set
@ 2008-07-02 12:03 KAMEZAWA Hiroyuki
  2008-07-02 12:07 ` [RFC][-mm] [1/7] shmem swapcache fix KAMEZAWA Hiroyuki
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:03 UTC (permalink / raw)
  To: linux-mm; +Cc: balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

Hi, it seems the VM-related bugs in -mm have been reduced to a safe level,
so I have restarted my patches for memcg.

This mail is just for dumping patches on my stack.
(I'll resend one by one later.)

based on 2.6.26-rc5-mm3
+ kosaki's fixes + cgroup write_string set + Hugh Dickins's fixes for shmem
(All of these patches are in the -mm queue.)

Any comments are welcome (though the 7/7 patch is not so neat...).

[1/7] swapcache handling fix for shmem.
[2/7] adjust to split-lru: remove the PAGE_CGROUP_FLAG_CACHE flag.
[3/7] adjust to split-lru: push shmem's pages to the active list
      (imported from Hugh Dickins's work.)
[4/7] reduce usage at limit change: res_counter part.
[5/7] reduce usage at limit change: memcg part.
[6/7] memcg-background-job: res_counter part.
[7/7] memcg-background-job: memcg part.

Balbir, I'd like to import your soft-limit idea into the memcg-background-job
patch set. (That may be better than adding hooks to a very generic code path.)
What do you think?

Other patches in the plan (including other people's):
- soft-limit (Balbir is working on this.)
  I think the memcg-background-job patches can cooperate with this.

- dirty_ratio for memcg. (haven't written at all)
  Support dirty_ratio for memcg. This will improve OOM avoidance.

- swappiness for memcg (I had patches... but have to rewrite them.)
  Support swappiness per memcg. (Or is this of no use?)

- swap_controller (Nishimura is probably working on this.)
  The world may change after this... a cgroup without swap can appear easily.

- hierarchy (needs more discussion. maybe after OLS?)
  I have some patches, but am not in a hurry.

- more performance improvements (we need some trick.)
  = Can we remove lock_page_cgroup() ?
  = Can we reduce spinlocks ?

- move resources at task move (needs help from cgroup)
  We need some magical way. It seems impossible to implement this in memcg alone.

- NUMA statistics (needs help from cgroup)
  It seems a dynamic file creation feature, or some rule for showing an array of
  statistics, should be defined.

- memory guarantee (soft-mlock.)
  A guard parameter against the global LRU that says "don't reclaim any more
  from me ;(". Maybe the HA Linux people will want this....

Do you have others?

Thanks,
-Kame


* [RFC][-mm] [1/7] shmem swapcache fix
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
@ 2008-07-02 12:07 ` KAMEZAWA Hiroyuki
  2008-07-02 12:08 ` [RFC][-mm] [2/7] delete FLAG_CACHE KAMEZAWA Hiroyuki
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

SwapCache handling fix.

shmem's swapcache behavior is a little different from an anonymous page's,
and memcg failed to handle it. This patch tries to fix that.

After this:

Any page marked as SwapCache is not uncharged (delete_from_swap_cache()
clears the flag). Because SwapCache is accounted, this is not a good change
for the performance of shmem/tmpfs under memcg. (But memory was leaked
before.) We need additional fixes: background-job, dirty_ratio, etc.

To check whether a page is live shmem page cache or not, we use
 page->mapping && !PageAnon(page) instead of
 pc->flags & PAGE_CGROUP_FLAG_CACHE.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c	2008-07-02 09:29:52.000000000 +0900
+++ test-2.6.26-rc5-mm3++/mm/memcontrol.c	2008-07-02 10:58:15.000000000 +0900
@@ -685,11 +685,45 @@
 
 	VM_BUG_ON(pc->page != page);
 
-	if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
-	    && ((pc->flags & PAGE_CGROUP_FLAG_CACHE)
-		|| page_mapped(page)
-		|| PageSwapCache(page)))
+	/*
+	 * File Cache
+	 * If called with MEM_CGROUP_CHARGE_TYPE_MAPPED, check page->mapping.
+	 * add_to_page_cache() .... charged before insertion into the radix-tree.
+	 * remove_from_page_cache() .... uncharged when removed from the radix-tree.
+	 * page->mapping && !PageAnon(page) catches file cache.
+	 *
+	 * Anon/Shmem .... We check PageSwapCache(page).
+	 * Anon .... charged before being mapped.
+	 * Shmem .... charged at add_to_page_cache(), like usual file cache.
+	 *
+	 * Such a page will finally be uncharged when removed from the swap cache.
+	 *
+	 * We treat 2 cases here:
+	 * A. anonymous page  B. shmem.
+	 * We never uncharge a page that is marked as SwapCache.
+	 * add_to_swap_cache() has nothing to do with charge/uncharge.
+	 * The SwapCache flag is cleared before delete_from_swap_cache() calls this.
+	 *
+	 * shmem's behavior is as follows (see also shmem.c/swap_state.c):
+	 * at swap-out:
+	 * 	0. add_to_page_cache() .... charged at page creation.
+	 * 	1. add_to_swap_cache() (marked as SwapCache)
+	 *	2. remove_from_page_cache() (calls this.)
+	 *	(finally) delete_from_swap_cache() (calls this.)
+	 * at swap-in:
+	 * 	3. add_to_swap_cache() (no charge here.)
+	 * 	4. add_to_page_cache() (charged here.)
+	 * 	5. delete_from_swap_cache() (calls this.)
+	 * PageSwapCache(page) catches "2".
+	 * page->mapping && !PageAnon(page) catches "5" and avoids uncharging.
+	 */
+	if (PageSwapCache(page))
 		goto unlock;
+	/* called from unmap or delete_from_swap_cache() */
+	if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
+	    && (page_mapped(page)
+		|| (page->mapping && !PageAnon(page)))) /* alive cache? */
+		goto unlock;
 
 	mz = page_cgroup_zoneinfo(pc);
 	spin_lock_irqsave(&mz->lru_lock, flags);


* [RFC][-mm] [2/7] delete FLAG_CACHE
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
  2008-07-02 12:07 ` [RFC][-mm] [1/7] shmem swapcache fix KAMEZAWA Hiroyuki
@ 2008-07-02 12:08 ` KAMEZAWA Hiroyuki
  2008-07-02 12:10 ` [RFC][-mm] [3/7] add shmem page to active list KAMEZAWA Hiroyuki
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

split-lru defines PAGE_CGROUP_FLAG_FILE... and memcg uses PAGE_CGROUP_FLAG_CACHE.
It seems there is no major difference between the two flags.

This patch keeps FLAG_FILE and deletes FLAG_CACHE.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
+++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
@@ -160,10 +160,9 @@ struct page_cgroup {
 	struct mem_cgroup *mem_cgroup;
 	int flags;
 };
-#define PAGE_CGROUP_FLAG_CACHE	   (0x1)	/* charged as cache */
+#define PAGE_CGROUP_FLAG_FILE	   (0x1)	/* charged as cache */
 #define PAGE_CGROUP_FLAG_ACTIVE    (0x2)	/* page is active in this cgroup */
-#define PAGE_CGROUP_FLAG_FILE	   (0x4)	/* page is file system backed */
-#define PAGE_CGROUP_FLAG_UNEVICTABLE (0x8)	/* page is unevictableable */
+#define PAGE_CGROUP_FLAG_UNEVICTABLE (0x4)	/* page is unevictable */
 
 static int page_cgroup_nid(struct page_cgroup *pc)
 {
@@ -191,7 +190,7 @@ static void mem_cgroup_charge_statistics
 	struct mem_cgroup_stat *stat = &mem->stat;
 
 	VM_BUG_ON(!irqs_disabled());
-	if (flags & PAGE_CGROUP_FLAG_CACHE)
+	if (flags & PAGE_CGROUP_FLAG_FILE)
 		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, val);
 	else
 		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
@@ -573,11 +572,9 @@ static int mem_cgroup_charge_common(stru
 	 * If a page is accounted as a page cache, insert to inactive list.
 	 * If anon, insert to active list.
 	 */
-	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE) {
-		pc->flags = PAGE_CGROUP_FLAG_CACHE;
-		if (page_is_file_cache(page))
-			pc->flags |= PAGE_CGROUP_FLAG_FILE;
-	} else
+	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
+		pc->flags = PAGE_CGROUP_FLAG_FILE;
+	else
 		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
 
 	lock_page_cgroup(page);
@@ -772,7 +769,7 @@ int mem_cgroup_prepare_migration(struct 
 	if (pc) {
 		mem = pc->mem_cgroup;
 		css_get(&mem->css);
-		if (pc->flags & PAGE_CGROUP_FLAG_CACHE)
+		if (pc->flags & PAGE_CGROUP_FLAG_FILE)
 			ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
 	}
 	unlock_page_cgroup(page);


* [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
  2008-07-02 12:07 ` [RFC][-mm] [1/7] shmem swapcache fix KAMEZAWA Hiroyuki
  2008-07-02 12:08 ` [RFC][-mm] [2/7] delete FLAG_CACHE KAMEZAWA Hiroyuki
@ 2008-07-02 12:10 ` KAMEZAWA Hiroyuki
  2008-07-03  0:11   ` KAMEZAWA Hiroyuki
  2008-07-02 12:12 ` [RFC][-mm] [4/7] handle limit change KAMEZAWA Hiroyuki
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

The original is Hugh Dickins's:
  http://marc.info/?l=linux-kernel&m=121469899304590&w=2
This is not necessary if his patch is merged.

Add shmem's pages to the active list when we link them to memcg's LRU.
Does this need discussion?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
 mm/memcontrol.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
+++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
@@ -575,7 +575,10 @@ static int mem_cgroup_charge_common(stru
 	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
 		pc->flags = PAGE_CGROUP_FLAG_FILE;
 	else
-		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
+		pc->flags = 0;
+	/* anonymous page and shmem pages are started from active list */
+	if (!page_is_file_cache(page))
+		pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
 
 	lock_page_cgroup(page);
 	if (unlikely(page_get_page_cgroup(page))) {


* [RFC][-mm] [4/7] handle limit change
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2008-07-02 12:10 ` [RFC][-mm] [3/7] add shmem page to active list KAMEZAWA Hiroyuki
@ 2008-07-02 12:12 ` KAMEZAWA Hiroyuki
  2008-07-02 12:13 ` [RFC][-mm] [5/7] reduce usage at " KAMEZAWA Hiroyuki
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

Add an interface to set the limit. This is necessary for the memory resource
controller because it shrinks usage when the limit is set.

(*) Other controllers may not need this interface, because shrinking usage is
    either unnecessary or impossible for them.

Changelog:
  - Fixed a whitespace bug.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


 include/linux/res_counter.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Index: test-2.6.26-rc5-mm3++/include/linux/res_counter.h
===================================================================
--- test-2.6.26-rc5-mm3++.orig/include/linux/res_counter.h
+++ test-2.6.26-rc5-mm3++/include/linux/res_counter.h
@@ -158,4 +158,19 @@ static inline void res_counter_reset_fai
 	cnt->failcnt = 0;
 	spin_unlock_irqrestore(&cnt->lock, flags);
 }
+
+static inline int res_counter_set_limit(struct res_counter *cnt,
+	unsigned long long limit)
+{
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if (cnt->usage < limit) {
+		cnt->limit = limit;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
 #endif
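
As a usage sketch (a hypothetical caller, not part of this patch): the memcg
side in patch 5/7 below does roughly the following, shrinking and retrying
until the new limit fits. (The real 5/7 code only decrements the retry count
when reclaim makes no progress.)

	int resize(struct mem_cgroup *memcg, unsigned long long val)
	{
		int retry = MEM_CGROUP_RECLAIM_RETRIES;

		/* res_counter_set_limit() fails while usage >= val */
		while (res_counter_set_limit(&memcg->res, val)) {
			if (!retry--)
				return -EBUSY;
			try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
		}
		return 0;
	}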


* [RFC][-mm] [5/7] reduce usage at limit change
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2008-07-02 12:12 ` [RFC][-mm] [4/7] handle limit change KAMEZAWA Hiroyuki
@ 2008-07-02 12:13 ` KAMEZAWA Hiroyuki
  2008-07-02 12:15 ` [RFC][-mm] [6/7] res_counter distance to limit KAMEZAWA Hiroyuki
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

Shrinking memory usage at limit change.

Changelog: v1 -> v2
  - adjusted to be based on the write_string() patch set.
  - removed a backward goto.
  - removed an unnecessary cond_resched().


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 Documentation/controllers/memory.txt |    3 --
 mm/memcontrol.c                      |   43 +++++++++++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 6 deletions(-)

Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
+++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
@@ -834,6 +834,26 @@ int mem_cgroup_shrink_usage(struct mm_st
 	return 0;
 }
 
+int mem_cgroup_resize_limit(struct mem_cgroup *memcg, unsigned long long val)
+{
+
+	int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
+	int progress;
+	int ret = 0;
+
+	while (res_counter_set_limit(&memcg->res, val)) {
+		if (!retry_count) {
+			ret = -EBUSY;
+			break;
+		}
+		progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
+		if (!progress)
+			retry_count--;
+	}
+	return ret;
+}
+
+
 /*
  * This routine traverse page_cgroup in given list and drop them all.
  * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
@@ -914,13 +934,29 @@ static u64 mem_cgroup_read(struct cgroup
 	return res_counter_read_u64(&mem_cgroup_from_cont(cont)->res,
 				    cft->private);
 }
-
+/*
+ * The user of this function is...
+ * RES_LIMIT.
+ */
 static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 			    const char *buffer)
 {
-	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
-				 cft->private, buffer,
-				 res_counter_memparse_write_strategy);
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	unsigned long long val;
+	int ret;
+
+	switch (cft->private) {
+	case RES_LIMIT:
+		/* This function does all the necessary parsing... reuse it. */
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (!ret)
+			ret = mem_cgroup_resize_limit(memcg, val);
+		break;
+	default:
+		ret = -EINVAL; /* should be BUG() ? */
+		break;
+	}
+	return ret;
 }
 
 static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
Index: test-2.6.26-rc5-mm3++/Documentation/controllers/memory.txt
===================================================================
--- test-2.6.26-rc5-mm3++.orig/Documentation/controllers/memory.txt
+++ test-2.6.26-rc5-mm3++/Documentation/controllers/memory.txt
@@ -242,8 +242,7 @@ rmdir() if there are no tasks.
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
 3. Teach controller to account for shared-pages
-4. Start reclamation when the limit is lowered
-5. Start reclamation in the background when the limit is
+4. Start reclamation in the background when the limit is
    not yet hit but the usage is getting closer
 
 Summary
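
As an illustration of the new behavior (assuming memcg is mounted at
/dev/memcg and group01's usage is currently above 64M):

	# echo 64M > /dev/memcg/group01/memory.limit_in_bytes

Previously the limit was simply set without reclaiming; with this patch the
write shrinks usage first and returns -EBUSY only after
MEM_CGROUP_RECLAIM_RETRIES reclaim attempts make no progress.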


* [RFC][-mm] [6/7] res_counter distance to limit
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2008-07-02 12:13 ` [RFC][-mm] [5/7] reduce usage at " KAMEZAWA Hiroyuki
@ 2008-07-02 12:15 ` KAMEZAWA Hiroyuki
  2008-07-02 19:19   ` Paul Menage
  2008-07-02 12:17 ` [RFC][-mm] [7/7] background job for memcg KAMEZAWA Hiroyuki
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

I wonder whether there is a better name than "distance"...
give me a hint ;)
==
Charge val to the res_counter and return the distance to the limit.

Useful when a controller (the memory controller) wants to implement background
feedback ops that depend on the remaining resource.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 include/linux/res_counter.h |   21 ++++++++++++++++++++-
 kernel/res_counter.c        |   27 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3++/include/linux/res_counter.h
===================================================================
--- test-2.6.26-rc5-mm3++.orig/include/linux/res_counter.h
+++ test-2.6.26-rc5-mm3++/include/linux/res_counter.h
@@ -104,7 +104,8 @@ int __must_check res_counter_charge_lock
 		unsigned long val);
 int __must_check res_counter_charge(struct res_counter *counter,
 		unsigned long val);
-
+int __must_check res_counter_charge_distance(struct res_counter *counter,
+	unsigned long val, unsigned long long *distance);
 /*
  * uncharge - tell that some portion of the resource is released
  *
@@ -173,4 +174,22 @@ static inline int res_counter_set_limit(
 	spin_unlock_irqrestore(&cnt->lock, flags);
 	return ret;
 }
+
+/*
+ * Returns limit - usage. If usage >= limit, returns 0.
+ */
+
+static inline unsigned long long
+res_counter_distance_to_limit(struct res_counter *cnt)
+{
+	unsigned long flags;
+	unsigned long long distance = 0;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if (cnt->usage < cnt->limit)
+		distance = cnt->limit - cnt->usage;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return distance;
+}
+
 #endif
Index: test-2.6.26-rc5-mm3++/kernel/res_counter.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/kernel/res_counter.c
+++ test-2.6.26-rc5-mm3++/kernel/res_counter.c
@@ -44,6 +44,33 @@ int res_counter_charge(struct res_counte
 	return ret;
 }
 
+/*
+ * res_counter_charge_distance - do res_counter_charge and return the distance
+ * to the limit.
+ * @counter: the counter
+ * @val: the amount of the resource. Each controller defines its own units.
+ * @distance: the remaining resource below the limit.
+ *
+ * Returns 0 on success and <0 if counter->usage would exceed
+ * counter->limit.
+ */
+
+int res_counter_charge_distance(struct res_counter *counter, unsigned long val,
+	unsigned long long *distance)
+{
+	int ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&counter->lock, flags);
+	ret = res_counter_charge_locked(counter, val);
+	if (!ret)
+		*distance = counter->limit - counter->usage;
+	spin_unlock_irqrestore(&counter->lock, flags);
+	return ret;
+}
+
+
+
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
 {
 	if (WARN_ON(counter->usage < val))


* [RFC][-mm] [7/7] background job for memcg
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (5 preceding siblings ...)
  2008-07-02 12:15 ` [RFC][-mm] [6/7] res_counter distance to limit KAMEZAWA Hiroyuki
@ 2008-07-02 12:17 ` KAMEZAWA Hiroyuki
  2008-07-02 14:23 ` [RFC][-mm] [0/7] misc memcg patch set Daisuke Nishimura
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 12:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

Background reclaim for memcg.

This patch adds a daemon that does background page reclaim based on high/low
watermarks for memcg. Almost all of the code is rewritten from the previous
version, so the numbering starts from v1 again.

Major changes from the old version:
 i) Reclaim is now started based on the distance to the limit, not on the
   number of pages. Because of this we don't need a strict low < high < limit
   check; low and high are guaranteed to be below the limit.
 ii) A single daemon is used instead of one daemon per memcg. This is maybe
   simpler than the previous version. (But maybe it's ok to start per-node
   threads; that's a TODO.)
 iii) Because of ii), memcg->flags is added.

Note: I tried to use a workqueue, but it seems unsuitable for a work item
      that loops, so I added a kthread.
      BTW, if you can think of a better name, please tell me.
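
As an illustration of the intended interface (assuming memcg is mounted at
/dev/memcg; the file names come from the cftype entries below, and the stop
distance must be set first because the code requires start <= stop):

	# echo 8M > /dev/memcg/group01/memory.stop_reclaim_distance
	# echo 4M > /dev/memcg/group01/memory.start_reclaim_distance

Background reclaim is then scheduled when a charge leaves less than 4M of
room below the limit, and the daemon keeps reclaiming until at least 8M of
room is free again.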


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 mm/memcontrol.c |  201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 196 insertions(+), 5 deletions(-)

Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
===================================================================
--- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
+++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
@@ -33,13 +33,31 @@
 #include <linux/seq_file.h>
 #include <linux/vmalloc.h>
 #include <linux/mm_inline.h>
-
+#include <linux/list.h>
+#include <linux/wait.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
 #include <asm/uaccess.h>
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 static struct kmem_cache *page_cgroup_cache __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
 
+/* Background reclaim stuff */
+static LIST_HEAD(memcg_global_mswapd_list);
+static DECLARE_WAIT_QUEUE_HEAD(memcg_mswapdq);
+static DEFINE_SPINLOCK(memcg_mswapd_lock);
+struct task_struct *mswapd_daemon  __read_mostly;
+
+enum {
+	MEMCG_HWMARK = 10000,
+	MEMCG_LWMARK = 10001,
+};
+
+
+/* An interface for waiting for the end of background reclaim */
+static DECLARE_WAIT_QUEUE_HEAD(memcg_destroy_waitq);
+
 /*
  * Statistics for memory cgroup.
  */
@@ -129,6 +147,11 @@ struct mem_cgroup {
 	struct mem_cgroup_lru_info info;
 
 	int	prev_priority;	/* for recording reclaim priority */
+	unsigned long		flags;
+	/* background reclaim stuff */
+	unsigned long long highwmrk_distance;
+	unsigned long long lowwmrk_distance;
+	struct list_head	mswapd_list;
 	/*
 	 * statistics.
 	 */
@@ -136,6 +159,13 @@ struct mem_cgroup {
 };
 static struct mem_cgroup init_mem_cgroup;
 
+/* Flag bit for memcg itself */
+enum {
+	MEMCG_FLAG_IN_RECLAIM,
+	MEMCG_FLAG_OBSOLETE,
+};
+
+
 /*
  * We use the lower bit of the page->page_cgroup pointer as a bit spin
  * lock.  We need to ensure that page->page_cgroup is at least two
@@ -504,6 +534,37 @@ unsigned long mem_cgroup_isolate_pages(u
 	return nr_taken;
 }
 
+
+static void mem_cgroup_schedule_reclaim(struct mem_cgroup *memcg)
+{
+	unsigned long flags;
+
+	if (unlikely(!mswapd_daemon))
+		return;
+
+	if (!test_and_set_bit(MEMCG_FLAG_IN_RECLAIM, &memcg->flags)) {
+		/* When OBSOLETE is marked, there is no thread in this group */
+		BUG_ON(test_bit(MEMCG_FLAG_OBSOLETE, &memcg->flags));
+
+		spin_lock_irqsave(&memcg_mswapd_lock, flags);
+		BUG_ON(!list_empty(&memcg->mswapd_list));
+		css_get(&memcg->css);
+		list_add_tail(&memcg->mswapd_list, &memcg_global_mswapd_list);
+		spin_unlock_irqrestore(&memcg_mswapd_lock, flags);
+		if (!waitqueue_active(&memcg_mswapdq))
+			return;
+		wake_up_interruptible(&memcg_mswapdq);
+	}
+}
+
+
+static void mem_cgroup_check_distance_to_limit(struct mem_cgroup *memcg,
+					unsigned long long distance)
+{
+	if (distance < memcg->highwmrk_distance)
+		mem_cgroup_schedule_reclaim(memcg);
+}
+
 /*
  * Charge the memory controller for page usage.
  * Return
@@ -518,6 +579,7 @@ static int mem_cgroup_charge_common(stru
 	struct page_cgroup *pc;
 	unsigned long flags;
 	unsigned long nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
+	unsigned long long distance;
 	struct mem_cgroup_per_zone *mz;
 
 	pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
@@ -543,7 +605,7 @@ static int mem_cgroup_charge_common(stru
 		css_get(&memcg->css);
 	}
 
-	while (res_counter_charge(&mem->res, PAGE_SIZE)) {
+	while (res_counter_charge_distance(&mem->res, PAGE_SIZE, &distance)) {
 		if (!(gfp_mask & __GFP_WAIT))
 			goto out;
 
@@ -596,6 +658,9 @@ static int mem_cgroup_charge_common(stru
 	spin_unlock_irqrestore(&mz->lru_lock, flags);
 
 	unlock_page_cgroup(page);
+	/* O.K. we successfully charged. Check thresholds...
+	   should we check the gfp flags? */
+	mem_cgroup_check_distance_to_limit(mem, distance);
 done:
 	return 0;
 out:
@@ -834,6 +899,68 @@ int mem_cgroup_shrink_usage(struct mm_st
 	return 0;
 }
 
+/*
+ * a daemon to do background page shrinking within memcg.
+ */
+static int memcg_mswapd(void *data)
+{
+	DEFINE_WAIT(wait);
+	int ret;
+	struct mem_cgroup *memcg;
+	unsigned long long distance;
+
+	current->flags |= PF_SWAPWRITE;
+	set_user_nice(current, 0);
+	set_freezable();
+
+	while (!kthread_should_stop()) {
+		prepare_to_wait(&memcg_mswapdq, &wait, TASK_INTERRUPTIBLE);
+
+		/* Is there a scheduled one? */
+		spin_lock_irq(&memcg_mswapd_lock);
+		if (list_empty(&memcg_global_mswapd_list)) {
+			spin_unlock_irq(&memcg_mswapd_lock);
+			if (!kthread_should_stop()) {
+				schedule();
+				try_to_freeze();
+			}
+			finish_wait(&memcg_mswapdq, &wait);
+			continue;
+		}
+		memcg = container_of(memcg_global_mswapd_list.next,
+					struct mem_cgroup, mswapd_list);
+		list_del_init(&memcg->mswapd_list);
+		spin_unlock_irq(&memcg_mswapd_lock);
+
+		finish_wait(&memcg_mswapdq, &wait);
+
+		if (!test_bit(MEMCG_FLAG_OBSOLETE, &memcg->flags)) {
+			ret = try_to_free_mem_cgroup_pages(memcg,
+						  GFP_HIGHUSER_MOVABLE);
+			distance = res_counter_distance_to_limit(&memcg->res);
+		} else
+			distance = 0;
+
+		if (distance < memcg->lowwmrk_distance) {
+			/* Don't clear IN_RECLAIM flag and add to tail */
+			spin_lock_irq(&memcg_mswapd_lock);
+			list_add_tail(&memcg->mswapd_list,
+				      &memcg_global_mswapd_list);
+			spin_unlock_irq(&memcg_mswapd_lock);
+		} else {
+			css_put(&memcg->css);
+			clear_bit(MEMCG_FLAG_IN_RECLAIM, &memcg->flags);
+			wake_up_all(&memcg_destroy_waitq);
+		}
+		yield();
+	}
+	/* Currently, stopping this thread is not implemented, but maybe
+	   it will be in the future. */
+	BUG();
+	return 0;
+}
+
+
 int mem_cgroup_resize_limit(struct mem_cgroup *memcg, unsigned long long val)
 {
 
@@ -931,8 +1058,17 @@ out:
 
 static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 {
-	return res_counter_read_u64(&mem_cgroup_from_cont(cont)->res,
-				    cft->private);
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+
+	switch (cft->private) {
+	case MEMCG_HWMARK:
+		return memcg->highwmrk_distance;
+	case MEMCG_LWMARK:
+		return memcg->lowwmrk_distance;
+	default:
+		break;
+	}
+	return res_counter_read_u64(&memcg->res, cft->private);
 }
 /*
  * The user of this function is...
@@ -952,6 +1088,24 @@ static int mem_cgroup_write(struct cgrou
 		if (!ret)
 			ret = mem_cgroup_resize_limit(memcg, val);
 		break;
+	case MEMCG_HWMARK:
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (!ret) {
+			if (val <= memcg->lowwmrk_distance)
+				memcg->highwmrk_distance = val;
+			else
+				ret = -EINVAL;
+		}
+		break;
+	case MEMCG_LWMARK:
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (!ret) {
+			if (val >= memcg->highwmrk_distance)
+				memcg->lowwmrk_distance = val;
+			else
+				ret = -EINVAL;
+		}
+		break;
 	default:
 		ret = -EINVAL; /* should be BUG() ? */
 		break;
@@ -1063,6 +1217,18 @@ static struct cftype mem_cgroup_files[] 
 		.name = "stat",
 		.read_map = mem_control_stat_show,
 	},
+	{
+		.name = "start_reclaim_distance",
+		.private = MEMCG_HWMARK,
+		.write_string = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read,
+	},
+	{
+		.name = "stop_reclaim_distance",
+		.private = MEMCG_LWMARK,
+		.write_string = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read,
+	},
 };
 
 static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
@@ -1124,6 +1290,19 @@ static void mem_cgroup_free(struct mem_c
 		vfree(mem);
 }
 
+static int mem_cgroup_start_daemon(void)
+{
+	struct task_struct *result;
+	int ret = 0;
+	result = kthread_run(memcg_mswapd, NULL, "memcontrol");
+	if (IS_ERR(result)) {
+		printk(KERN_ERR "failed to start the memory controller daemon\n");
+		mswapd_daemon = NULL;
+	} else
+		mswapd_daemon = result;
+	return ret;
+}
+late_initcall(mem_cgroup_start_daemon);
 
 static struct cgroup_subsys_state *
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
@@ -1146,12 +1325,15 @@ mem_cgroup_create(struct cgroup_subsys *
 		if (alloc_mem_cgroup_per_zone_info(mem, node))
 			goto free_out;
 
+	/* mem->flags is cleared by memset() */
+
 	return &mem->css;
 free_out:
 	for_each_node_state(node, N_POSSIBLE)
 		free_mem_cgroup_per_zone_info(mem, node);
 	if (cont->parent != NULL)
 		mem_cgroup_free(mem);
+
 	return ERR_PTR(-ENOMEM);
 }
 
@@ -1159,7 +1341,16 @@ static void mem_cgroup_pre_destroy(struc
 					struct cgroup *cont)
 {
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
+
+	set_bit(MEMCG_FLAG_OBSOLETE, &mem->flags);
+	if (mswapd_daemon)
+		wake_up_process(mswapd_daemon);
+	/* wait for being removed from background reclaim queue */
+	wait_event_interruptible(memcg_destroy_waitq,
+			!(test_bit(MEMCG_FLAG_IN_RECLAIM, &mem->flags)));
+
 	mem_cgroup_force_empty(mem);
+
 }
 
 static void mem_cgroup_destroy(struct cgroup_subsys *ss,


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (6 preceding siblings ...)
  2008-07-02 12:17 ` [RFC][-mm] [7/7] background job for memcg KAMEZAWA Hiroyuki
@ 2008-07-02 14:23 ` Daisuke Nishimura
  2008-07-02 16:31   ` KOSAKI Motohiro
  2008-07-02 23:56   ` KAMEZAWA Hiroyuki
  2008-07-03  2:38 ` KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 24+ messages in thread
From: Daisuke Nishimura @ 2008-07-02 14:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: nishimura, linux-mm, balbir, xemul, yamamoto, hugh, kosaki.motohiro

Hi, Kamezawa-san.

> - swap_controller (Nishimura is probably working on this.)
>   The world may change after this... a cgroup without swap can appear easily.
> 
Yes, and sorry for the delay in submitting the next version of it.

I spent most of my time last month testing -mm itself,
but I'm testing the next version now and am going to submit it
in a few days.

I hope to have some discussion on this topic too
at OLS and LinuxFoundationSymposiumJapan ;)


Thanks,
Daisuke Nishimura.


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 14:23 ` [RFC][-mm] [0/7] misc memcg patch set Daisuke Nishimura
@ 2008-07-02 16:31   ` KOSAKI Motohiro
  2008-07-02 23:56   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2008-07-02 16:31 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, linux-mm, balbir, xemul, yamamoto, hugh

Hi Nishimura-san,

>> - swap_controller (Nishimura is probably working on this.)
>>   The world may change after this... a cgroup without swap can appear easily.
>>
> Yes, and sorry for the delay in submitting the next version of it.
>
> I spent most of my time last month testing -mm itself,
> but I'm testing the next version now and am going to submit it
> in a few days.

sorry for that ;-)

but your recent activity has improved -mm stabilization a lot.
Thank you.
I'm continuously working to stabilize -mm too.

> I hope to have some discussion on this topic too
> at OLS and LinuxFoundationSymposiumJapan ;)

I hope to hear about your swap controller development plan too.
I'm looking forward to meeting you :)


* Re: [RFC][-mm] [6/7] res_counter distance to limit
  2008-07-02 12:15 ` [RFC][-mm] [6/7] res_counter distance to limit KAMEZAWA Hiroyuki
@ 2008-07-02 19:19   ` Paul Menage
  2008-07-02 23:57     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Menage @ 2008-07-02 19:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Wed, Jul 2, 2008 at 5:15 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> I wonder whether there is a better name than "distance"...
> give me a hint ;)

How about res_counter_report_spare() and res_counter_charge_and_report_spare()?

Paul


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 14:23 ` [RFC][-mm] [0/7] misc memcg patch set Daisuke Nishimura
  2008-07-02 16:31   ` KOSAKI Motohiro
@ 2008-07-02 23:56   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 23:56 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, xemul, yamamoto, hugh, kosaki.motohiro

On Wed, 2 Jul 2008 23:23:15 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Hi, Kamezawa-san.
> 
> > - swap_controller (Nishimura is probably working on this.)
> >   The world may change after this... a cgroup without swap can appear easily.
> > 
> Yes, and sorry for the delay in submitting the next version of it.
> 
> I spent most of my time last month testing -mm itself,
> but I'm testing the next version now and am going to submit it
> in a few days.
> 
Your test was very helpful, thank you!


> I hope to have some discussion on this topic too
> at OLS and LinuxFoundationSymposiumJapan ;)
> 
Me too.

Thanks,
-Kame

> 
> Thanks,
> Daisuke Nishimura.
> 
> 


* Re: [RFC][-mm] [6/7] res_counter distance to limit
  2008-07-02 19:19   ` Paul Menage
@ 2008-07-02 23:57     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-02 23:57 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Wed, 2 Jul 2008 12:19:24 -0700
"Paul Menage" <menage@google.com> wrote:

> On Wed, Jul 2, 2008 at 5:15 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > I wonder whether there is a better name than "distance"...
> > give me a hint ;)
> 
> How about res_counter_report_spare() and res_counter_charge_and_report_spare() ?
> 
Seems better, thank you.

-Kame

> Paul
> 


* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-02 12:10 ` [RFC][-mm] [3/7] add shmem page to active list KAMEZAWA Hiroyuki
@ 2008-07-03  0:11   ` KAMEZAWA Hiroyuki
  2008-07-03  4:27     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  0:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro


On Wed, 2 Jul 2008 21:10:57 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> The original is Hugh Dickins's:
>   http://marc.info/?l=linux-kernel&m=121469899304590&w=2
> This is not necessary if his patch is merged.
> 
It was merged, so this patch will be dropped. (But see below.)

> Add shmem's pages to the active list when we link them to memcg's LRU.
> Does this need discussion?
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>  mm/memcontrol.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
> ===================================================================
> --- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
> +++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
> @@ -575,7 +575,10 @@ static int mem_cgroup_charge_common(stru
>  	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
>  		pc->flags = PAGE_CGROUP_FLAG_FILE;
>  	else
> -		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
> +		pc->flags = 0;
> +	/* anonymous page and shmem pages are started from active list */
> +	if (!page_is_file_cache(page))
> +		pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
>  
This was wrong ;(

	if (page_is_file_cache(page))
		pc->flags = PAGE_CGROUP_FLAG_FILE;
	else
		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;

will be the correct one. And this will change shmem's accounting attribute
from CACHE to ANON. Does anyone have a strong demand to account shmem as CACHE?

Thanks,
-Kame


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (7 preceding siblings ...)
  2008-07-02 14:23 ` [RFC][-mm] [0/7] misc memcg patch set Daisuke Nishimura
@ 2008-07-03  2:38 ` KAMEZAWA Hiroyuki
  2008-07-03  3:06 ` KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  2:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Wed, 2 Jul 2008 21:03:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Other patches in the plan (including other people's):
> - soft-limit (Balbir is working on this.)
>   I think the memcg-background-job patches can cooperate with this.
> 
> - dirty_ratio for memcg. (haven't written at all)
>   Support dirty_ratio for memcg. This will improve OOM avoidance.
> 
> - swappiness for memcg (I had patches... but have to rewrite them.)
>   Support swappiness per memcg. (Or is this of no use?)
> 
> - swap_controller (Nishimura is probably working on this.)
>   The world may change after this... a cgroup without swap can appear easily.
> 
> - hierarchy (needs more discussion. maybe after OLS?)
>   I have some patches, but am not in a hurry.
> 
> - more performance improvements (we need some trick.)
>   = Can we remove lock_page_cgroup() ?
>   = Can we reduce spinlocks ?
> 
> - move resources at task move (needs help from cgroup)
>   We need some magical way. It seems impossible to implement this in memcg alone.
> 
> - NUMA statistics (needs help from cgroup)
>   It seems a dynamic file creation feature, or some rule for showing an array
>   of statistics, should be defined.
> 
> - memory guarantee (soft-mlock.)
>   A guard parameter against the global LRU that says "don't reclaim any more
>   from me ;(". Maybe the HA Linux people will want this....
> 
> Do you have others?
> 

+ hugepage handling.
  Currently a hugepage is charged as a PAGE_SIZE page... it's a BUG.
  At first, it seems we have to avoid charging PG_compound pages
  (until multi-size page cache is introduced).

  I think hugepages are a different resource from the one memcg deals with;
  the total amount of them is controlled by sysctl.

  Should we add a hugepage controller, or will the memrlimit controller
  handle it? Or should we just ignore hugepages?

Thanks,
-Kame


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (8 preceding siblings ...)
  2008-07-03  2:38 ` KAMEZAWA Hiroyuki
@ 2008-07-03  3:06 ` KAMEZAWA Hiroyuki
  2008-07-03  9:07 ` Balbir Singh
  2008-07-11  8:38 ` KAMEZAWA Hiroyuki
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  3:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Wed, 2 Jul 2008 21:03:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Hi, it seems the VM-related bugs in -mm have been reduced to a safe level,
> so I have restarted my patches for memcg.
> 
> This mail is just for dumping patches on my stack.
> (I'll resend one by one later.)
> 
> based on 2.6.26-rc5-mm3
> + kosaki's fixes + cgroup write_string set + Hugh Dickins's fixes for shmem
> (All of these patches are in the -mm queue.)
> 
> Any comments are welcome (though the 7/7 patch is not so neat...).
> 
> [1/7] swapcache handling fix for shmem.
> [2/7] adjust to split-lru: remove the PAGE_CGROUP_FLAG_CACHE flag.
> [3/7] adjust to split-lru: push shmem's pages to the active list
>       (imported from Hugh Dickins's work.)
> [4/7] reduce usage at limit change: res_counter part.
> [5/7] reduce usage at limit change: memcg part.
> [6/7] memcg-background-job: res_counter part.
> [7/7] memcg-background-job: memcg part.

I decided to rewrite 7/7 totally, and to add a more sophisticated
memcg thread pool (like pdflush). So please ignore 7/7.

Thanks,
-Kame


* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-03  0:11   ` KAMEZAWA Hiroyuki
@ 2008-07-03  4:27     ` KAMEZAWA Hiroyuki
  2008-07-03  7:03       ` Hugh Dickins
  0 siblings, 1 reply; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  4:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Thu, 3 Jul 2008 09:11:44 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > add shmem's page to active list when we link it to memcg's lru.
> > need discussion ?
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >  mm/memcontrol.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > Index: test-2.6.26-rc5-mm3++/mm/memcontrol.c
> > ===================================================================
> > --- test-2.6.26-rc5-mm3++.orig/mm/memcontrol.c
> > +++ test-2.6.26-rc5-mm3++/mm/memcontrol.c
> > @@ -575,7 +575,10 @@ static int mem_cgroup_charge_common(stru
> >  	if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
> >  		pc->flags = PAGE_CGROUP_FLAG_FILE;
> >  	else
> > -		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
> > +		pc->flags = 0;
> > +	/* anonymous page and shmem pages are started from active list */
> > +	if (!page_is_file_cache(page))
> > +		pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
> >  
> This was wrong ;(
> 
>	if (page_is_file_cache(page))
>		pc->flags = PAGE_CGROUP_FLAG_FILE;
>	else
>		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
> 
> will be the correct one. And this will change shmem's accounting attribute
> from CACHE to ANON. Does anyone have a strong demand to account shmem as CACHE?
> 

BTW, is there a way to see the RSS usage of shmem from /proc or somewhere ?

Thanks,
-Kame


* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-03  4:27     ` KAMEZAWA Hiroyuki
@ 2008-07-03  7:03       ` Hugh Dickins
  2008-07-03  7:43         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2008-07-03  7:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, kosaki.motohiro

On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> 
> BTW, is there a way to see the RSS usage of shmem from /proc or somewhere ?

No, it's just been a (very weirdly backed!) filesystem until these
-mm developments.  If you add such stats (for more than temporary
debugging), you'll need to use per_cpu counters for it: more global
locking or atomic ops on those paths would be sure to upset SGI.
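
Something along these lines (a rough sketch with made-up names;
percpu_counter_init() would be called once at shmem init):

	#include <linux/percpu_counter.h>

	static struct percpu_counter shmem_pages;	/* hypothetical */

	/* called from shmem's page allocation/free paths: cheap per-CPU add */
	static void shmem_stat_add(long nr)
	{
		percpu_counter_add(&shmem_pages, nr);
	}

	/* approximate read for /proc reporting; no global lock taken */
	static s64 shmem_stat_read(void)
	{
		return percpu_counter_read_positive(&shmem_pages);
	}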

Hugh


* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-03  7:03       ` Hugh Dickins
@ 2008-07-03  7:43         ` KAMEZAWA Hiroyuki
  2008-07-03 22:28           ` Hugh Dickins
  0 siblings, 1 reply; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  7:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, kosaki.motohiro

On Thu, 3 Jul 2008 08:03:17 +0100 (BST)
Hugh Dickins <hugh@veritas.com> wrote:

> On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> > 
> > BTW, is there a way to see the RSS usage of shmem from /proc or somewhere ?
> 
> No, it's just been a (very weirdly backed!) filesystem until these
> -mm developments.  If you add such stats (for more than temporary
> debugging), you'll need to use per_cpu counters for it: more global
> locking or atomic ops on those paths would be sure to upset SGI.
> 

Like zone stats? But I think struct address_space->nrpages is updated, and
shmem's inode has alloced/swapped parameters.

It seems alloced == address_space->nrpages + info->swapped, right?

I just wanted to ask whether they are exported or not.
(Or can I get that information by some ioctl ?)

BTW, my current meminfo is as follows.
==
[kamezawa@blackonyx test-2.6.26-rc5-mm3++]$ cat /proc/meminfo
MemTotal:       49471980 kB
MemFree:        44448528 kB
Buffers:          472412 kB
Cached:          3721388 kB
SwapCached:        22616 kB
Active:           658480 kB
Inactive:        3609828 kB
Active(anon):      14900 kB
Inactive(anon):    64496 kB
Active(file):     643580 kB
Inactive(file):  3545332 kB
Unevictable:        2020 kB
Mlocked:            2020 kB
SwapTotal:       2031608 kB
SwapFree:        1982656 kB
Dirty:                60 kB
Writeback:             0 kB
AnonPages:         62476 kB
Mapped:            32092 kB
Slab:             548584 kB
SReclaimable:     490284 kB
SUnreclaim:        58300 kB
PageTables:        12648 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
==

Cached = filesystem + shmem
Active(anon) = anon-active + shmem-active
Inactive(anon) = anon-inactive + shmem-inactive
Active(file) = file cache-active
Inactive(file) = file cache-inactive.

Right? Maybe I have to drop patch 2/7 and leave FLAG_CACHE.

Thanks,
-Kame


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (9 preceding siblings ...)
  2008-07-03  3:06 ` KAMEZAWA Hiroyuki
@ 2008-07-03  9:07 ` Balbir Singh
  2008-07-03  9:54   ` KAMEZAWA Hiroyuki
  2008-07-11  8:38 ` KAMEZAWA Hiroyuki
  11 siblings, 1 reply; 24+ messages in thread
From: Balbir Singh @ 2008-07-03  9:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

KAMEZAWA Hiroyuki wrote:
> Hi, it seems the VM-related bugs in -mm have been reduced to a safe level,
> so I have restarted my patches for memcg.
> 
> This mail is just for dumping patches on my stack.
> (I'll resend one by one later.)
> 
> based on 2.6.26-rc5-mm3
> + kosaki's fixes + cgroup write_string set + Hugh Dickins's fixes for shmem
> (All of these patches are in the -mm queue.)
> 
> Any comments are welcome (though the 7/7 patch is not so neat...).
> 
> [1/7] swapcache handling fix for shmem.
> [2/7] adjust to split-lru: remove the PAGE_CGROUP_FLAG_CACHE flag.
> [3/7] adjust to split-lru: push shmem's pages to the active list
>       (imported from Hugh Dickins's work.)
> [4/7] reduce usage at limit change: res_counter part.
> [5/7] reduce usage at limit change: memcg part.
> [6/7] memcg-background-job: res_counter part.
> [7/7] memcg-background-job: memcg part.
> 
> Balbir, I'd like to import your soft-limit idea into the memcg-background-job
> patch set. (That may be better than adding hooks to a very generic code path.)
> What do you think?
> 

I am all for integration. My only requirement is that I want to reclaim from a
node when there is system memory contention. The soft limit patches touch the
generic infrastructure, just barely to indicate that we should look at
reclaiming from controllers over their soft limit.

> Other patches in the plan (including other people's):
> - soft-limit (Balbir is working on this.)
>   I think the memcg-background-job patches can cooperate with this.
> 

That'll be a nice thing to do. I am planning on a new version of the soft limit
patches soon (but due to data structure experimentation, it's taking me longer
to get done).

> - dirty_ratio for memcg. (haven't written at all)
>   Support dirty_ratio for memcg. This will improve OOM avoidance.
> 

OK, might be worth doing

> - swappiness for memcg (I had patches... but have to rewrite them.)
>   Support swappiness per memcg. (Or is this of no use?)
> 

OK, Might be worth doing

> - swap_controller (Nishimura is probably working on this.)
>   The world may change after this... a cgroup without swap can appear easily.
> 

I see a swap controller and a swap namespace emerging; we'll need to see how
they work. The swap controller is definitely important.

> - hierarchy (needs more discussion. maybe after OLS?)
>   I have some patches, but am not in a hurry.
> 

Same here, not in a hurry, but I think it will help define full functionality

> - more performance improvements (we need some trick.)
>   = Can we remove lock_page_cgroup() ?

We exchanged some early patches on this. We'll get back to it after the things
above.

>   = Can we reduce spinlocks ?
> 

Yes, and most of our work happens with irqs disabled. We'll need to investigate
a bit more.

> - move resources at task move (needs help from cgroup)
>   We need some magical way. It seems impossible to implement this in memcg alone.
> 

I have some ideas on this. Maybe we can discuss this at the OLS BoF or over
email. This is low priority at the moment.

> - NUMA statistics (needs help from cgroup)
>   It seems a dynamic file creation feature, or some rule for showing an array
>   of statistics, should be defined.
> 
> - memory guarantee (soft-mlock.)
>   A guard parameter against the global LRU that says "don't reclaim any more
>   from me ;(". Maybe the HA Linux people will want this....
> 

This is a hard goal to achieve, since we do have unreclaimable memory.
Guarantees would probably imply reservation of resources; watermarks might be
a better way to do it.

> Do you have others?
> 

I think that should be it. (It covers most, if not all, of the documented TODOs we have.)



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-03  9:07 ` Balbir Singh
@ 2008-07-03  9:54   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-03  9:54 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Thu, 03 Jul 2008 14:37:20 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> > Balbir, I'd like to import your soft-limit idea into the memcg-background-job
> > patch set. (That may be better than adding hooks to a very generic code path.)
> > What do you think?
> > 
> 
> I am all for integration. My only requirement is that I want to reclaim from a
> node when there is system memory contention. The soft limit patches touch the
> generic infrastructure, just barely to indicate that we should look at
> reclaiming from controllers over their soft limit.
> 
One of my concerns is that the soft-limit path is added to alloc_pages()'s
path, not to kswapd()'s. I wonder whether it's better to detect memory
shortage by some calculation and do it ourselves, rather than adding a hook
to the memory allocation path.

For example:

 start soft-limit reclaim when the amount of free pages in a zone drops below
 XXXMbytes.
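
Something like this rough sketch (a hypothetical helper; the threshold here
is just a placeholder for the XXXMbytes above):

	/* run from the background daemon, not hooked into alloc_pages() */
	static int zone_wants_soft_limit_reclaim(struct zone *zone)
	{
		return zone_page_state(zone, NR_FREE_PAGES) < zone->pages_high;
	}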
 
 But I'm not sure how this kind of voluntary memory freeing should be done.
 It seems soft-limit-at-memory-contention implies adding some LRU priority to
 pages. Maybe someone wants that.


> > Other patches in the plan (including other people's):
> > - soft-limit (Balbir is working on this.)
> >   I think the memcg-background-job patches can cooperate with this.
> > 
> 
> That'll be a nice thing to do. I am planning on a new version of the soft limit
> patches soon (but due to data structure experimentation, it's taking me longer
> to get done).
> 
  My new version of background-job will be much smarter than 7/7.

> > - dirty_ratio for memcg. (haven't written at all)
> >   Support dirty_ratio for memcg. This will improve OOM avoidance.
> > 
> 
> OK, might be worth doing
> 
> > - swappiness for memcg (I had patches... but have to rewrite them.)
> >   Support swappiness per memcg. (Or is this of no use?)
> > 
> 
> OK, Might be worth doing
> 
> > - swap_controller (Maybe Nishimura works on.)
> >   The world may change after this... a cgroup without swap can appear easily.
> > 
> 
> I see a swap controller and swap namespace emerging, we'll need to see how they
> work. The swap controller is definitely important
> 
> > - hierarchy (needs more discussion. maybe after OLS?)
> >   have some patches, but not in a hurry.
> > 
> 
> Same here, not in a hurry, but I think it will help define full functionality
> 
> > - more performance improvements (we need some trick.)
> >   = Can we remove lock_page_cgroup() ?
> 
> We exchanged some early patches on this. We'll get back to it after the things
> above.
> 
> >   = Can we reduce spinlocks ?
> > 
> 
> Yes, and most of our work happens with irqs disabled. We'll need to investigate
> a bit more.
> 
> > - move resource at task move (needs helps from cgroup)
> >   We need some magical way. It seems impossible to implement this only by memcg.
> > 
> 
> I have some ideas on this. Maybe we can discuss this in the OLS BoF or on
> email. This is low priority at the moment.
> 
I'm also not in a hurry, but the reason I bring this up is the following
(special) situation:

  echo PID > /dev/memcg/group01/tasks

  The task PID allocates tons of memory and mlocks it (or fills up swap).

  Then the admin moves it:
  echo PID > /dev/memcg/group02/tasks

  What happens ? The charges stay behind in group01, so if OOM occurs in
  group01, only innocent processes will be killed.


> > - NUMA statistics (needs helps from cgroup)
> >   It seems dynamic file creation feature or some rule to show array of
> >   statistics should be defined.
> > 
> > - memory guarantee (soft-mlock.)
> >   guard parameter against global LRU for saying "Don't reclaim from me more ;("
> >   Maybe HA Linux people will want this....
> > 
> 
> This is a hard goal to achieve, since we do have unreclaimable memory. Guarantees
> would probably imply reservation of resources. Watermarks might be a better way
> to do it.
> 
Yes, I recognize this is a hard goal ;)

But this is one of the functions most users want.


> > Do you have others ?
> > 
> 
> I think that should be it (it covers most, if not all, of the documented TODOs we have).
> 
> 

Thanks,
-Kame




* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-03  7:43         ` KAMEZAWA Hiroyuki
@ 2008-07-03 22:28           ` Hugh Dickins
  2008-07-04  0:47             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2008-07-03 22:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, kosaki.motohiro

On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> On Thu, 3 Jul 2008 08:03:17 +0100 (BST)
> Hugh Dickins <hugh@veritas.com> wrote:
> 
> > On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> > > 
> > > BTW, is there a way to see the RSS usage of shmem from /proc or somewhere ?
> > 
> > No, it's just been a (very weirdly backed!) filesystem until these
> > -mm developments.  If you add such stats (for more than temporary
> > debugging), you'll need to use per_cpu counters for it: more global
> > locking or atomic ops on those paths would be sure to upset SGI.
> 
> like zone stat ?

Like that, yes.  I had been going to suggest adding another couple
of stats to that (one for in memory, one for on swap, or heading
to or from swap); but noticed that everything there is an event,
with the comment "Counters should only be incremented", so it
would be an abuse to add shmem page counts there.

> but I think struct address_space->nrpages is updated and
> shmem's inode has alloced/swapped parameters.

Per inode, yes.

> 
> It seems alloced == address_space->nrpages + info->swapped, right ?

That's right (and you'll have read the comment that they can get
out of synch because of undirtied pages getting dropped: I don't
see that as any problem to worry about in your counts).
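
In code, the identity being confirmed reads roughly as follows (a sketch,
assuming struct shmem_inode_info and SHMEM_I() as visible via
linux/shmem_fs.h; not from any posted patch):
==
#include <linux/fs.h>
#include <linux/shmem_fs.h>

/* Pages currently in the page cache plus pages out on swap: what
 * the inode has allocated (modulo the undirtied-pages race noted
 * above, and ignoring locking for the sake of the sketch). */
static unsigned long shmem_inode_alloced(struct inode *inode)
{
        struct shmem_inode_info *info = SHMEM_I(inode);

        return inode->i_mapping->nrpages + info->swapped;
}
==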

> 
> I just wanted to ask whether they are exported or not.
> (Or can I get that information by some ioctl ?)

info->swapped is not available outside mm/shmem.c, but I suppose
mapping->nrpages is available.  But those are not what you want,
are they?  You want totals, not counts per inode.  Totalling them
up over all the inodes in all the tmpfs'es could be a big job;
we don't even have a list of all the inodes at present (and linking
them into and unlinking them from such a list would add overhead).

> 
> BTW, the current meminfo is as follows.
> ==
> [kamezawa@blackonyx test-2.6.26-rc5-mm3++]$ cat /proc/meminfo
> MemTotal:       49471980 kB
> MemFree:        44448528 kB
> Buffers:          472412 kB
> Cached:          3721388 kB
> SwapCached:        22616 kB
> Active:           658480 kB
> Inactive:        3609828 kB
> Active(anon):      14900 kB
> Inactive(anon):    64496 kB
> Active(file):     643580 kB
> Inactive(file):  3545332 kB
> Unevictable:        2020 kB
> Mlocked:            2020 kB
> SwapTotal:       2031608 kB
> SwapFree:        1982656 kB
> Dirty:                60 kB
> Writeback:             0 kB
> AnonPages:         62476 kB
> Mapped:            32092 kB
> Slab:             548584 kB
> SReclaimable:     490284 kB
> SUnreclaim:        58300 kB
> PageTables:        12648 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> ==
> 
> Cached = filesystem + shmem
> Active(anon) = anon-active + shmem-active
> Inactive(anon) = anon-inactive + shmem-inactive
> Active(file) = file cache-active
> Inactive(file) = file cache-inactive.
> 
> Right ?

Yes, I believe so; with the complication that SwapCached
shares pages with Active(anon) and Inactive(anon), but
includes shmem pages not at that moment counted in Cached
or Active(anon) or Inactive(anon), and includes pages which
haven't yet been identified with really-anon or shmem.

Hugh


* Re: [RFC][-mm] [3/7] add shmem page to active list.
  2008-07-03 22:28           ` Hugh Dickins
@ 2008-07-04  0:47             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-04  0:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, kosaki.motohiro

On Thu, 3 Jul 2008 23:28:21 +0100 (BST)
Hugh Dickins <hugh@veritas.com> wrote:

> On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> > On Thu, 3 Jul 2008 08:03:17 +0100 (BST)
> > Hugh Dickins <hugh@veritas.com> wrote:
> > 
> > > On Thu, 3 Jul 2008, KAMEZAWA Hiroyuki wrote:
> > > > 
> > > > BTW, is there a way to see the RSS usage of shmem from /proc or somewhere ?
> > > 
> > > No, it's just been a (very weirdly backed!) filesystem until these
> > > -mm developments.  If you add such stats (for more than temporary
> > > debugging), you'll need to use per_cpu counters for it: more global
> > > locking or atomic ops on those paths would be sure to upset SGI.
> > 
> > like zone stat ?
> 
> Like that, yes.  I had been going to suggest adding another couple
> of stats to that (one for in memory, one for on swap, or heading
> to or from swap); but noticed that everything there is an event,
> with the comment "Counters should only be incremented", so it
> would be an abuse to add shmem page counts there.
> 

Thank you for all your advice. It seems there is no way other than using
zone_stat. I'll try it first and see what happens.
(It uses a per-cpu counter and updates the global counter when the local
delta goes over a threshold; a minimal sketch follows.)
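
A rough sketch of that scheme (all the names here are mine, not from a
posted patch; the fold-on-threshold pattern follows mm/vmstat.c):
==
#include <linux/percpu.h>
#include <asm/atomic.h>

#define SHMEM_STAT_THRESHOLD	32

static DEFINE_PER_CPU(long, shmem_delta);
static atomic_long_t nr_shmem_pages;		/* the global counter */

/*
 * Accumulate per-cpu; touch the global counter only when the local
 * delta crosses the threshold, so the shmem alloc/free paths don't
 * bounce a shared cache line on every page.
 */
static void mod_shmem_pages(long delta)
{
	long d = get_cpu_var(shmem_delta) + delta;

	if (d > SHMEM_STAT_THRESHOLD || d < -SHMEM_STAT_THRESHOLD) {
		atomic_long_add(d, &nr_shmem_pages);
		d = 0;
	}
	__get_cpu_var(shmem_delta) = d;
	put_cpu_var(shmem_delta);
}
==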

BTW, is measuring the performance of a file copy on tmpfs enough to see the overhead ?

Thanks,
-Kame


* Re: [RFC][-mm] [0/7] misc memcg patch set
  2008-07-02 12:03 [RFC][-mm] [0/7] misc memcg patch set KAMEZAWA Hiroyuki
                   ` (10 preceding siblings ...)
  2008-07-03  9:07 ` Balbir Singh
@ 2008-07-11  8:38 ` KAMEZAWA Hiroyuki
  11 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-07-11  8:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, xemul, nishimura, yamamoto, hugh, kosaki.motohiro

On Wed, 2 Jul 2008 21:03:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Hi, it seems vmm-related bugs in -mm are reduced to a safe level.
> I restarted my patches for memcg.
> 
> This mail is just for dumping patches on my stack.
> (I'll resend one by one later.)
> 
> based on 2.6.26-rc5-mm3
> + kosaki's fixes + cgroup write_string set + Hugh Dickins's fixes for shmem
> (All patches are in -mm queue.)
> 
> Any comments are welcome (but 7/7 patch is not so neat...)
> 
> [1/7] swapcache handle fix for shmem.
> [2/7] adjust to split-lru: remove PAGE_CGROUP_FLAG_CACHE flag. 
> [3/7] adjust to split-lru: push shmem's page to active list
>       (Imported from Hugh Dickins's work.)
> [4/7] reduce usage at change limit. res_counter part.
> [5/7] reduce usage at change limit. memcg part.
> [6/7] memcg-background-job.           res_counter part
> [7/7] memcg-background-job            memcg part.
> 
> Balbir, I'd like to import your idea of soft-limit to memcg-background-job
> patch set. (Maybe better than adding hooks to very generic part.)

Hmm, Andrew Morton suggested to me: "please do it in user-land... don't make
the kernel more complex."

Maybe that's not difficult.

My current idea is to add the following files:

 - memory.shrink_usage
   "echo 20M > memory.shrink_usage" will try to shrink usage to (limit - 20M).
   If it takes too long, it returns -EBUSY.

 - memory.exceed_thresh
 - memory.thresh_notifier
   The notifier is triggered when the usage exceeds memory.exceed_thresh.
   (Yes, I may rename this to soft_limit_thresh.)

 Writing a daemon or command to kick memory.shrink_usage is not so difficult;
 a rough user-land sketch follows.
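
For instance (user-land pseudo-daemon; memory.usage_in_bytes exists today,
while memory.shrink_usage is only the proposed file above, and the path and
threshold are just examples):
==
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *usage_f  = "/dev/memcg/group01/memory.usage_in_bytes";
	const char *shrink_f = "/dev/memcg/group01/memory.shrink_usage";
	unsigned long long usage, thresh = 100ULL << 20;  /* 100M, example */

	for (;;) {
		FILE *f = fopen(usage_f, "r");

		if (f) {
			if (fscanf(f, "%llu", &usage) == 1 && usage > thresh) {
				FILE *w = fopen(shrink_f, "w");

				if (w) {
					fputs("20M\n", w); /* shrink to limit-20M */
					fclose(w);
				}
			}
			fclose(f);
		}
		sleep(5);	/* a real daemon would wait on the notifier */
	}
	return 0;
}
==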

Any ideas or comments ?

Thanks,
-Kame



> What do you think ?
> 
> Other patches in plan (including other guy's)
> - soft-limit (Balbir works.)
>   I myself think memcg-background-job patches can cooperate with this.
> 
> - dirty_ratio for memcg. (haven't written at all)
>   Support dirty_ratio for memcg. This will improve OOM avoidance.
> 
> - swappiness for memcg (had patches... but have to rewrite.)
>   Support swappiness per memcg. (of no use ?)
> 
> - swap_controller (Maybe Nishimura works on.)
>   The world may change after this... a cgroup without swap can appear easily.
> 
> - hierarchy (needs more discussion. maybe after OLS?)
>   have some patches, but not in a hurry.
> 
> - more performance improvements (we need some trick.)
>   = Can we remove lock_page_cgroup() ?
>   = Can we reduce spinlocks ?
> 
> - move resource at task move (needs helps from cgroup)
>   We need some magical way. It seems impossible to implement this only by memcg.
> 
> - NUMA statistics (needs helps from cgroup)
>   It seems dynamic file creation feature or some rule to show array of
>   statistics should be defined.
> 
> - memory guarantee (soft-mlock.)
>   guard parameter against global LRU for saying "Don't reclaim from me more ;("
>   Maybe HA Linux people will want this....
> 
> Do you have others ?
> 
> Thanks,
> -Kame
> 

