* [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
@ 2009-03-26 4:08 Daisuke Nishimura
2009-03-26 4:31 ` Daisuke Nishimura
2009-03-26 5:12 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-26 4:08 UTC (permalink / raw)
To: linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Hugh Dickins,
Daisuke Nishimura
This is another bug I've been working on recently.
I want this (and the stale swapcache problem) to be fixed for 2.6.30.
Any comments?
===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Current mem_cgroup_shrink_usage has two problems.
1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
so pagefault_out_of_memory invokes global OOM.
2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
not from the memcg which the page would be charged to.
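(For reference, a rough sketch of the mechanism behind problem 1, abridged
from my reading of the 2.6.29 sources; the real code has more detail:)
==
/* mm/oom_kill.c, abridged: the page-fault OOM path falls through to
 * the *global* OOM killer unless the memcg OOM killer ran recently. */
void pagefault_out_of_memory(void)
{
	...
	if (mem_cgroup_oom_called(current))
		goto rest_and_return;	/* memcg OOM already handled it */
	/* otherwise: invoke the global OOM killer */
	...
}

/* mm/memcontrol.c, abridged: "recently" means record_last_oom() updated
 * last_oom_jiffies within the last HZ/10. */
bool mem_cgroup_oom_called(struct task_struct *task)
{
	bool ret = false;
	struct mem_cgroup *mem;
	struct mm_struct *mm;

	rcu_read_lock();
	mm = task->mm ? task->mm : &init_mm;
	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
	if (mem && time_before(jiffies, mem->last_oom_jiffies + HZ/10))
		ret = true;
	rcu_read_unlock();
	return ret;
}
==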
I think these problems can be solved easily by making shrink_usage
call charge and uncharge.
Actually, this is the old behavior from before commit c9b0ed51.
Instead of simply going back to that behavior, this patch:
- adds a new arg to mem_cgroup_try_charge to store mem_over_limit.
- defines new function add_to_page_cache_store_memcg, which behaves
like add_to_page_cache_locked but uses try_charge_swapin/commit_charge_swapin
and stores mem_over_limit on failure of try_charge_swapin.
- makes shmem_getpage use add_to_page_cache_store_memcg, and pass
the mem_over_limit to shrink_usage.
- makes shrink_usage call mem_cgroup_out_of_memory and record_last_oom.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/memcontrol.h | 15 ++++++----
include/linux/pagemap.h | 13 ++++++++
mm/filemap.c | 67 ++++++++++++++++++++++++++++++++++++++++++++
mm/memcontrol.c | 62 +++++++++++++++++++++++-----------------
mm/memory.c | 2 +-
mm/shmem.c | 22 ++++++--------
mm/swapfile.c | 3 +-
7 files changed, 137 insertions(+), 47 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 18146c9..f926912 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -41,9 +41,12 @@ extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
/* for swap handling */
extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
- struct page *page, gfp_t mask, struct mem_cgroup **ptr);
+ struct page *page, gfp_t mask,
+ struct mem_cgroup **ptr, struct mem_cgroup **fail_ptr);
extern void mem_cgroup_commit_charge_swapin(struct page *page,
struct mem_cgroup *ptr);
+extern void mem_cgroup_commit_cache_charge_swapin(struct page *page,
+ struct mem_cgroup *ptr);
extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr);
extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
@@ -56,8 +59,7 @@ extern void mem_cgroup_move_lists(struct page *page,
enum lru_list from, enum lru_list to);
extern void mem_cgroup_uncharge_page(struct page *page);
extern void mem_cgroup_uncharge_cache_page(struct page *page);
-extern int mem_cgroup_shrink_usage(struct page *page,
- struct mm_struct *mm, gfp_t gfp_mask);
+extern int mem_cgroup_shrink_usage(struct mem_cgroup *mem, gfp_t gfp_mask);
extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
@@ -133,7 +135,8 @@ static inline int mem_cgroup_cache_charge(struct page *page,
}
static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
- struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
+ struct page *page, gfp_t gfp_mask,
+ struct mem_cgroup **ptr, struct mem_cgroup **fail_ptr)
{
return 0;
}
@@ -155,8 +158,8 @@ static inline void mem_cgroup_uncharge_cache_page(struct page *page)
{
}
-static inline int mem_cgroup_shrink_usage(struct page *page,
- struct mm_struct *mm, gfp_t gfp_mask)
+static inline
+int mem_cgroup_shrink_usage(struct mem_cgroup *mem, gfp_t gfp_mask)
{
return 0;
}
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 135028e..715236e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -453,6 +453,19 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask);
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+int add_to_page_cache_store_memcg(struct page *page,
+ struct address_space *mapping, pgoff_t index,
+ gfp_t gfp_mask, struct mem_cgroup **ptr);
+#else
+static inline int add_to_page_cache_store_memcg(struct page *page,
+ struct address_space *mapping, pgoff_t index,
+ gfp_t gfp_mask, struct mem_cgroup **ptr)
+{
+ *ptr = NULL;
+ return add_to_page_cache_locked(page, mapping, index, gfp_mask);
+}
+#endif
int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask);
extern void remove_from_page_cache(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index c41782c..9960250 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -490,6 +490,73 @@ out:
}
EXPORT_SYMBOL(add_to_page_cache_locked);
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+/**
+ * add_to_page_cache_store_memcg - add a page to the pagecache and store memcg
+ * @page: page to add
+ * @mapping: the page's address_space
+ * @offset: page index
+ * @gfp_mask: page allocation mode
+ * @ptr: store mem_cgroup where charge was failed
+ *
+ * This function is used to add a page to the pagecache. It must be locked.
+ * This function does not add the page to the LRU. The caller must do that.
+ * Unlike add_to_page_cache_locked, this function uses try_charge/commit_charge
+ * scheme and stores memcg where charge was failed.
+ * This function is called only from shmem_getpage, and it passes this memcg
+ * to shrink_usage. Extra refcnt of this memcg is decremented in shrink_usage.
+ */
+int add_to_page_cache_store_memcg(struct page *page,
+ struct address_space *mapping, pgoff_t offset,
+ gfp_t gfp_mask, struct mem_cgroup **ptr)
+{
+ struct mem_cgroup *mem = NULL;
+ struct mem_cgroup *fail = NULL;
+ int error;
+
+ VM_BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!ptr);
+
+ error = mem_cgroup_try_charge_swapin(current->mm, page,
+ gfp_mask & GFP_RECLAIM_MASK,
+ &mem, &fail);
+ if (error) {
+ VM_BUG_ON(error != -ENOMEM);
+ VM_BUG_ON(!fail);
+ *ptr = fail;
+ goto out;
+ }
+
+ *ptr = NULL;
+ error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+ if (error == 0) {
+ page_cache_get(page);
+ page->mapping = mapping;
+ page->index = offset;
+
+ spin_lock_irq(&mapping->tree_lock);
+ error = radix_tree_insert(&mapping->page_tree, offset, page);
+ if (likely(!error)) {
+ mapping->nrpages++;
+ __inc_zone_page_state(page, NR_FILE_PAGES);
+ } else {
+ page->mapping = NULL;
+ mem_cgroup_cancel_charge_swapin(mem);
+ page_cache_release(page);
+ }
+ spin_unlock_irq(&mapping->tree_lock);
+ radix_tree_preload_end();
+
+ if (likely(!error))
+ mem_cgroup_commit_cache_charge_swapin(page, mem);
+ } else
+ mem_cgroup_cancel_charge_swapin(mem);
+out:
+ return error;
+}
+EXPORT_SYMBOL(add_to_page_cache_store_memcg);
+#endif
+
int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3492286..7d3078c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -940,7 +940,7 @@ static void record_last_oom(struct mem_cgroup *mem)
*/
static int __mem_cgroup_try_charge(struct mm_struct *mm,
gfp_t gfp_mask, struct mem_cgroup **memcg,
- bool oom)
+ struct mem_cgroup **fail, bool oom)
{
struct mem_cgroup *mem, *mem_over_limit;
int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
@@ -1023,6 +1023,10 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
}
return 0;
nomem:
+ if (fail) {
+ css_get(&mem_over_limit->css); /* Callers should call css_put */
+ *fail = mem_over_limit;
+ }
css_put(&mem->css);
return -ENOMEM;
}
@@ -1187,7 +1191,7 @@ static int mem_cgroup_move_parent(struct page_cgroup *pc,
parent = mem_cgroup_from_cont(pcg);
- ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false);
+ ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, NULL, false);
if (ret || !parent)
return ret;
@@ -1244,7 +1248,7 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
prefetchw(pc);
mem = memcg;
- ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, true);
+ ret = __mem_cgroup_try_charge(mm, gfp_mask, &mem, NULL, true);
if (ret || !mem)
return ret;
@@ -1274,10 +1278,6 @@ int mem_cgroup_newpage_charge(struct page *page,
MEM_CGROUP_CHARGE_TYPE_MAPPED, NULL);
}
-static void
-__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
- enum charge_type ctype);
-
int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
@@ -1323,10 +1323,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
/* shmem */
if (PageSwapCache(page)) {
- ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
+ ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask,
+ &mem, NULL);
if (!ret)
- __mem_cgroup_commit_charge_swapin(page, mem,
- MEM_CGROUP_CHARGE_TYPE_SHMEM);
+ mem_cgroup_commit_cache_charge_swapin(page, mem);
} else
ret = mem_cgroup_charge_common(page, mm, gfp_mask,
MEM_CGROUP_CHARGE_TYPE_SHMEM, mem);
@@ -1341,8 +1341,9 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
* "commit()" or removed by "cancel()"
*/
int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
- struct page *page,
- gfp_t mask, struct mem_cgroup **ptr)
+ struct page *page, gfp_t mask,
+ struct mem_cgroup **ptr,
+ struct mem_cgroup **fail_ptr)
{
struct mem_cgroup *mem;
int ret;
@@ -1363,14 +1364,14 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
if (!mem)
goto charge_cur_mm;
*ptr = mem;
- ret = __mem_cgroup_try_charge(NULL, mask, ptr, true);
+ ret = __mem_cgroup_try_charge(NULL, mask, ptr, fail_ptr, true);
/* drop extra refcnt from tryget */
css_put(&mem->css);
return ret;
charge_cur_mm:
if (unlikely(!mm))
mm = &init_mm;
- return __mem_cgroup_try_charge(mm, mask, ptr, true);
+ return __mem_cgroup_try_charge(mm, mask, ptr, fail_ptr, true);
}
static void
@@ -1432,6 +1433,13 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
MEM_CGROUP_CHARGE_TYPE_MAPPED);
}
+void mem_cgroup_commit_cache_charge_swapin(struct page *page,
+ struct mem_cgroup *ptr)
+{
+ __mem_cgroup_commit_charge_swapin(page, ptr,
+ MEM_CGROUP_CHARGE_TYPE_SHMEM);
+}
+
void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
{
if (mem_cgroup_disabled())
@@ -1604,7 +1612,8 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
unlock_page_cgroup(pc);
if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, NULL,
+ false);
css_put(&mem->css);
}
*ptr = mem;
@@ -1668,22 +1677,15 @@ void mem_cgroup_end_migration(struct mem_cgroup *mem,
* This is typically used for page reclaiming for shmem for reducing side
* effect of page allocation from shmem, which is used by some mem_cgroup.
*/
-int mem_cgroup_shrink_usage(struct page *page,
- struct mm_struct *mm,
- gfp_t gfp_mask)
+int mem_cgroup_shrink_usage(struct mem_cgroup *mem, gfp_t gfp_mask)
{
- struct mem_cgroup *mem = NULL;
int progress = 0;
int retry = MEM_CGROUP_RECLAIM_RETRIES;
+ VM_BUG_ON(!mem);
+
if (mem_cgroup_disabled())
return 0;
- if (page)
- mem = try_get_mem_cgroup_from_swapcache(page);
- if (!mem && mm)
- mem = try_get_mem_cgroup_from_mm(mm);
- if (unlikely(!mem))
- return 0;
do {
progress = mem_cgroup_hierarchical_reclaim(mem,
@@ -1691,9 +1693,15 @@ int mem_cgroup_shrink_usage(struct page *page,
progress += mem_cgroup_check_under_limit(mem);
} while (!progress && --retry);
- css_put(&mem->css);
- if (!retry)
+ if (!retry) {
+ mutex_lock(&memcg_tasklist);
+ mem_cgroup_out_of_memory(mem, gfp_mask);
+ mutex_unlock(&memcg_tasklist);
+ record_last_oom(mem);
+ css_put(&mem->css); /* got when try_charge failed */
return -ENOMEM;
+ }
+ css_put(&mem->css); /* got when try_charge failed */
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 28f1e70..b5fa0c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2456,7 +2456,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
lock_page(page);
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
- if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
+ if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr, NULL)) {
ret = VM_FAULT_OOM;
unlock_page(page);
goto out;
diff --git a/mm/shmem.c b/mm/shmem.c
index a5a30fd..ca7751f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1196,6 +1196,7 @@ static int shmem_getpage(struct inode *inode, unsigned long long idx,
struct shmem_sb_info *sbinfo;
struct page *filepage = *pagep;
struct page *swappage;
+ struct mem_cgroup *mem = NULL;
swp_entry_t *entry;
swp_entry_t swap;
gfp_t gfp;
@@ -1312,8 +1313,8 @@ repeat:
SetPageUptodate(filepage);
set_page_dirty(filepage);
swap_free(swap);
- } else if (!(error = add_to_page_cache_locked(swappage, mapping,
- idx, GFP_NOWAIT))) {
+ } else if (!(error = add_to_page_cache_store_memcg(swappage,
+ mapping, idx, GFP_NOWAIT, &mem))) {
info->flags |= SHMEM_PAGEIN;
shmem_swp_set(info, entry, 0);
shmem_swp_unmap(entry);
@@ -1325,19 +1326,16 @@ repeat:
} else {
shmem_swp_unmap(entry);
spin_unlock(&info->lock);
+ unlock_page(swappage);
+ page_cache_release(swappage);
if (error == -ENOMEM) {
- /* allow reclaim from this memory cgroup */
- error = mem_cgroup_shrink_usage(swappage,
- current->mm,
- gfp);
- if (error) {
- unlock_page(swappage);
- page_cache_release(swappage);
+ if (mem)
+ /* reclaim from this memory cgroup */
+ error = mem_cgroup_shrink_usage(mem,
+ gfp);
+ if (error)
goto failed;
- }
}
- unlock_page(swappage);
- page_cache_release(swappage);
goto repeat;
}
} else if (sgp == SGP_READ && !filepage) {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 312fafe..3c6d8f6 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -698,7 +698,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
pte_t *pte;
int ret = 1;
- if (mem_cgroup_try_charge_swapin(vma->vm_mm, page, GFP_KERNEL, &ptr)) {
+ if (mem_cgroup_try_charge_swapin(vma->vm_mm, page, GFP_KERNEL,
+ &ptr, NULL)) {
ret = -ENOMEM;
goto out_nolock;
}
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 4:08 [RFC][BUGFIX][PATCH] memcg: fix shrink_usage Daisuke Nishimura
@ 2009-03-26 4:31 ` Daisuke Nishimura
2009-03-26 5:12 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-26 4:31 UTC (permalink / raw)
To: linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Hugh Dickins,
Daisuke Nishimura
On Thu, 26 Mar 2009 13:08:21 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> This is another bug I've been working on recently.
>
> I want this (and the stale swapcache problem) to be fixed for 2.6.30.
>
> Any comments?
>
Ah, you need the cache_charge cleanup patch in order to apply this patch.
http://marc.info/?l=linux-mm&m=123778534632443&w=2
Thanks,
Daisuke Nishimura.
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 4:08 [RFC][BUGFIX][PATCH] memcg: fix shrink_usage Daisuke Nishimura
2009-03-26 4:31 ` Daisuke Nishimura
@ 2009-03-26 5:12 ` KAMEZAWA Hiroyuki
2009-03-26 5:51 ` Daisuke Nishimura
2009-03-26 6:00 ` [RFC][BUGFIX][PATCH] " Balbir Singh
1 sibling, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-26 5:12 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 13:08:21 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> This is another bug I've been working on recently.
>
> I want this (and the stale swapcache problem) to be fixed for 2.6.30.
>
> Any comments?
>
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>
> Current mem_cgroup_shrink_usage has two problems.
>
> 1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
> so pagefault_out_of_memory invokes global OOM.
> 2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
> not from the memcg which the page would be charged to.
>
Ah, I see. Good catch.
But the patch seems a bit big and includes duplications.
Can't we divide this patch in two and reduce the modifications?
mem_cgroup_shrink_usage() should do something proper...
My rough thinking is a patch like this; what do you think?
Maybe renaming this function is appropriate...
==
mem_cgroup_shrink_usage() is called by shmem, but its purpose is
not different from try_charge().
In current behavior, it ignores upward hierarchy and doesn't update
OOM status of memcg. That's bad. We can simply call try_charge()
and drop charge later.
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
Index: test/mm/memcontrol.c
===================================================================
--- test.orig/mm/memcontrol.c
+++ test/mm/memcontrol.c
@@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
if (unlikely(!mem))
return 0;
- do {
- progress = mem_cgroup_hierarchical_reclaim(mem,
- gfp_mask, true, false);
- progress += mem_cgroup_check_under_limit(mem);
- } while (!progress && --retry);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
+ if (!ret) {
+ css_put(&mem->css); /* refcnt by charge *//
+ res_counter_uncharge(&mem->res, PAGE_SIZE);
+ if (do_swap_account)
+ res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ }
css_put(&mem->css);
- if (!retry)
- return -ENOMEM;
- return 0;
+ return ret;
}
static DEFINE_MUTEX(set_limit_mutex);
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 5:12 ` KAMEZAWA Hiroyuki
@ 2009-03-26 5:51 ` Daisuke Nishimura
2009-03-26 6:06 ` KAMEZAWA Hiroyuki
2009-03-26 6:00 ` [RFC][BUGFIX][PATCH] " Balbir Singh
1 sibling, 1 reply; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-26 5:51 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: nishimura, linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 14:12:46 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 26 Mar 2009 13:08:21 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > This is another bug I've been working on recently.
> >
> > I want this (and the stale swapcache problem) to be fixed for 2.6.30.
> >
> > Any comments?
> >
> > ===
> > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> >
> > Current mem_cgroup_shrink_usage has two problems.
> >
> > 1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
> > so pagefault_out_of_memory invokes global OOM.
> > 2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
> > not from the memcg which the page would be charged to.
> >
>
> Ah, I see. Good catch.
> But the patch seems a bit big and includes duplications.
> Can't we divide this patch in two and reduce the modifications?
>
Will do if needed.
(the part returning mem_over_limit and the part implementing
add_to_page_cache_store_memcg, perhaps)
> mem_cgroup_shrink_usage() should do something proper...
> My rough thinking is a patch like this; what do you think?
>
I thought in the same direction at first.
But it's conceptually similar to the old implementation before c9b0ed51,
so I chose a new direction.
I'll withdraw my patch if you prefer this direction :)
> Maybe renaming this function is appropriate...
I think so too if we go in this direction.
Just a few comments below.
> ==
> mem_cgroup_shrink_usage() is called by shmem, but its purpose is
> not different from try_charge().
>
> In current behavior, it ignores upward hierarchy and doesn't update
> OOM status of memcg. That's bad. We can simply call try_charge()
> and drop charge later.
>
> Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> Index: test/mm/memcontrol.c
> ===================================================================
> --- test.orig/mm/memcontrol.c
> +++ test/mm/memcontrol.c
> @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> if (unlikely(!mem))
> return 0;
>
> - do {
> - progress = mem_cgroup_hierarchical_reclaim(mem,
> - gfp_mask, true, false);
> - progress += mem_cgroup_check_under_limit(mem);
> - } while (!progress && --retry);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
>
I think we should simply call mem_cgroup_try_charge_swapin() w/o doing try_get.
> + if (!ret) {
> + css_put(&mem->css); /* refcnt by charge *//
It should be done after res_counter_uncharge().
> + res_counter_uncharge(&mem->res, PAGE_SIZE);
> + if (do_swap_account)
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + }
> css_put(&mem->css);
This put isn't needed if we don't try_get.
> - if (!retry)
> - return -ENOMEM;
> - return 0;
> + return ret;
> }
>
> static DEFINE_MUTEX(set_limit_mutex);
>
Thanks,
Daisuke Nishimura.
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 5:51 ` Daisuke Nishimura
@ 2009-03-26 6:06 ` KAMEZAWA Hiroyuki
2009-03-26 6:17 ` Daisuke Nishimura
0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-26 6:06 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 14:51:48 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > Ah, I see. Good catch.
> > But the patch seems a bit big and includes duplications.
> > Can't we divide this patch in two and reduce the modifications?
> >
> Will do if needed.
> (the part returning mem_over_limit and the part implementing
> add_to_page_cache_store_memcg, perhaps)
>
> > mem_cgroup_shrink_usage() should do something proper...
> > My rough thinking is a patch like this; what do you think?
> >
> I thought in the same direction at first.
> But it's conceptually similar to the old implementation before c9b0ed51,
> so I chose a new direction.
>
> I'll withdraw my patch if you prefer this direction :)
>
Ah, my basic plan is:
- a BUGFIX should be simple.
- if a cleanup is necessary, it should be in a separate patch.
I have no objection to making memcg cleaner.
> > Maybe renaming this function is appropriate...
> I think so too if we go in this direction.
>
> Just a few comments below.
>
Thanks,
> > ==
> > mem_cgroup_shrink_usage() is called by shmem, but its purpose is
> > not different from try_charge().
> >
> > In current behavior, it ignores upward hierarchy and doesn't update
> > OOM status of memcg. That's bad. We can simply call try_charge()
> > and drop charge later.
> >
> > Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > Index: test/mm/memcontrol.c
> > ===================================================================
> > --- test.orig/mm/memcontrol.c
> > +++ test/mm/memcontrol.c
> > @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> > if (unlikely(!mem))
> > return 0;
> >
> > - do {
> > - progress = mem_cgroup_hierarchical_reclaim(mem,
> > - gfp_mask, true, false);
> > - progress += mem_cgroup_check_under_limit(mem);
> > - } while (!progress && --retry);
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
> >
> I think we should simply call mem_cgroup_try_charge_swapin() w/o doing try_get.
>
Hmm, ok. Let me see again.
> > + if (!ret) {
> > + css_put(&mem->css); /* refcnt by charge *//
> It should be done after res_counter_uncharge().
>
yes.
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > + if (do_swap_account)
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + }
> > css_put(&mem->css);
> This put isn't needed if we don't try_get.
>
In shrink_usage() (not in this patch), we called try_get(), I think.
Thanks,
-Kame
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:06 ` KAMEZAWA Hiroyuki
@ 2009-03-26 6:17 ` Daisuke Nishimura
2009-03-26 6:27 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-26 6:17 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: nishimura, linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
> > > @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> > > if (unlikely(!mem))
> > > return 0;
> > >
> > > - do {
> > > - progress = mem_cgroup_hierarchical_reclaim(mem,
> > > - gfp_mask, true, false);
> > > - progress += mem_cgroup_check_under_limit(mem);
> > > - } while (!progress && --retry);
> > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
> > >
> > I think we should simply call mem_cgroup_try_charge_swapin() w/o doing try_get.
> >
> Hmm, ok. Let me see again.
>
>
> > > + if (!ret) {
> > > + css_put(&mem->css); /* refcnt by charge *//
> > It should be done after res_counter_uncharge().
> >
> yes.
>
> > > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > > + if (do_swap_account)
> > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > + }
> > > css_put(&mem->css);
> > This put isn't needed if we don't try_get.
> >
> In shrink_usage() (not in this patch), we called try_get(), I think.
>
I meant that if we change the above part to try_charge_swapin w/o calling try_get
in shrink_usage, we don't need this put :)
And I think we can use cancel_charge_swapin at uncharge part.
So mem_cgroup_shrink_usage would look like this?
===
int mem_cgroup_shrink_usage(struct page *page,
struct mm_struct *mm,
gfp_t gfp_mask)
{
struct mem_cgroup *mem = NULL;
int ret;
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret && mem)
mem_cgroup_cancel_charge_swapin(mem);
return ret;
}
===
Thanks,
Daisuke Nishimura.
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:17 ` Daisuke Nishimura
@ 2009-03-26 6:27 ` KAMEZAWA Hiroyuki
2009-03-26 6:38 ` Daisuke Nishimura
0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-26 6:27 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 15:17:33 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> > > > if (unlikely(!mem))
> > > > return 0;
> > > >
> > > > - do {
> > > > - progress = mem_cgroup_hierarchical_reclaim(mem,
> > > > - gfp_mask, true, false);
> > > > - progress += mem_cgroup_check_under_limit(mem);
> > > > - } while (!progress && --retry);
> > > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
> > > >
> > > I think we should simply call mem_cgroup_try_charge_swapin() w/o doing try_get.
> > >
> > Hmm, ok. Let me see again.
> >
> >
> > > > + if (!ret) {
> > > > + css_put(&mem->css); /* refcnt by charge *//
> > > It should be done after res_counter_uncharge().
> > >
> > yes.
> >
> > > > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > > > + if (do_swap_account)
> > > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > + }
> > > > css_put(&mem->css);
> > > This put isn't needed if we don't try_get.
> > >
> > In shrink_usage() (not in this patch), we called try_get(), I think.
> >
> I meant that if we change the above part to try_charge_swapin w/o calling try_get
> in shrink_usage, we don't need this put :)
>
> And I think we can use cancel_charge_swapin at uncharge part.
>
> So mem_cgroup_shrink_usage would look like this?
>
> ===
> int mem_cgroup_shrink_usage(struct page *page,
> struct mm_struct *mm,
> gfp_t gfp_mask)
> {
> struct mem_cgroup *mem = NULL;
> int ret;
>
> ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
> if (!ret && mem)
> mem_cgroup_cancel_charge_swapin(mem);
>
> return ret;
> }
Seems very simple. Hmm, I'm thinking of the following.
==
int mem_cgroup_shmem_charge_fallback(struct page *page, struct mm_struct *mm, gfp_t mask)
{
return mem_cgroup_cache_charge(page, mm, mask);
}
==
But I'm afraid that this adds another corner case: accounting a page that is not under
the radix-tree. (But this is SwapCache... then... this will work.)
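(For context, the shmem/SwapCache branch of mem_cgroup_cache_charge(),
abridged from the hunk quoted in the first patch above, is roughly the
following; the exact code may differ:)
==
/* A SwapCache page can be charged via the try_charge_swapin/commit
 * path even though it is not (yet) in the shmem radix-tree, which is
 * why a charge-based fallback can work here. */
if (PageSwapCache(page)) {
	ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
	if (!ret)
		__mem_cgroup_commit_charge_swapin(page, mem,
					MEM_CGROUP_CHARGE_TYPE_SHMEM);
} else
	ret = mem_cgroup_charge_common(page, mm, gfp_mask,
					MEM_CGROUP_CHARGE_TYPE_SHMEM, mem);
==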
Could you write a patch in this direction? (Or I'll write it myself.)
It's obvious that you do better testing.
Thanks,
-Kame
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:27 ` KAMEZAWA Hiroyuki
@ 2009-03-26 6:38 ` Daisuke Nishimura
2009-03-26 6:50 ` KAMEZAWA Hiroyuki
2009-03-27 1:39 ` [BUGFIX][PATCH] " Daisuke Nishimura
0 siblings, 2 replies; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-26 6:38 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: nishimura, linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 15:27:34 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 26 Mar 2009 15:17:33 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > > > > @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> > > > > if (unlikely(!mem))
> > > > > return 0;
> > > > >
> > > > > - do {
> > > > > - progress = mem_cgroup_hierarchical_reclaim(mem,
> > > > > - gfp_mask, true, false);
> > > > > - progress += mem_cgroup_check_under_limit(mem);
> > > > > - } while (!progress && --retry);
> > > > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
> > > > >
> > > > I think we should simply call mem_cgroup_try_charge_swapin() w/o doing try_get.
> > > >
> > > Hmm, ok. Let me see again.
> > >
> > >
> > > > > + if (!ret) {
> > > > > + css_put(&mem->css); /* refcnt by charge *//
> > > > It should be done after res_counter_uncharge().
> > > >
> > > yes.
> > >
> > > > > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > > > > + if (do_swap_account)
> > > > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > > + }
> > > > > css_put(&mem->css);
> > > > This put isn't needed if we don't try_get.
> > > >
> > > In shrink_usage() (not in this patch), we called try_get(), I think.
> > >
> > I meant that if we change the above part to try_charge_swapin w/o calling try_get
> > in shrink_usage, we don't need this put :)
> >
> > And I think we can use cancel_charge_swapin at uncharge part.
> >
> > So mem_cgroup_shrink_usage would look like this?
> >
> > ===
> > int mem_cgroup_shrink_usage(struct page *page,
> > struct mm_struct *mm,
> > gfp_t gfp_mask)
> > {
> > struct mem_cgroup *mem = NULL;
> > int ret;
> >
> > ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
> > if (!ret && mem)
> > mem_cgroup_cancel_charge_swapin(mem);
> >
> > return ret;
> > }
>
> Seems very simple. Hmm, I'm thinking of the following.
> ==
> int mem_cgroup_shmem_charge_fallback(struct page *page, struct mm_struct *mm, gfp_t mask)
> {
> return mem_cgroup_cache_charge(page, mm, mask);
> }
> ==
>
> But I'm afraid that this adds another corner case: accounting a page that is not under
> the radix-tree. (But this is SwapCache... then... this will work.)
>
> Could you write a patch in this direction? (Or I'll write it myself.)
> It's obvious that you do better testing.
>
Okay.
I'll make a patch and repost it for review after doing some tests.
BTW, do you have any good ideas for the new name of shrink_usage?
Thanks,
Daisuke Nishimura.
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:38 ` Daisuke Nishimura
@ 2009-03-26 6:50 ` KAMEZAWA Hiroyuki
2009-03-27 1:39 ` [BUGFIX][PATCH] " Daisuke Nishimura
1 sibling, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-26 6:50 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 15:38:03 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > Seems very simple. Hmm, I'm thinking of the following.
> > ==
> > int mem_cgroup_shmem_charge_fallback(struct page *page, struct mm_struct *mm, gfp_t mask)
> > {
> > return mem_cgroup_cache_charge(page, mm, mask);
> > }
> > ==
> >
> > But I'm afraid that this adds another corner case: accounting a page that is not under
> > the radix-tree. (But this is SwapCache... then... this will work.)
> >
> > Could you write a patch in this direction? (Or I'll write it myself.)
> > It's obvious that you do better testing.
> >
> Okay.
>
> I'll make a patch and repost it for review after doing some tests.
>
> BTW, do you have any good ideas for the new name of shrink_usage?
>
See above ;) mem_cgroup_shmem_charge_fallback() seems straightforward.
I'd be glad if you rewrote the function's comment at the same time :)
Thanks,
-Kame
* [BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:38 ` Daisuke Nishimura
2009-03-26 6:50 ` KAMEZAWA Hiroyuki
@ 2009-03-27 1:39 ` Daisuke Nishimura
2009-03-27 1:51 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 13+ messages in thread
From: Daisuke Nishimura @ 2009-03-27 1:39 UTC (permalink / raw)
To: linux-mm
Cc: KAMEZAWA Hiroyuki, Balbir Singh, Li Zefan, Hugh Dickins,
Daisuke Nishimura
> > Could you write a patch in this direction? (Or I'll write it myself.)
> > It's obvious that you do better testing.
> >
> Okay.
>
> I'll make a patch and repost it for review after doing some tests.
>
This is the updated one.
I've confirmed that this can prevent an invalid OOM.
===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Current mem_cgroup_shrink_usage has two problems.
1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
so pagefault_out_of_memory invokes global OOM.
2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
not from the memcg which the page would be charged to.
mem_cgroup_try_charge_swapin does all of this work properly,
so we use it and call cancel_charge_swapin when it succeeds.
The name of "shrink_usage" is not appropriate for this purpose,
so we change it too.
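(For reference, a sketch of mem_cgroup_cancel_charge_swapin(), abridged
from my reading of the 2.6.29 code; it tolerates a NULL memcg, which is
what the "it does !mem check" comment below refers to:)
==
void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
{
	if (mem_cgroup_disabled())
		return;
	if (!mem)
		return;
	/* drop the charge taken by try_charge_swapin ... */
	res_counter_uncharge(&mem->res, PAGE_SIZE);
	if (do_swap_account)
		res_counter_uncharge(&mem->memsw, PAGE_SIZE);
	/* ... and the css reference taken along with it */
	css_put(&mem->css);
}
==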
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/memcontrol.h | 4 ++--
mm/memcontrol.c | 33 ++++++++++++---------------------
mm/shmem.c | 8 ++++++--
3 files changed, 20 insertions(+), 25 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 18146c9..928b714 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -56,7 +56,7 @@ extern void mem_cgroup_move_lists(struct page *page,
enum lru_list from, enum lru_list to);
extern void mem_cgroup_uncharge_page(struct page *page);
extern void mem_cgroup_uncharge_cache_page(struct page *page);
-extern int mem_cgroup_shrink_usage(struct page *page,
+extern int mem_cgroup_shmem_charge_fallback(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask);
extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
@@ -155,7 +155,7 @@ static inline void mem_cgroup_uncharge_cache_page(struct page *page)
{
}
-static inline int mem_cgroup_shrink_usage(struct page *page,
+static inline int mem_cgroup_shmem_charge_fallback(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask)
{
return 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3492286..3b88e7f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1664,37 +1664,28 @@ void mem_cgroup_end_migration(struct mem_cgroup *mem,
}
/*
- * A call to try to shrink memory usage under specified resource controller.
- * This is typically used for page reclaiming for shmem for reducing side
- * effect of page allocation from shmem, which is used by some mem_cgroup.
+ * A call to try to shrink memory usage on charge failure at shmem's swapin.
+ * Calling hierarchical_reclaim is not enough because we should update
+ * last_oom_jiffies to prevent pagefault_out_of_memory from invoking global OOM.
+ * Moreover considering hierarchy, we should reclaim from the mem_over_limit,
+ * not from the memcg which this page would be charged to.
+ * try_charge_swapin does all of this work properly.
*/
-int mem_cgroup_shrink_usage(struct page *page,
+int mem_cgroup_shmem_charge_fallback(struct page *page,
struct mm_struct *mm,
gfp_t gfp_mask)
{
struct mem_cgroup *mem = NULL;
- int progress = 0;
- int retry = MEM_CGROUP_RECLAIM_RETRIES;
+ int ret;
if (mem_cgroup_disabled())
return 0;
- if (page)
- mem = try_get_mem_cgroup_from_swapcache(page);
- if (!mem && mm)
- mem = try_get_mem_cgroup_from_mm(mm);
- if (unlikely(!mem))
- return 0;
- do {
- progress = mem_cgroup_hierarchical_reclaim(mem,
- gfp_mask, true, false);
- progress += mem_cgroup_check_under_limit(mem);
- } while (!progress && --retry);
+ ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
+ if (!ret)
+ mem_cgroup_cancel_charge_swapin(mem); /* it does !mem check */
- css_put(&mem->css);
- if (!retry)
- return -ENOMEM;
- return 0;
+ return ret;
}
static DEFINE_MUTEX(set_limit_mutex);
diff --git a/mm/shmem.c b/mm/shmem.c
index a5a30fd..ce99098 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1326,8 +1326,12 @@ repeat:
shmem_swp_unmap(entry);
spin_unlock(&info->lock);
if (error == -ENOMEM) {
- /* allow reclaim from this memory cgroup */
- error = mem_cgroup_shrink_usage(swappage,
+ /*
+ * reclaim from proper memory cgroup and
+ * call memcg's OOM if needed.
+ */
+ error = mem_cgroup_shmem_charge_fallback(
+ swappage,
current->mm,
gfp);
if (error) {
* Re: [BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-27 1:39 ` [BUGFIX][PATCH] " Daisuke Nishimura
@ 2009-03-27 1:51 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-27 1:51 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, Balbir Singh, Li Zefan, Hugh Dickins
On Fri, 27 Mar 2009 10:39:11 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > Could you write a patch in this direction? (Or I'll write it myself.)
> > > It's obvious that you do better testing.
> > >
> > Okay.
> >
> > I'll make a patch and repost it for review after doing some tests.
> >
> This is the updated one.
> I've confirmed that this can prevent an invalid OOM.
>
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>
> Current mem_cgroup_shrink_usage has two problems.
>
> 1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
> so pagefault_out_of_memory invokes global OOM.
> 2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
> not from the memcg which the page would be charged to.
>
> mem_cgroup_try_charge_swapin does all of this work properly,
> so we use it and call cancel_charge_swapin when it succeeds.
>
> The name of "shrink_usage" is not appropriate for this purpose,
> so we change it too.
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Thank you!
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/memcontrol.h | 4 ++--
> mm/memcontrol.c | 33 ++++++++++++---------------------
> mm/shmem.c | 8 ++++++--
> 3 files changed, 20 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 18146c9..928b714 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -56,7 +56,7 @@ extern void mem_cgroup_move_lists(struct page *page,
> enum lru_list from, enum lru_list to);
> extern void mem_cgroup_uncharge_page(struct page *page);
> extern void mem_cgroup_uncharge_cache_page(struct page *page);
> -extern int mem_cgroup_shrink_usage(struct page *page,
> +extern int mem_cgroup_shmem_charge_fallback(struct page *page,
> struct mm_struct *mm, gfp_t gfp_mask);
>
> extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> @@ -155,7 +155,7 @@ static inline void mem_cgroup_uncharge_cache_page(struct page *page)
> {
> }
>
> -static inline int mem_cgroup_shrink_usage(struct page *page,
> +static inline int mem_cgroup_shmem_charge_fallback(struct page *page,
> struct mm_struct *mm, gfp_t gfp_mask)
> {
> return 0;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3492286..3b88e7f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1664,37 +1664,28 @@ void mem_cgroup_end_migration(struct mem_cgroup *mem,
> }
>
> /*
> - * A call to try to shrink memory usage under specified resource controller.
> - * This is typically used for page reclaiming for shmem for reducing side
> - * effect of page allocation from shmem, which is used by some mem_cgroup.
> + * A call to try to shrink memory usage on charge failure at shmem's swapin.
> + * Calling hierarchical_reclaim is not enough because we should update
> + * last_oom_jiffies to prevent pagefault_out_of_memory from invoking global OOM.
> + * Moreover considering hierarchy, we should reclaim from the mem_over_limit,
> + * not from the memcg which this page would be charged to.
> + * try_charge_swapin does all of this work properly.
> */
> -int mem_cgroup_shrink_usage(struct page *page,
> +int mem_cgroup_shmem_charge_fallback(struct page *page,
> struct mm_struct *mm,
> gfp_t gfp_mask)
> {
> struct mem_cgroup *mem = NULL;
> - int progress = 0;
> - int retry = MEM_CGROUP_RECLAIM_RETRIES;
> + int ret;
>
> if (mem_cgroup_disabled())
> return 0;
> - if (page)
> - mem = try_get_mem_cgroup_from_swapcache(page);
> - if (!mem && mm)
> - mem = try_get_mem_cgroup_from_mm(mm);
> - if (unlikely(!mem))
> - return 0;
>
> - do {
> - progress = mem_cgroup_hierarchical_reclaim(mem,
> - gfp_mask, true, false);
> - progress += mem_cgroup_check_under_limit(mem);
> - } while (!progress && --retry);
> + ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
> + if (!ret)
> + mem_cgroup_cancel_charge_swapin(mem); /* it does !mem check */
>
> - css_put(&mem->css);
> - if (!retry)
> - return -ENOMEM;
> - return 0;
> + return ret;
> }
>
> static DEFINE_MUTEX(set_limit_mutex);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a5a30fd..ce99098 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1326,8 +1326,12 @@ repeat:
> shmem_swp_unmap(entry);
> spin_unlock(&info->lock);
> if (error == -ENOMEM) {
> - /* allow reclaim from this memory cgroup */
> - error = mem_cgroup_shrink_usage(swappage,
> + /*
> + * reclaim from proper memory cgroup and
> + * call memcg's OOM if needed.
> + */
> + error = mem_cgroup_shmem_charge_fallback(
> + swappage,
> current->mm,
> gfp);
> if (error) {
>
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 5:12 ` KAMEZAWA Hiroyuki
2009-03-26 5:51 ` Daisuke Nishimura
@ 2009-03-26 6:00 ` Balbir Singh
2009-03-26 6:03 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2009-03-26 6:00 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm, Li Zefan, Hugh Dickins
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-26 14:12:46]:
> On Thu, 26 Mar 2009 13:08:21 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > This is another bug I've been working on recently.
> >
> > I want this (and the stale swapcache problem) to be fixed for 2.6.30.
> >
> > Any comments?
> >
> > ===
> > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> >
> > Current mem_cgroup_shrink_usage has two problems.
> >
> > 1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
> > so pagefault_out_of_memory invokes global OOM.
> > 2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
> > not from the memcg which the page would be charged to.
> >
>
> Ah, I see. Good catch.
> But the patch seems a bit big and includes duplications.
> Can't we divide this patch in two and reduce the modifications?
>
> mem_cgroup_shrink_usage() should do something proper...
> My rough thinking is a patch like this; what do you think?
>
> Maybe renaming this function is appropriate...
> ==
> mem_cgroup_shrink_usage() is called by shmem, but its purpose is
> not different from try_charge().
>
> In current behavior, it ignores upward hierarchy and doesn't update
> OOM status of memcg. That's bad. We can simply call try_charge()
> and drop charge later.
>
This seems much better than the original patch from Daisuke, which
added too much code and too many changes; it is hard to review for
correctness, and the changes outside of memcontrol.c make it riskier.
> Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> Index: test/mm/memcontrol.c
> ===================================================================
> --- test.orig/mm/memcontrol.c
> +++ test/mm/memcontrol.c
> @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> if (unlikely(!mem))
> return 0;
>
> - do {
> - progress = mem_cgroup_hierarchical_reclaim(mem,
> - gfp_mask, true, false);
> - progress += mem_cgroup_check_under_limit(mem);
> - } while (!progress && --retry);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
>
Could you please add a comment/changelog to indicate why we try to
charge when we want to shrink? Is the limit set up so that a try_charge
will cause reclaim, BTW?
> + if (!ret) {
> + css_put(&mem->css); /* refcnt by charge *//
Does this compile?
> + res_counter_uncharge(&mem->res, PAGE_SIZE);
> + if (do_swap_account)
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + }
> css_put(&mem->css);
> - if (!retry)
> - return -ENOMEM;
> - return 0;
> + return ret;
> }
>
> static DEFINE_MUTEX(set_limit_mutex);
>
>
--
Balbir
* Re: [RFC][BUGFIX][PATCH] memcg: fix shrink_usage
2009-03-26 6:00 ` [RFC][BUGFIX][PATCH] " Balbir Singh
@ 2009-03-26 6:03 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-26 6:03 UTC (permalink / raw)
To: balbir; +Cc: Daisuke Nishimura, linux-mm, Li Zefan, Hugh Dickins
On Thu, 26 Mar 2009 11:30:28 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-26 14:12:46]:
>
> > On Thu, 26 Mar 2009 13:08:21 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> >
> > > This is another bug I've been working on recently.
> > >
> > > I want this (and the stale swapcache problem) to be fixed for 2.6.30.
> > >
> > > Any comments?
> > >
> > > ===
> > > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > >
> > > Current mem_cgroup_shrink_usage has two problems.
> > >
> > > 1. It doesn't call mem_cgroup_out_of_memory and doesn't update last_oom_jiffies,
> > > so pagefault_out_of_memory invokes global OOM.
> > > 2. Considering hierarchy, shrinking has to be done from the mem_over_limit,
> > > not from the memcg which the page would be charged to.
> > >
> >
> > Ah, I see. Good catch.
> > But the patch seems a bit big and includes duplications.
> > Can't we divide this patch in two and reduce the modifications?
> >
> > mem_cgroup_shrink_usage() should do something proper...
> > My rough thinking is a patch like this; what do you think?
> >
> > Maybe renaming this function is appropriate...
> > ==
> > mem_cgroup_shrink_usage() is called by shmem, but its purpose is
> > not different from try_charge().
> >
> > In current behavior, it ignores upward hierarchy and doesn't update
> > OOM status of memcg. That's bad. We can simply call try_charge()
> > and drop charge later.
> >
>
> This seems much better than the original patch from Daisuke, which
> added too much code and too many changes; it is hard to review for
> correctness, and the changes outside of memcontrol.c make it riskier.
>
> > Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > mm/memcontrol.c | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > Index: test/mm/memcontrol.c
> > ===================================================================
> > --- test.orig/mm/memcontrol.c
> > +++ test/mm/memcontrol.c
> > @@ -1655,16 +1655,16 @@ int mem_cgroup_shrink_usage(struct page
> > if (unlikely(!mem))
> > return 0;
> >
> > - do {
> > - progress = mem_cgroup_hierarchical_reclaim(mem,
> > - gfp_mask, true, false);
> > - progress += mem_cgroup_check_under_limit(mem);
> > - } while (!progress && --retry);
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, mem, true);
> >
>
> Could you please add a comment/changelog to indicate why we try to
> charge when we want to shrink? Is the limit set up so that a try_charge
> will cause reclaim, BTW?
>
I'll rename this function and add more appropriate comments.
This shrink_usage() is called only when add_to_page_cache(GFP_NOWAIT) in shmem.c
fails, so it's enough to charge this page again with GFP_KERNEL.
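(A rough sketch of why a GFP_KERNEL charge implies reclaim, abridged from
my reading of __mem_cgroup_try_charge() in 2.6.29; the real code also
handles memsw accounting and more:)
==
while (res_counter_charge(&mem->res, PAGE_SIZE, &fail_res)) {
	if (!(gfp_mask & __GFP_WAIT))
		goto nomem;	/* GFP_NOWAIT callers fail at once */

	/* reclaim from the cgroup (hierarchy) that hit its limit */
	mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
	if (mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask,
					    true, false))
		continue;	/* made progress, retry the charge */
	if (mem_cgroup_check_under_limit(mem_over_limit))
		continue;

	if (!nr_retries--) {
		if (oom) {
			mutex_lock(&memcg_tasklist);
			mem_cgroup_out_of_memory(mem_over_limit, gfp_mask);
			mutex_unlock(&memcg_tasklist);
			record_last_oom(mem_over_limit);
		}
		goto nomem;
	}
}
==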
> > + if (!ret) {
> > + css_put(&mem->css); /* refcnt by charge *//
>
> Does this compile?
>
A refresh miss ;) but this is just an idea-level patch.
Bye,
-Kame