[PATCH 4/7] memcg : fix charge function of THP allocation.


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: [PATCH 4/7] memcg : fix charge function of THP allocation.
Date: Fri, 21 Jan 2011 15:44:30 +0900	[thread overview]
Message-ID: <20110121154430.70d45f15.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20110121153431.191134dd.kamezawa.hiroyu@jp.fujitsu.com>

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

When THP is used, a hugepage-sized charge can happen. This is not handled
correctly in mem_cgroup_do_charge(). For example, THP can fall back to a
small-page allocation when a hugepage allocation looks difficult or the
system is busy, but the memory cgroup does not know this and keeps retrying
the hugepage charge. Worse, the memory cgroup believes 'memory reclaim
succeeded' whenever limit - usage > PAGE_SIZE, even though a hugepage-sized
charge may still not fit.

As a result, khugepaged and other callers can go into an infinite reclaim
loop if the tasks in the memcg are busy.
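To make the failure mode concrete, here is a small userspace model of the old retry logic as described above. It is not kernel code. Everything prefixed with `sketch_` is hypothetical, the 4 KiB and 2 MiB sizes are assumed x86-64 values, and reclaim is assumed to free nothing because the tasks in the group are busy. When the room left under the limit is larger than one base page but smaller than a hugepage, the "under limit" test keeps saying yes while the hugepage charge keeps failing:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes (x86-64): 4 KiB base page, 2 MiB transparent hugepage. */
#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_HPAGE_SIZE (512UL * SKETCH_PAGE_SIZE)

/* Hypothetical stand-in for a memcg's res_counter. */
struct sketch_counter {
	unsigned long usage;
	unsigned long limit;
};

/* Like res_counter_charge(): 0 on success, -1 (no change) over the limit. */
static int sketch_charge(struct sketch_counter *c, unsigned long size)
{
	if (c->usage + size > c->limit)
		return -1;
	c->usage += size;
	assert(c->usage <= c->limit);
	return 0;
}

/* The questionable "reclaim succeeded" test: room for one base page? */
static bool sketch_old_under_limit(const struct sketch_counter *c)
{
	return c->limit - c->usage > SKETCH_PAGE_SIZE;
}

/*
 * Old-style retry loop for a hugepage charge. Reclaim is modeled as
 * freeing nothing. The loop is capped at max_iters so the sketch
 * terminates; the return value is the number of failed attempts, so
 * a result of max_iters means "would keep spinning".
 */
static unsigned long sketch_old_hugepage_charge(struct sketch_counter *c,
						unsigned long max_iters)
{
	unsigned long failed = 0;

	while (failed < max_iters) {
		if (sketch_charge(c, SKETCH_HPAGE_SIZE) == 0)
			break;			/* charge fit */
		failed++;
		/* reclaim would run here and free nothing */
		if (!sketch_old_under_limit(c))
			break;			/* no room at all: give up */
	}
	return failed;
}
```

With `usage = limit - 64 KiB`, `sketch_old_hugepage_charge()` stops only because of the `max_iters` cap. The loop described above has no such cap.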

After this patch:
 - A hugepage charge fails if the first round of page reclaim fails.
 - THP allocation is distinguished from batched allocation.
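The new decision order in __mem_cgroup_do_charge() can be modeled in userspace roughly as follows. This is a hedged sketch, not the patch itself. OOM handling and the task-move wait are left out, and reclaim is modeled as freeing up to `reclaimable` bytes. All `sketch_` names, including the caller that falls back to a base page, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes (x86-64): 4 KiB base page, 2 MiB transparent hugepage. */
#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_HPAGE_SIZE (512UL * SKETCH_PAGE_SIZE)

/* Subset of the result codes used by the patch, renamed for the sketch. */
enum sketch_result {
	SKETCH_CHARGE_OK,
	SKETCH_CHARGE_RETRY,
	SKETCH_CHARGE_NOMEM,
	SKETCH_CHARGE_NEED_BREAK,
};

/* Hypothetical memcg model; "reclaimable" is how much reclaim can free. */
struct sketch_memcg {
	unsigned long usage;
	unsigned long limit;
	unsigned long reclaimable;
};

static bool sketch_fits(const struct sketch_memcg *m, unsigned long size)
{
	return m->usage + size <= m->limit;
}

static enum sketch_result sketch_do_charge(struct sketch_memcg *m,
					   unsigned long page_size,
					   bool do_reclaim)
{
	unsigned long freed;

	if (sketch_fits(m, page_size)) {
		m->usage += page_size;
		return SKETCH_CHARGE_OK;
	}
	if (!do_reclaim)
		return SKETCH_CHARGE_RETRY;	/* batch failed: retry exact size */

	/* One round of reclaim, freeing at most page_size bytes. */
	freed = m->reclaimable < page_size ? m->reclaimable : page_size;
	assert(freed <= m->usage);
	m->reclaimable -= freed;
	m->usage -= freed;

	if (freed || sketch_fits(m, page_size))
		return SKETCH_CHARGE_RETRY;
	if (page_size > SKETCH_PAGE_SIZE)
		return SKETCH_CHARGE_NEED_BREAK; /* caller falls back, no OOM */
	return SKETCH_CHARGE_NOMEM;		 /* OOM handling omitted */
}

/* Hypothetical THP-style caller: hugepage first, one base page on failure. */
static unsigned long sketch_charge_with_fallback(struct sketch_memcg *m,
						 int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		enum sketch_result r = sketch_do_charge(m, SKETCH_HPAGE_SIZE, true);

		if (r == SKETCH_CHARGE_OK)
			return SKETCH_HPAGE_SIZE;
		if (r != SKETCH_CHARGE_RETRY)
			break;			/* NEED_BREAK: stop reclaiming */
	}
	/* Single base-page attempt, kept simple for the sketch. */
	if (sketch_do_charge(m, SKETCH_PAGE_SIZE, true) == SKETCH_CHARGE_OK)
		return SKETCH_PAGE_SIZE;
	return 0;
}
```

The sketch mirrors one design choice from the patch comments. A hugepage charge that reclaim cannot satisfy is reported as an ordinary failure and is not escalated toward OOM, because the caller has a cheaper alternative in a base page.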

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   51 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

Index: mmotm-0107/mm/memcontrol.c
===================================================================
--- mmotm-0107.orig/mm/memcontrol.c
+++ mmotm-0107/mm/memcontrol.c
@@ -1812,24 +1812,25 @@ enum {
 	CHARGE_OK,		/* success */
 	CHARGE_RETRY,		/* need to retry but retry is not bad */
 	CHARGE_NOMEM,		/* we can't do more. return -ENOMEM */
+	CHARGE_NEED_BREAK,	/* big size allocation failure */
 	CHARGE_WOULDBLOCK,	/* GFP_WAIT wasn't set and no enough res. */
 	CHARGE_OOM_DIE,		/* the current is killed because of OOM */
 };
 
 static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
-				int csize, bool oom_check)
+			int page_size, bool do_reclaim, bool oom_check)
 {
 	struct mem_cgroup *mem_over_limit;
 	struct res_counter *fail_res;
 	unsigned long flags = 0;
 	int ret;
 
-	ret = res_counter_charge(&mem->res, csize, &fail_res);
+	ret = res_counter_charge(&mem->res, page_size, &fail_res);
 
 	if (likely(!ret)) {
 		if (!do_swap_account)
 			return CHARGE_OK;
-		ret = res_counter_charge(&mem->memsw, csize, &fail_res);
+		ret = res_counter_charge(&mem->memsw, page_size, &fail_res);
 		if (likely(!ret))
 			return CHARGE_OK;
 
@@ -1838,14 +1839,14 @@ static int __mem_cgroup_do_charge(struct
 	} else
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
 
-	if (csize > PAGE_SIZE) /* change csize and retry */
+	if (!do_reclaim)
 		return CHARGE_RETRY;
 
 	if (!(gfp_mask & __GFP_WAIT))
 		return CHARGE_WOULDBLOCK;
 
 	ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, NULL,
-					gfp_mask, flags, csize);
+					gfp_mask, flags, page_size);
 	/*
 	 * try_to_free_mem_cgroup_pages() might not give us a full
 	 * picture of reclaim. Some pages are reclaimed and might be
@@ -1853,19 +1854,28 @@ static int __mem_cgroup_do_charge(struct
 	 * Check the limit again to see if the reclaim reduced the
 	 * current usage of the cgroup before giving up
 	 */
-	if (ret || mem_cgroup_check_under_limit(mem_over_limit, csize))
+	if (ret || mem_cgroup_check_under_limit(mem_over_limit, page_size))
 		return CHARGE_RETRY;
 
 	/*
+	 * When page_size > PAGE_SIZE, THP calls this function and it's
+	 * OK to report 'not enough pages for a hugepage'. THP will
+	 * fall back to PAGE_SIZE allocation. Reclaiming eagerly here
+	 * would cause page splitting, which seems much worse.
+	 */
+	if (page_size > PAGE_SIZE)
+		return CHARGE_NEED_BREAK;
+
+	/*
 	 * At task move, charge accounts can be doubly counted. So, it's
 	 * better to wait until the end of task_move if something is going on.
 	 */
 	if (mem_cgroup_wait_acct_move(mem_over_limit))
 		return CHARGE_RETRY;
-
 	/* If we don't need to call oom-killer at el, return immediately */
 	if (!oom_check)
 		return CHARGE_NOMEM;
+
 	/* check OOM */
 	if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask))
 		return CHARGE_OOM_DIE;
@@ -1885,7 +1895,7 @@ static int __mem_cgroup_try_charge(struc
 	int nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct mem_cgroup *mem = NULL;
 	int ret;
-	int csize = max(CHARGE_SIZE, (unsigned long) page_size);
+	bool use_pcp_cache = (page_size == PAGE_SIZE);
 
 	/*
 	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
@@ -1910,7 +1920,7 @@ again:
 		VM_BUG_ON(css_is_removed(&mem->css));
 		if (mem_cgroup_is_root(mem))
 			goto done;
-		if (page_size == PAGE_SIZE && consume_stock(mem))
+		if (use_pcp_cache && consume_stock(mem))
 			goto done;
 		css_get(&mem->css);
 	} else {
@@ -1933,7 +1943,7 @@ again:
 			rcu_read_unlock();
 			goto done;
 		}
-		if (page_size == PAGE_SIZE && consume_stock(mem)) {
+		if (use_pcp_cache && consume_stock(mem)) {
 			/*
 			 * It seems dagerous to access memcg without css_get().
 			 * But considering how consume_stok works, it's not
@@ -1967,17 +1977,26 @@ again:
 			oom_check = true;
 			nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
 		}
-
-		ret = __mem_cgroup_do_charge(mem, gfp_mask, csize, oom_check);
+		if (use_pcp_cache)
+			ret = __mem_cgroup_do_charge(mem, gfp_mask,
+					CHARGE_SIZE, false, oom_check);
+		else
+			ret = __mem_cgroup_do_charge(mem, gfp_mask,
+					page_size, true, oom_check);
 
 		switch (ret) {
 		case CHARGE_OK:
 			break;
 		case CHARGE_RETRY: /* not in OOM situation but retry */
-			csize = page_size;
+			if (use_pcp_cache) /* need to reclaim pages */
+				use_pcp_cache = false;
 			css_put(&mem->css);
 			mem = NULL;
 			goto again;
+		case CHARGE_NEED_BREAK: /* page_size > PAGE_SIZE */
+			css_put(&mem->css);
+			/* returning failure doesn't mean OOM for hugepages */
+			goto nomem;
 		case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
 			css_put(&mem->css);
 			goto nomem;
@@ -1994,9 +2013,9 @@ again:
 			goto bypass;
 		}
 	} while (ret != CHARGE_OK);
-
-	if (csize > page_size)
-		refill_stock(mem, csize - page_size);
+	/* This flag is cleared when the CHARGE_SIZE charge fails. */
+	if (use_pcp_cache)
+		refill_stock(mem, CHARGE_SIZE - page_size);
 	css_put(&mem->css);
 done:
 	*memcg = mem;

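The batching side of the change (`use_pcp_cache`, consume_stock(), refill_stock()) can be sketched the same way. This hedged model makes three simplifications: it keeps a single stock for a single group instead of a per-cpu stock, it assumes a batch of 32 base pages for CHARGE_SIZE, and it uses hypothetical `sketch_` names throughout. It shows three behaviors. Only base-page charges go through the batch. A failed batch charge drops to an exact-size charge. Hugepage charges never touch the stock.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes: 4 KiB base page, 2 MiB hugepage, 32-page batch. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_HPAGE_SIZE  (512UL * SKETCH_PAGE_SIZE)
#define SKETCH_CHARGE_SIZE (32UL * SKETCH_PAGE_SIZE)

/* Hypothetical group with one pre-charged stock (not per-cpu). */
struct sketch_group {
	unsigned long usage;	/* bytes charged to the counter */
	unsigned long limit;
	unsigned long stock;	/* pre-charged bytes not yet handed out */
};

static bool sketch_counter_charge(struct sketch_group *g, unsigned long size)
{
	if (g->usage + size > g->limit)
		return false;
	g->usage += size;
	return true;
}

/* Returns true if a charge of page_size bytes was satisfied. */
static bool sketch_try_charge(struct sketch_group *g, unsigned long page_size)
{
	bool use_pcp_cache = (page_size == SKETCH_PAGE_SIZE);

	assert(page_size % SKETCH_PAGE_SIZE == 0);

	/* Fast path (like consume_stock()): take a base page from stock. */
	if (use_pcp_cache && g->stock >= SKETCH_PAGE_SIZE) {
		g->stock -= SKETCH_PAGE_SIZE;
		return true;
	}
	/* Base pages first try to charge a whole batch. */
	if (use_pcp_cache && sketch_counter_charge(g, SKETCH_CHARGE_SIZE)) {
		g->stock += SKETCH_CHARGE_SIZE - page_size; /* like refill_stock() */
		return true;
	}
	/* Batch failed (use_pcp_cache would be cleared) or hugepage: exact size. */
	return sketch_counter_charge(g, page_size);
}
```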