Re: [RFC][mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v3)

From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelianov <xemul@openvz.org>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC][mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v3)
Date: Wed, 12 Nov 2008 11:19:37 +0530
Message-ID: <491A6E71.5010307@linux.vnet.ibm.com>
In-Reply-To: <20081112140236.46448b47.kamezawa.hiroyu@jp.fujitsu.com>

KAMEZAWA Hiroyuki wrote:
> On Tue, 11 Nov 2008 18:04:17 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> This patch introduces hierarchical reclaim. When an ancestor goes over its
>> limit, the charging routine points to the parent that is above its limit.
>> The reclaim process then starts from the last scanned child of the ancestor
>> and reclaims until the ancestor goes below its limit.
>>
> 
>> +/*
>> + * Dance down the hierarchy if needed to reclaim memory. We remember the
>> + * last child we reclaimed from, so that we don't end up penalizing
>> + * one child extensively based on its position in the children list.
>> + *
>> + * root_mem is the original ancestor that we've been reclaiming from.
>> + */
>> +static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *mem,
>> +						struct mem_cgroup *root_mem,
>> +						gfp_t gfp_mask)
>> +{
>> +	struct cgroup *cg_current, *cgroup;
>> +	struct mem_cgroup *mem_child;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * Reclaim unconditionally and don't check for return value.
>> +	 * We need to reclaim in the current group and down the tree.
>> +	 * One might think about checking for children before reclaiming,
>> +	 * but there might be left over accounting, even after children
>> +	 * have left.
>> +	 */
>> +	try_to_free_mem_cgroup_pages(mem, gfp_mask);
>> +
>> +	if (res_counter_check_under_limit(&root_mem->res))
>> +		return 0;
>> +
>> +	cgroup_lock();
>> +
>> +	if (list_empty(&mem->css.cgroup->children)) {
>> +		cgroup_unlock();
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * Scan all children under the mem_cgroup mem
>> +	 */
>> +	if (!mem->last_scanned_child)
>> +		cgroup = list_first_entry(&mem->css.cgroup->children,
>> +				struct cgroup, sibling);
>> +	else
>> +		cgroup = mem->last_scanned_child->css.cgroup;
>> +
>> +	cg_current = cgroup;
>> +
>> +	do {
>> +		struct list_head *next;
>> +
>> +		mem_child = mem_cgroup_from_cont(cgroup);
>> +		cgroup_unlock();
>> +
>> +		ret = mem_cgroup_hierarchical_reclaim(mem_child, root_mem,
>> +							gfp_mask);
>> +		cgroup_lock();
>> +		mem->last_scanned_child = mem_child;
>> +		if (res_counter_check_under_limit(&root_mem->res)) {
>> +			ret = 0;
>> +			goto done;
>> +		}
>> +
>> +		/*
>> +		 * Since we gave up the lock, it is time to
>> +		 * start from last cgroup
>> +		 */
>> +		cgroup = mem->last_scanned_child->css.cgroup;
>> +		next = cgroup->sibling.next;
>> +
>> +		if (next == &cg_current->parent->children)
>> +			cgroup = list_first_entry(&mem->css.cgroup->children,
>> +							struct cgroup, sibling);
>> +		else
>> +			cgroup = container_of(next, struct cgroup, sibling);
>> +	} while (cgroup != cg_current);
>> +
>> +done:
>> +	cgroup_unlock();
>> +	return ret;
>> +}
> 
> Hmm, does this function need to be this complex?
> I'm sorry I don't have enough time to review now. (chasing memory online/offline bug.)
> 
> But I'm not convinced this is a good way to reclaim in a hierarchical manner.
> 
> In the following tree, assume that processes hit the limit of Level_2.
> 
>    Level_1 (no limit)
> 	-> Level_2	(limit=1G)
> 		-> Level_3_A (usage=30M)
> 		-> Level_3_B (usage=100M)
> 			-> Level_4_A (usage=50M)
> 			-> Level_4_B (usage=400M)
> 			-> Level_4_C (usage=420M)
> 
> Even if we know Level_4_C includes tons of inactive file cache,
> some amount of swap-out will occur before reaching Level_4_C.
> 
> Can't we do this hierarchical reclaim in another way?
> (Start from Level_4_C because we know it has tons of inactive cache.)
> 
> This style of recursive call leaves no chance for this kind of optimization.
> Can we do this reclaim in a flatter manner, as a loop like the following?
> ==
> try:
>   select the most inactive one
> 	-> try_to_free_memory
> 		-> check limit
> 			-> go to try;
> ==
> 

I've been thinking along those lines as well, and that will become more
important as we try to implement soft limits. However, for the current version
I wanted correctness. Fairness, as far as I have seen, is achieved: groups with
a large number of inactive pages do get reclaimed from more than others (in my
simple experiments).
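
For readers following along, the part of the quoted patch that keeps one child
from being penalized for its position in the children list is the saved
last_scanned_child cursor. The block below is only a rough user-space model of
that idea, not the patch itself: struct parent, rr_reclaim() and the fixed
child array are hypothetical names chosen for illustration, and the cursor here
advances past the child just visited, which is a simplification of what the
patch does.

```c
#include <stddef.h>

#define NCHILD 3

/* Hypothetical model of a parent group; this is not kernel code. */
struct parent {
	unsigned long usage[NCHILD];	/* pages charged to each child */
	unsigned long limit;		/* limit on the sum of all children */
	size_t cursor;			/* child at which the next scan starts */
	unsigned long scans[NCHILD];	/* how often each child was visited */
};

static unsigned long parent_usage(const struct parent *p)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < NCHILD; i++)
		sum += p->usage[i];
	return sum;
}

/*
 * Visit children round-robin, starting at the saved cursor, and take up
 * to 'batch' pages from each one. Stops as soon as the parent is back
 * within its limit or after one full pass over the children.
 * Returns 0 if the parent ends up within its limit, -1 otherwise.
 */
static int rr_reclaim(struct parent *p, unsigned long batch)
{
	size_t visited;

	for (visited = 0; visited < NCHILD; visited++) {
		unsigned long n;
		size_t i;

		if (parent_usage(p) <= p->limit)
			return 0;

		i = p->cursor;
		n = p->usage[i] < batch ? p->usage[i] : batch;
		p->usage[i] -= n;
		p->scans[i]++;
		/* Remember where we stopped so the next call resumes here. */
		p->cursor = (i + 1) % NCHILD;
	}
	return parent_usage(p) <= p->limit ? 0 : -1;
}
```

Because the cursor survives across calls, two back-to-back reclaim passes start
at different children instead of both hitting the first entry in the list.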

As far as the pseudocode is concerned, selecting the most inactive group is an
O(c) operation, where c is the number of nodes in the subtree, and that is
expensive. The data structure and selection algorithm get expensive as well. I
am thinking about a more suitable approach for implementation, but I want to
focus on correctness as the first step. Since the hierarchy is not enabled by
default, I am not adding any overhead in the default case, so I think that this
approach is suitable.
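
To make the cost argument concrete, here is a minimal user-space sketch of the
flat loop from the quoted pseudocode. It is an illustration under stated
assumptions, not a proposed implementation: struct group, the per-group
"inactive" count, select_most_inactive() and flat_reclaim() are all
hypothetical, and real reclaim would call into the kernel's page reclaim
rather than subtract numbers.

```c
#include <stddef.h>

/* Hypothetical user-space model of a memory cgroup; not kernel code. */
struct group {
	const char *name;
	unsigned long usage;	/* pages charged to this group */
	unsigned long inactive;	/* pages assumed cheap to reclaim */
};

/* O(c) scan: return the group with the most inactive pages, or NULL. */
static struct group *select_most_inactive(struct group *g, size_t c)
{
	struct group *best = NULL;
	size_t i;

	for (i = 0; i < c; i++)
		if (g[i].inactive && (!best || g[i].inactive > best->inactive))
			best = &g[i];
	return best;
}

static unsigned long total_usage(const struct group *g, size_t c)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < c; i++)
		sum += g[i].usage;
	return sum;
}

/*
 * Flat loop: select the most inactive group, reclaim up to 'batch' pages
 * from it, check the limit, and go back to the selection step.
 * Returns 0 once total usage is within 'limit', or -1 if no group has
 * inactive pages left while usage is still over the limit.
 */
static int flat_reclaim(struct group *g, size_t c,
			unsigned long limit, unsigned long batch)
{
	while (total_usage(g, c) > limit) {
		struct group *victim = select_most_inactive(g, c);
		unsigned long n;

		if (!victim)
			return -1;
		n = victim->inactive < batch ? victim->inactive : batch;
		victim->inactive -= n;
		victim->usage -= n;
	}
	return 0;
}
```

Every iteration of the loop pays the full O(c) scan. A structure ordered by
inactive count would make selection cheaper, but it would presumably have to be
kept up to date as pages move between lists, which is the extra cost referred
to above.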

-- 
	Balbir

