Re: [PATCH 3/4] Memory controller soft limit organize cgroups (v5)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	lizf@cn.fujitsu.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 3/4] Memory controller soft limit organize cgroups (v5)
Date: Fri, 13 Mar 2009 15:59:49 +0900 (JST)	[thread overview]
Message-ID: <cfa8f3be8873274b74d5c5bd0dbcb532.squirrel@webmail-b.css.fujitsu.com> (raw)
In-Reply-To: <20090312175625.17890.94795.sendpatchset@localhost.localdomain>

Balbir Singh wrote:
> Feature: Organize cgroups over soft limit in a RB-Tree
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Changelog v5...v4
> 1. res_counter_uncharge has an additional parameter to indicate if the
>    counter was over its soft limit, before uncharge.
>
> Changelog v4...v3
> 1. Optimizations to ensure we don't uncessarily get res_counter values
> 2. Fixed a bug in usage of time_after()
>
> Changelog v3...v2
> 1. Add only the ancestor to the RB-Tree
> 2. Use css_tryget/css_put instead of mem_cgroup_get/mem_cgroup_put
>
> Changelog v2...v1
> 1. Add support for hierarchies
> 2. The res_counter that is highest in the hierarchy is returned on soft
>    limit being exceeded. Since we do hierarchical reclaim and add all
>    groups exceeding their soft limits, this approach seems to work well
>    in practice.
>
> This patch introduces a RB-Tree for storing memory cgroups that are over
> their
> soft limit. The overall goal is to
>
> 1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
>    We are careful about updates, updates take place only after a
> particular
>    time interval has passed
> 2. We remove the node from the RB-Tree when the usage goes below the soft
>    limit
>
> The next set of patches will exploit the RB-Tree to get the group that is
> over its soft limit by the largest amount and reclaim from it, when we
> face memory contention.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/res_counter.h |    6 +-
>  kernel/res_counter.c        |   18 +++++
>  mm/memcontrol.c             |  141
> ++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 143 insertions(+), 22 deletions(-)
>
>
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 5c821fd..5bbf8b1 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -112,7 +112,8 @@ void res_counter_init(struct res_counter *counter,
> struct res_counter *parent);
>  int __must_check res_counter_charge_locked(struct res_counter *counter,
>  		unsigned long val);
>  int __must_check res_counter_charge(struct res_counter *counter,
> -		unsigned long val, struct res_counter **limit_fail_at);
> +		unsigned long val, struct res_counter **limit_fail_at,
> +		struct res_counter **soft_limit_at);
>
>  /*
>   * uncharge - tell that some portion of the resource is released
> @@ -125,7 +126,8 @@ int __must_check res_counter_charge(struct res_counter
> *counter,
>   */
>
>  void res_counter_uncharge_locked(struct res_counter *counter, unsigned
> long val);
> -void res_counter_uncharge(struct res_counter *counter, unsigned long
> val);
> +void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> +				bool *was_soft_limit_excess);
>
>  static inline bool res_counter_limit_check_locked(struct res_counter
> *cnt)
>  {
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index 4e6dafe..51ec438 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -37,17 +37,27 @@ int res_counter_charge_locked(struct res_counter
> *counter, unsigned long val)
>  }
>
>  int res_counter_charge(struct res_counter *counter, unsigned long val,
> -			struct res_counter **limit_fail_at)
> +			struct res_counter **limit_fail_at,
> +			struct res_counter **soft_limit_fail_at)
>  {
>  	int ret;
>  	unsigned long flags;
>  	struct res_counter *c, *u;
>
>  	*limit_fail_at = NULL;
> +	if (soft_limit_fail_at)
> +		*soft_limit_fail_at = NULL;
>  	local_irq_save(flags);
>  	for (c = counter; c != NULL; c = c->parent) {
>  		spin_lock(&c->lock);
>  		ret = res_counter_charge_locked(c, val);
> +		/*
> +		 * With soft limits, we return the highest ancestor
> +		 * that exceeds its soft limit
> +		 */
> +		if (soft_limit_fail_at &&
> +			!res_counter_soft_limit_check_locked(c))
> +			*soft_limit_fail_at = c;
>  		spin_unlock(&c->lock);
>  		if (ret < 0) {
>  			*limit_fail_at = c;

Do we need these all check at every call ?
If you just check tree update once per HZ/?? , please check
this onece per HZ/??. And please help people who doesn't use softlimit
i.e. who doesn't mount cgroup but have to use configured kernel.



> +static struct rb_root mem_cgroup_soft_limit_tree;
> +static DEFINE_SPINLOCK(memcg_soft_limit_tree_lock);
> +
Can't we breakd down this lock to per node or per cpu or...?

> +/*
>   * The memory controller data structure. The memory controller controls
> both
>   * page cache and RSS per cgroup. We would eventually like to provide
>   * statistics based on the statistics developed by Rik Van Riel for
> clock-pro,
> @@ -176,12 +185,20 @@ struct mem_cgroup {
>
>  	unsigned int	swappiness;
>
> +	struct rb_node mem_cgroup_node;		/* RB tree node */
> +	unsigned long long usage_in_excess;	/* Set to the value by which */
> +						/* the soft limit is exceeded*/
> +	unsigned long last_tree_update;		/* Last time the tree was */
> +						/* updated in jiffies     */
> +
>  	/*
>  	 * statistics. This must be placed at the end of memcg.
>  	 */
>  	struct mem_cgroup_stat stat;
>  };
>
> +#define	MEM_CGROUP_TREE_UPDATE_INTERVAL		(HZ/4)
> +

Why HZ/4 again.

>  enum charge_type {
>  	MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
>  	MEM_CGROUP_CHARGE_TYPE_MAPPED,
> @@ -214,6 +231,41 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>
> +static void mem_cgroup_insert_exceeded(struct mem_cgroup *mem)
> +{
> +	struct rb_node **p = &mem_cgroup_soft_limit_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct mem_cgroup *mem_node;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&memcg_soft_limit_tree_lock, flags);
> +	while (*p) {
> +		parent = *p;
> +		mem_node = rb_entry(parent, struct mem_cgroup, mem_cgroup_node);
> +		if (mem->usage_in_excess < mem_node->usage_in_excess)
> +			p = &(*p)->rb_left;
> +		/*
> +		 * We can't avoid mem cgroups that are over their soft
> +		 * limit by the same amount
> +		 */
> +		else if (mem->usage_in_excess >= mem_node->usage_in_excess)
> +			p = &(*p)->rb_right;
> +	}
> +	rb_link_node(&mem->mem_cgroup_node, parent, p);
> +	rb_insert_color(&mem->mem_cgroup_node,
> +			&mem_cgroup_soft_limit_tree);
> +	mem->last_tree_update = jiffies;
> +	spin_unlock_irqrestore(&memcg_soft_limit_tree_lock, flags);
> +}
> +
> +static void mem_cgroup_remove_exceeded(struct mem_cgroup *mem)
> +{
> +	unsigned long flags;
> +	spin_lock_irqsave(&memcg_soft_limit_tree_lock, flags);
> +	rb_erase(&mem->mem_cgroup_node, &mem_cgroup_soft_limit_tree);
> +	spin_unlock_irqrestore(&memcg_soft_limit_tree_lock, flags);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -897,6 +949,40 @@ static void record_last_oom(struct mem_cgroup *mem)
>  	mem_cgroup_walk_tree(mem, NULL, record_last_oom_cb);
>  }
>
> +static void mem_cgroup_check_and_update_tree(struct mem_cgroup *mem,
> +						bool time_check)
> +{
> +	unsigned long long prev_usage_in_excess, new_usage_in_excess;
> +	bool updated_tree = false;
> +	unsigned long next_update = 0;
> +	unsigned long flags;
> +
> +	prev_usage_in_excess = mem->usage_in_excess;
> +
> +	if (time_check)
> +		next_update = mem->last_tree_update +
> +				MEM_CGROUP_TREE_UPDATE_INTERVAL;
> +
> +	if (!time_check || time_after(jiffies, next_update)) {
> +		new_usage_in_excess = res_counter_soft_limit_excess(&mem->res);
> +		if (prev_usage_in_excess) {
> +			mem_cgroup_remove_exceeded(mem);
> +			updated_tree = true;
> +		}
> +		if (!new_usage_in_excess)
> +			goto done;
> +		mem_cgroup_insert_exceeded(mem);
> +		updated_tree = true;
> +	}
> +
> +done:
> +	if (updated_tree) {
> +		spin_lock_irqsave(&memcg_soft_limit_tree_lock, flags);
> +		mem->last_tree_update = jiffies;
> +		mem->usage_in_excess = new_usage_in_excess;
> +		spin_unlock_irqrestore(&memcg_soft_limit_tree_lock, flags);
> +	}
> +}
Why update key parameter after inserting tree ? Is this bug ?
Maybe RB-tree will be not sorted.

Bye,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-03-13  6:59 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-12 17:56 [PATCH 0/4] Memory controller soft limit patches (v5) Balbir Singh
2009-03-12 17:56 ` [PATCH 1/4] Memory controller soft limit documentation (v5) Balbir Singh
2009-03-12 17:56 ` [PATCH 2/4] Memory controller soft limit interface (v5) Balbir Singh
2009-03-12 22:59   ` Andrew Morton
2009-03-13  4:58     ` Balbir Singh
2009-03-12 17:56 ` [PATCH 3/4] Memory controller soft limit organize cgroups (v5) Balbir Singh
2009-03-12 23:04   ` Andrew Morton
2009-03-13  5:03     ` Balbir Singh
2009-03-13  0:47   ` KOSAKI Motohiro
2009-03-13  5:04     ` Balbir Singh
2009-03-13  5:22       ` KOSAKI Motohiro
2009-03-13  8:20         ` Balbir Singh
2009-03-13  6:59   ` KAMEZAWA Hiroyuki [this message]
2009-03-13  7:09     ` Balbir Singh
2009-03-12 17:56 ` [PATCH 4/4] Memory controller soft limit reclaim on contention (v5) Balbir Singh
2009-03-12 23:34   ` Andrew Morton
2009-03-13  7:53     ` Balbir Singh
2009-03-13  1:36   ` KOSAKI Motohiro
2009-03-13  4:13     ` Balbir Singh
2009-03-13  4:31       ` KOSAKI Motohiro
2009-03-13  4:50         ` KOSAKI Motohiro
2009-03-13  5:07           ` Balbir Singh
2009-03-13  6:54             ` KOSAKI Motohiro
2009-03-13  7:03               ` Balbir Singh
2009-03-13  7:17                 ` KOSAKI Motohiro
2009-03-13  7:26                   ` Balbir Singh
2009-03-13  8:37                     ` KAMEZAWA Hiroyuki
2009-03-13  5:26           ` Balbir Singh
2009-03-13  5:34             ` KOSAKI Motohiro
2009-03-13  4:58         ` Balbir Singh
2009-03-13  6:51   ` KAMEZAWA Hiroyuki
2009-03-13  7:15     ` Balbir Singh
2009-03-13  8:41       ` KAMEZAWA Hiroyuki
2009-03-13  7:02 ` [PATCH 0/4] Memory controller soft limit patches (v5) KAMEZAWA Hiroyuki
2009-03-13  7:07   ` KAMEZAWA Hiroyuki
2009-03-13  7:15     ` Andrew Morton
2009-03-13  7:29       ` Balbir Singh
2009-03-13  7:18     ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cfa8f3be8873274b74d5c5bd0dbcb532.squirrel@webmail-b.css.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=riel@redhat.com \
    --cc=yamamoto@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox