* [RFC] Reduce the resource counter lock overhead
@ 2009-06-24 17:05 Balbir Singh
2009-06-24 19:40 ` Paul Menage
2009-06-24 23:10 ` Andrew Morton
0 siblings, 2 replies; 14+ messages in thread
From: Balbir Singh @ 2009-06-24 17:05 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki, nishimura; +Cc: Andrew Morton, menage, xemul, linux-mm, lizf
Hi, All,
I've been experimenting with reduction of resource counter locking
overhead. My benchmarks show a marginal improvement; /proc/lock_stat,
however, shows that the lock contention time and hold time are reduced
by quite an amount after this patch.
Before the patch, I see
lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
class name con-bounces contentions
waittime-min waittime-max waittime-total acq-bounces
acquisitions holdtime-min holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&counter->lock: 1534627 1575341
0.57 18.39 675713.23 43330446 138524248
0.43 148.13 54133607.05
--------------
&counter->lock 809559
[<ffffffff810810c5>] res_counter_charge+0x3f/0xed
&counter->lock 765782
[<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
--------------
&counter->lock 653284
[<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
&counter->lock 922057
[<ffffffff810810c5>] res_counter_charge+0x3f/0xed
After the patch I see
lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
class name con-bounces contentions
waittime-min waittime-max waittime-total acq-bounces
acquisitions holdtime-min holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&(&counter->lock)->lock: 962193 976349
0.60 14.07 465926.04 21364165 66041988
0.45 88.31 25395513.12
-----------------------
&(&counter->lock)->lock 495468
[<ffffffff8108106e>] res_counter_uncharge+0x2c/0x77
&(&counter->lock)->lock 480881
[<ffffffff810810f7>] res_counter_charge+0x3e/0xfb
-----------------------
&(&counter->lock)->lock 564419
[<ffffffff810810f7>] res_counter_charge+0x3e/0xfb
&(&counter->lock)->lock 411930
[<ffffffff8108106e>] res_counter_uncharge+0x2c/0x77
Please review and comment on the usefulness of this approach. I do have
another approach in mind for reducing res_counter lock overhead, but
this one seems the most straightforward.
Feature: Change locking of res_counter
From: Balbir Singh <balbir@linux.vnet.ibm.com>
Resource Counters today use spin_lock_irq* variants for locking.
This patch converts the lock to a seqlock_t
---
include/linux/res_counter.h | 24 +++++++++++++-----------
kernel/res_counter.c | 18 +++++++++---------
2 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 511f42f..4c61757 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -14,6 +14,7 @@
*/
#include <linux/cgroup.h>
+#include <linux/seqlock.h>
/*
* The core object. the cgroup that wishes to account for some
@@ -42,7 +43,7 @@ struct res_counter {
* the lock to protect all of the above.
* the routines below consider this to be IRQ-safe
*/
- spinlock_t lock;
+ seqlock_t lock;
/*
* Parent counter, used for hierarchial resource accounting
*/
@@ -139,11 +140,12 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
static inline bool res_counter_check_under_limit(struct res_counter *cnt)
{
bool ret;
- unsigned long flags;
+ unsigned long flags, seq;
- spin_lock_irqsave(&cnt->lock, flags);
- ret = res_counter_limit_check_locked(cnt);
- spin_unlock_irqrestore(&cnt->lock, flags);
+ do {
+ seq = read_seqbegin_irqsave(&cnt->lock, flags);
+ ret = res_counter_limit_check_locked(cnt);
+ } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
return ret;
}
@@ -151,18 +153,18 @@ static inline void res_counter_reset_max(struct res_counter *cnt)
{
unsigned long flags;
- spin_lock_irqsave(&cnt->lock, flags);
+ write_seqlock_irqsave(&cnt->lock, flags);
cnt->max_usage = cnt->usage;
- spin_unlock_irqrestore(&cnt->lock, flags);
+ write_sequnlock_irqrestore(&cnt->lock, flags);
}
static inline void res_counter_reset_failcnt(struct res_counter *cnt)
{
unsigned long flags;
- spin_lock_irqsave(&cnt->lock, flags);
+ write_seqlock_irqsave(&cnt->lock, flags);
cnt->failcnt = 0;
- spin_unlock_irqrestore(&cnt->lock, flags);
+ write_sequnlock_irqrestore(&cnt->lock, flags);
}
static inline int res_counter_set_limit(struct res_counter *cnt,
@@ -171,12 +173,12 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
unsigned long flags;
int ret = -EBUSY;
- spin_lock_irqsave(&cnt->lock, flags);
+ write_seqlock_irqsave(&cnt->lock, flags);
if (cnt->usage <= limit) {
cnt->limit = limit;
ret = 0;
}
- spin_unlock_irqrestore(&cnt->lock, flags);
+ write_sequnlock_irqrestore(&cnt->lock, flags);
return ret;
}
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index e1338f0..9830c00 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -17,7 +17,7 @@
void res_counter_init(struct res_counter *counter, struct res_counter *parent)
{
- spin_lock_init(&counter->lock);
+ seqlock_init(&counter->lock);
counter->limit = RESOURCE_MAX;
counter->parent = parent;
}
@@ -45,9 +45,9 @@ int res_counter_charge(struct res_counter *counter, unsigned long val,
*limit_fail_at = NULL;
local_irq_save(flags);
for (c = counter; c != NULL; c = c->parent) {
- spin_lock(&c->lock);
+ write_seqlock(&c->lock);
ret = res_counter_charge_locked(c, val);
- spin_unlock(&c->lock);
+ write_sequnlock(&c->lock);
if (ret < 0) {
*limit_fail_at = c;
goto undo;
@@ -57,9 +57,9 @@ int res_counter_charge(struct res_counter *counter, unsigned long val,
goto done;
undo:
for (u = counter; u != c; u = u->parent) {
- spin_lock(&u->lock);
+ write_seqlock(&u->lock);
res_counter_uncharge_locked(u, val);
- spin_unlock(&u->lock);
+ write_sequnlock(&u->lock);
}
done:
local_irq_restore(flags);
@@ -81,9 +81,9 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
local_irq_save(flags);
for (c = counter; c != NULL; c = c->parent) {
- spin_lock(&c->lock);
+ write_seqlock(&c->lock);
res_counter_uncharge_locked(c, val);
- spin_unlock(&c->lock);
+ write_sequnlock(&c->lock);
}
local_irq_restore(flags);
}
@@ -167,9 +167,9 @@ int res_counter_write(struct res_counter *counter, int member,
if (*end != '\0')
return -EINVAL;
}
- spin_lock_irqsave(&counter->lock, flags);
+ write_seqlock_irqsave(&counter->lock, flags);
val = res_counter_member(counter, member);
*val = tmp;
- spin_unlock_irqrestore(&counter->lock, flags);
+ write_sequnlock_irqrestore(&counter->lock, flags);
return 0;
}
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-24 17:05 [RFC] Reduce the resource counter lock overhead Balbir Singh
@ 2009-06-24 19:40 ` Paul Menage
2009-06-24 23:10 ` Andrew Morton
1 sibling, 0 replies; 14+ messages in thread
From: Paul Menage @ 2009-06-24 19:40 UTC (permalink / raw)
To: balbir; +Cc: KAMEZAWA Hiroyuki, nishimura, Andrew Morton, xemul, linux-mm, lizf
Looks like a sensible change.
Paul
On Wed, Jun 24, 2009 at 10:05 AM, Balbir Singh<balbir@linux.vnet.ibm.com> wrote:
> Hi, All,
>
> I've been experimenting with reduction of resource counter locking
> overhead. My benchmarks show a marginal improvement, /proc/lock_stat
> however shows that the lock contention time and held time reduce
> by quite an amount after this patch.
>
> Before the patch, I see
>
> lock_stat version 0.3
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> class name con-bounces contentions
> waittime-min waittime-max waittime-total acq-bounces
> acquisitions holdtime-min holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> &counter->lock: 1534627 1575341
> 0.57 18.39 675713.23 43330446 138524248
> 0.43 148.13 54133607.05
> --------------
> &counter->lock 809559
> [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> &counter->lock 765782
> [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> --------------
> &counter->lock 653284
> [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> &counter->lock 922057
> [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
>
>
> After the patch I see
>
> lock_stat version 0.3
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> class name con-bounces contentions
> waittime-min waittime-max waittime-total acq-bounces
> acquisitions holdtime-min holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> &(&counter->lock)->lock: 962193 976349
> 0.60 14.07 465926.04 21364165 66041988
> 0.45 88.31 25395513.12
> -----------------------
> &(&counter->lock)->lock 495468
> [<ffffffff8108106e>] res_counter_uncharge+0x2c/0x77
> &(&counter->lock)->lock 480881
> [<ffffffff810810f7>] res_counter_charge+0x3e/0xfb
> -----------------------
> &(&counter->lock)->lock 564419
> [<ffffffff810810f7>] res_counter_charge+0x3e/0xfb
> &(&counter->lock)->lock 411930
> [<ffffffff8108106e>] res_counter_uncharge+0x2c/0x77
>
> Please review, comment on the usefulness of this approach. I do have
> another approach in mind for reducing res_counter lock overhead, but
> this one seems the most straight forward
>
>
> Feature: Change locking of res_counter
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Resource Counters today use spin_lock_irq* variants for locking.
> This patch converts the lock to a seqlock_t
> ---
>
> include/linux/res_counter.h | 24 +++++++++++++-----------
> kernel/res_counter.c | 18 +++++++++---------
> 2 files changed, 22 insertions(+), 20 deletions(-)
>
>
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 511f42f..4c61757 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -14,6 +14,7 @@
> */
>
> #include <linux/cgroup.h>
> +#include <linux/seqlock.h>
>
> /*
> * The core object. the cgroup that wishes to account for some
> @@ -42,7 +43,7 @@ struct res_counter {
> * the lock to protect all of the above.
> * the routines below consider this to be IRQ-safe
> */
> - spinlock_t lock;
> + seqlock_t lock;
> /*
> * Parent counter, used for hierarchial resource accounting
> */
> @@ -139,11 +140,12 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> {
> bool ret;
> - unsigned long flags;
> + unsigned long flags, seq;
>
> - spin_lock_irqsave(&cnt->lock, flags);
> - ret = res_counter_limit_check_locked(cnt);
> - spin_unlock_irqrestore(&cnt->lock, flags);
> + do {
> + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> + ret = res_counter_limit_check_locked(cnt);
> + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> return ret;
> }
>
> @@ -151,18 +153,18 @@ static inline void res_counter_reset_max(struct res_counter *cnt)
> {
> unsigned long flags;
>
> - spin_lock_irqsave(&cnt->lock, flags);
> + write_seqlock_irqsave(&cnt->lock, flags);
> cnt->max_usage = cnt->usage;
> - spin_unlock_irqrestore(&cnt->lock, flags);
> + write_sequnlock_irqrestore(&cnt->lock, flags);
> }
>
> static inline void res_counter_reset_failcnt(struct res_counter *cnt)
> {
> unsigned long flags;
>
> - spin_lock_irqsave(&cnt->lock, flags);
> + write_seqlock_irqsave(&cnt->lock, flags);
> cnt->failcnt = 0;
> - spin_unlock_irqrestore(&cnt->lock, flags);
> + write_sequnlock_irqrestore(&cnt->lock, flags);
> }
>
> static inline int res_counter_set_limit(struct res_counter *cnt,
> @@ -171,12 +173,12 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
> unsigned long flags;
> int ret = -EBUSY;
>
> - spin_lock_irqsave(&cnt->lock, flags);
> + write_seqlock_irqsave(&cnt->lock, flags);
> if (cnt->usage <= limit) {
> cnt->limit = limit;
> ret = 0;
> }
> - spin_unlock_irqrestore(&cnt->lock, flags);
> + write_sequnlock_irqrestore(&cnt->lock, flags);
> return ret;
> }
>
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index e1338f0..9830c00 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -17,7 +17,7 @@
>
> void res_counter_init(struct res_counter *counter, struct res_counter *parent)
> {
> - spin_lock_init(&counter->lock);
> + seqlock_init(&counter->lock);
> counter->limit = RESOURCE_MAX;
> counter->parent = parent;
> }
> @@ -45,9 +45,9 @@ int res_counter_charge(struct res_counter *counter, unsigned long val,
> *limit_fail_at = NULL;
> local_irq_save(flags);
> for (c = counter; c != NULL; c = c->parent) {
> - spin_lock(&c->lock);
> + write_seqlock(&c->lock);
> ret = res_counter_charge_locked(c, val);
> - spin_unlock(&c->lock);
> + write_sequnlock(&c->lock);
> if (ret < 0) {
> *limit_fail_at = c;
> goto undo;
> @@ -57,9 +57,9 @@ int res_counter_charge(struct res_counter *counter, unsigned long val,
> goto done;
> undo:
> for (u = counter; u != c; u = u->parent) {
> - spin_lock(&u->lock);
> + write_seqlock(&u->lock);
> res_counter_uncharge_locked(u, val);
> - spin_unlock(&u->lock);
> + write_sequnlock(&u->lock);
> }
> done:
> local_irq_restore(flags);
> @@ -81,9 +81,9 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
>
> local_irq_save(flags);
> for (c = counter; c != NULL; c = c->parent) {
> - spin_lock(&c->lock);
> + write_seqlock(&c->lock);
> res_counter_uncharge_locked(c, val);
> - spin_unlock(&c->lock);
> + write_sequnlock(&c->lock);
> }
> local_irq_restore(flags);
> }
> @@ -167,9 +167,9 @@ int res_counter_write(struct res_counter *counter, int member,
> if (*end != '\0')
> return -EINVAL;
> }
> - spin_lock_irqsave(&counter->lock, flags);
> + write_seqlock_irqsave(&counter->lock, flags);
> val = res_counter_member(counter, member);
> *val = tmp;
> - spin_unlock_irqrestore(&counter->lock, flags);
> + write_sequnlock_irqrestore(&counter->lock, flags);
> return 0;
> }
>
> --
> Balbir
>
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-24 17:05 [RFC] Reduce the resource counter lock overhead Balbir Singh
2009-06-24 19:40 ` Paul Menage
@ 2009-06-24 23:10 ` Andrew Morton
2009-06-24 23:53 ` KAMEZAWA Hiroyuki
2009-06-25 3:04 ` Balbir Singh
1 sibling, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2009-06-24 23:10 UTC (permalink / raw)
To: balbir; +Cc: kamezawa.hiroyu, nishimura, menage, xemul, linux-mm, lizf
On Wed, 24 Jun 2009 22:35:16 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Hi, All,
>
> I've been experimenting with reduction of resource counter locking
> overhead. My benchmarks show a marginal improvement, /proc/lock_stat
> however shows that the lock contention time and held time reduce
> by quite an amount after this patch.
That looks sane.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> class name con-bounces contentions
> waittime-min waittime-max waittime-total acq-bounces
> acquisitions holdtime-min holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> &counter->lock: 1534627 1575341
> 0.57 18.39 675713.23 43330446 138524248
> 0.43 148.13 54133607.05
> --------------
> &counter->lock 809559
> [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> &counter->lock 765782
> [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> --------------
> &counter->lock 653284
> [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> &counter->lock 922057
> [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
Please turn off the wordwrapping before sending the signed-off version.
> static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> {
> bool ret;
> - unsigned long flags;
> + unsigned long flags, seq;
>
> - spin_lock_irqsave(&cnt->lock, flags);
> - ret = res_counter_limit_check_locked(cnt);
> - spin_unlock_irqrestore(&cnt->lock, flags);
> + do {
> + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> + ret = res_counter_limit_check_locked(cnt);
> + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> return ret;
> }
This change makes the inlining of these functions even more
inappropriate than it already was.
This function should be static in memcontrol.c anyway?
Which function is calling mem_cgroup_check_under_limit() so much?
__mem_cgroup_try_charge()? If so, I'm a bit surprised because
inefficiencies of this nature in page reclaim rarely are demonstrable -
reclaim just doesn't get called much. Perhaps this is a sign that
reclaim is scanning the same pages over and over again and is being
inefficient at a higher level?
Do we really need to call mem_cgroup_hierarchical_reclaim() as
frequently as we apparently are doing?
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-24 23:10 ` Andrew Morton
@ 2009-06-24 23:53 ` KAMEZAWA Hiroyuki
2009-06-25 3:27 ` Balbir Singh
2009-06-25 3:04 ` Balbir Singh
1 sibling, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-24 23:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: balbir, nishimura, menage, xemul, linux-mm, lizf
On Wed, 24 Jun 2009 16:10:28 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 24 Jun 2009 22:35:16 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > Hi, All,
> >
> > I've been experimenting with reduction of resource counter locking
> > overhead. My benchmarks show a marginal improvement, /proc/lock_stat
> > however shows that the lock contention time and held time reduce
> > by quite an amount after this patch.
>
> That looks sane.
>
I was surprised to see that seq_lock can reduce the overhead here.
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > class name con-bounces contentions
> > waittime-min waittime-max waittime-total acq-bounces
> > acquisitions holdtime-min holdtime-max holdtime-total
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > &counter->lock: 1534627 1575341
> > 0.57 18.39 675713.23 43330446 138524248
> > 0.43 148.13 54133607.05
> > --------------
> > &counter->lock 809559
> > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> > &counter->lock 765782
> > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > --------------
> > &counter->lock 653284
> > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > &counter->lock 922057
> > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
>
> Please turn off the wordwrapping before sending the signed-off version.
>
> > static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> > {
> > bool ret;
> > - unsigned long flags;
> > + unsigned long flags, seq;
> >
> > - spin_lock_irqsave(&cnt->lock, flags);
> > - ret = res_counter_limit_check_locked(cnt);
> > - spin_unlock_irqrestore(&cnt->lock, flags);
> > + do {
> > + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> > + ret = res_counter_limit_check_locked(cnt);
> > + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> > return ret;
> > }
>
> This change makes the inlining of these functions even more
> inappropriate than it already was.
>
> This function should be static in memcontrol.c anyway?
>
> Which function is calling mem_cgroup_check_under_limit() so much?
> __mem_cgroup_try_charge()? If so, I'm a bit surprised because
> inefficiencies of this nature in page reclaim rarely are demonstrable -
> reclaim just doesn't get called much. Perhaps this is a sign that
> reclaim is scanning the same pages over and over again and is being
> inefficient at a higher level?
>
> Do we really need to call mem_cgroup_hierarchical_reclaim() as
> frequently as we apparently are doing?
>
Most modifications to res_counter are
- charge
- uncharge
and not
- read
What kind of workload can be much improved?
IIUC, in general, using a seqlock for a frequently modified counter just
makes it slower.
Could you show improved kernbench or unixbench scores?
Thanks,
-Kame
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-24 23:53 ` KAMEZAWA Hiroyuki
@ 2009-06-25 3:27 ` Balbir Singh
2009-06-25 3:44 ` Andrew Morton
2009-06-25 4:37 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 14+ messages in thread
From: Balbir Singh @ 2009-06-25 3:27 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, nishimura, menage, xemul, linux-mm, lizf
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-25 08:53:47]:
> On Wed, 24 Jun 2009 16:10:28 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Wed, 24 Jun 2009 22:35:16 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > Hi, All,
> > >
> > > I've been experimenting with reduction of resource counter locking
> > > overhead. My benchmarks show a marginal improvement, /proc/lock_stat
> > > however shows that the lock contention time and held time reduce
> > > by quite an amount after this patch.
> >
> > That looks sane.
> >
> I suprized to see seq_lock here can reduce the overhead.
>
I am not too surprised, given that we do frequent read-writes. We do a
read every time before we charge.
>
> > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > class name con-bounces contentions
> > > waittime-min waittime-max waittime-total acq-bounces
> > > acquisitions holdtime-min holdtime-max holdtime-total
> > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > >
> > > &counter->lock: 1534627 1575341
> > > 0.57 18.39 675713.23 43330446 138524248
> > > 0.43 148.13 54133607.05
> > > --------------
> > > &counter->lock 809559
> > > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> > > &counter->lock 765782
> > > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > > --------------
> > > &counter->lock 653284
> > > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > > &counter->lock 922057
> > > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> >
> > Please turn off the wordwrapping before sending the signed-off version.
> >
> > > static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> > > {
> > > bool ret;
> > > - unsigned long flags;
> > > + unsigned long flags, seq;
> > >
> > > - spin_lock_irqsave(&cnt->lock, flags);
> > > - ret = res_counter_limit_check_locked(cnt);
> > > - spin_unlock_irqrestore(&cnt->lock, flags);
> > > + do {
> > > + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> > > + ret = res_counter_limit_check_locked(cnt);
> > > + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> > > return ret;
> > > }
> >
> > This change makes the inlining of these functions even more
> > inappropriate than it already was.
> >
> > This function should be static in memcontrol.c anyway?
> >
> > Which function is calling mem_cgroup_check_under_limit() so much?
> > __mem_cgroup_try_charge()? If so, I'm a bit surprised because
> > inefficiencies of this nature in page reclaim rarely are demonstrable -
> > reclaim just doesn't get called much. Perhaps this is a sign that
> > reclaim is scanning the same pages over and over again and is being
> > inefficient at a higher level?
> >
> > Do we really need to call mem_cgroup_hierarchical_reclaim() as
> > frequently as we apparently are doing?
> >
>
> Most of modification to res_counter is
> - charge
> - uncharge
> and not
> - read
>
> What kind of workload can be much improved ?
> IIUC, in general, using seq_lock to frequently modified counter just makes
> it slow.
Why do you think so? I've been looking primarily at do_gettimeofday().
Yes, frequent updates can hurt readers in the worst case. I've also been
meaning to experiment with percpu counters, but we'll need to decide on
the tolerance limit, since batching introduces some fuzziness before all
CPUs see that the limit is exceeded; still, it might be worth
experimenting.
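To make the batching trade-off concrete, here is a purely illustrative
sketch of what a batched per-CPU charge could look like. The type and
function names are hypothetical (not existing res_counter API), "batch"
is the knob that controls how far usage may drift past the limit before
every CPU notices, and it assumes val <= batch.

struct percpu_res_counter {
	unsigned long limit;
	atomic_long_t usage;		/* shared, updated batch-wise */
	unsigned long *stock;		/* from alloc_percpu(unsigned long) */
	unsigned long batch;
};

static int percpu_res_charge(struct percpu_res_counter *c, unsigned long val)
{
	unsigned long *stock = per_cpu_ptr(c->stock, get_cpu());
	int ret = 0;

	if (*stock >= val) {
		*stock -= val;			/* fast path: no shared write */
	} else if (atomic_long_add_return(c->batch, &c->usage) <= c->limit) {
		*stock += c->batch - val;	/* refill the local stock */
	} else {
		atomic_long_sub(c->batch, &c->usage);
		ret = -ENOMEM;			/* over the limit */
	}
	put_cpu();
	return ret;
}

The uncharge side would similarly return pages to the local stock and
only flush to the shared counter once the stock exceeds the batch size.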
>
> Could you show improved kernbench or unixbench score ?
>
I'll start some of these and see if I can get a large machine to test
on. I ran reaim for the current run.
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 3:27 ` Balbir Singh
@ 2009-06-25 3:44 ` Andrew Morton
2009-06-25 4:39 ` KAMEZAWA Hiroyuki
2009-06-25 5:01 ` Balbir Singh
2009-06-25 4:37 ` KAMEZAWA Hiroyuki
1 sibling, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2009-06-25 3:44 UTC (permalink / raw)
To: balbir; +Cc: KAMEZAWA Hiroyuki, nishimura, menage, xemul, linux-mm, lizf
On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> We do a read everytime before we charge.
See, a good way to fix that is to not do it. Instead of
if (under_limit())
charge_some_more(amount);
else
goto fail;
one can do
if (try_to_charge_some_more(amount) < 0)
goto fail;
which will halve the locking frequency. Which may not be as beneficial
as avoiding the locking altogether on the read side, dunno.
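A rough sketch of such a combined helper, written against the existing
struct res_counter fields; the function name is made up for illustration,
and this is essentially what res_counter_charge_locked() already does
under the caller's lock:

static int res_counter_try_charge(struct res_counter *c, unsigned long val)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&c->lock, flags);
	if (c->usage + val > c->limit) {
		c->failcnt++;
		ret = -ENOMEM;		/* caller takes its "fail" path */
	} else {
		c->usage += val;
		if (c->usage > c->max_usage)
			c->max_usage = c->usage;
	}
	spin_unlock_irqrestore(&c->lock, flags);
	return ret;
}

The caller then checks and charges in one lock round trip instead of two.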
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 3:44 ` Andrew Morton
@ 2009-06-25 4:39 ` KAMEZAWA Hiroyuki
2009-06-25 5:40 ` Balbir Singh
2009-06-25 5:01 ` Balbir Singh
1 sibling, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-25 4:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: balbir, nishimura, menage, xemul, linux-mm, lizf
On Wed, 24 Jun 2009 20:44:26 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > We do a read everytime before we charge.
>
> See, a good way to fix that is to not do it. Instead of
>
> if (under_limit())
> charge_some_more(amount);
> else
> goto fail;
>
> one can do
>
> if (try_to_charge_some_more(amount) < 0)
> goto fail;
>
> which will halve the locking frequency. Which may not be as beneficial
> as avoiding the locking altogether on the read side, dunno.
>
I don't think we do read-before-write ;)
Thanks,
-Kame
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 4:39 ` KAMEZAWA Hiroyuki
@ 2009-06-25 5:40 ` Balbir Singh
2009-06-25 6:30 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2009-06-25 5:40 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, nishimura, menage, xemul, linux-mm, lizf
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-25 13:39:08]:
> On Wed, 24 Jun 2009 20:44:26 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > We do a read everytime before we charge.
> >
> > See, a good way to fix that is to not do it. Instead of
> >
> > if (under_limit())
> > charge_some_more(amount);
> > else
> > goto fail;
> >
> > one can do
> >
> > if (try_to_charge_some_more(amount) < 0)
> > goto fail;
> >
> > which will halve the locking frequency. Which may not be as beneficial
> > as avoiding the locking altogether on the read side, dunno.
> >
> I don't think we do read-before-write ;)
>
I need to figure out the reason for the read contention and why seqlocks
help. Like I said before, I am seeing some strange values for
reclaim_stats on the root cgroup, even though it is not reclaimable and
not used for reclaim. There can be two reasons:
1. Reclaim
2. User space constantly reading the counters
I am not aware of any user-space utilities running on the system that
constantly read the contents of these files.
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 5:40 ` Balbir Singh
@ 2009-06-25 6:30 ` KAMEZAWA Hiroyuki
2009-06-25 16:16 ` Balbir Singh
0 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-25 6:30 UTC (permalink / raw)
To: balbir; +Cc: Andrew Morton, nishimura, menage, xemul, linux-mm, lizf
On Thu, 25 Jun 2009 11:10:42 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-25 13:39:08]:
>
> > On Wed, 24 Jun 2009 20:44:26 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > >
> > > > We do a read everytime before we charge.
> > >
> > > See, a good way to fix that is to not do it. Instead of
> > >
> > > if (under_limit())
> > > charge_some_more(amount);
> > > else
> > > goto fail;
> > >
> > > one can do
> > >
> > > if (try_to_charge_some_more(amount) < 0)
> > > goto fail;
> > >
> > > which will halve the locking frequency. Which may not be as beneficial
> > > as avoiding the locking altogether on the read side, dunno.
> > >
> > I don't think we do read-before-write ;)
> >
>
> I need to figure out the reason for read contention and why seqlock's
> help. Like I said before I am seeing some strange values for
> reclaim_stats on the root cgroup, even though it is not reclaimable or
> not used for reclaim. There can be two reasons
>
I don't remember; does reclaim_stat go bad? A new BUG?
Does reclaim_stat here mean the zone_reclaim_stat obtained via
get_reclaim_stat()?
IIUC, after your ROOT_CGROUP-no-LRU patch, the reclaim_stat of the root
cgroup will never be accessed. Right?
> 1. Reclaim
> 2. User space constantly reading the counters
>
> I have no user space utilities I am aware of running on the system,
> constantly reading the contents of the files.
>
This is from your result.
Before After
class name &counter->lock: &(&counter->lock)->lock
con-bounces 1534627 962193
contentions 1575341 976349
waittime-min 0.57 0.60
waittime-max 18.39 14.07
waittime-total 675713.23 465926.04
acq-bounces 43330446 21364165
acquisitions 138524248 66041988
holdtime-min 0.43 0.45
holdtime-max 148.13 88.31
holdtime-total 54133607.05 25395513.12
From this result, acquisitions changed as
- 138524248 => 66041988
Almost half.
Then, either
- "read" accounts for half of all counter accesses,
or
- did you enable the swap cgroup in the "after" test?
BTW, if this result is against "Root" cgroup, no reclaim by memcg
will happen after your no-ROOT-LRU patch.
Thanks,
-Kame
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 6:30 ` KAMEZAWA Hiroyuki
@ 2009-06-25 16:16 ` Balbir Singh
0 siblings, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2009-06-25 16:16 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, nishimura, menage, xemul, linux-mm, lizf
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-25 15:30:33]:
> On Thu, 25 Jun 2009 11:10:42 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-25 13:39:08]:
> >
> > > On Wed, 24 Jun 2009 20:44:26 -0700
> > > Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > > On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > >
> > > > > We do a read everytime before we charge.
> > > >
> > > > See, a good way to fix that is to not do it. Instead of
> > > >
> > > > if (under_limit())
> > > > charge_some_more(amount);
> > > > else
> > > > goto fail;
> > > >
> > > > one can do
> > > >
> > > > if (try_to_charge_some_more(amount) < 0)
> > > > goto fail;
> > > >
> > > > which will halve the locking frequency. Which may not be as beneficial
> > > > as avoiding the locking altogether on the read side, dunno.
> > > >
> > > I don't think we do read-before-write ;)
> > >
> >
> > I need to figure out the reason for read contention and why seqlock's
> > help. Like I said before I am seeing some strange values for
> > reclaim_stats on the root cgroup, even though it is not reclaimable or
> > not used for reclaim. There can be two reasons
> >
> I don't remember but reclaim_stat goes bad ? new BUG ?
> reclaim_stat means zone_recaim_stat gotten by get_reclaim_stat() ?
>
> IIUC, after your ROOT_CGROUP-no-LRU patch, reclaim_stat of root cgroup
> will never be accessed. Right ?
>
Correct!
>
> > 1. Reclaim
> > 2. User space constantly reading the counters
> >
> > I have no user space utilities I am aware of running on the system,
> > constantly reading the contents of the files.
> >
>
> This is from your result.
>
> Before After
> class name &counter->lock: &(&counter->lock)->lock
> con-bounces 1534627 962193
> contentions 1575341 976349
> waittime-min 0.57 0.60
> waittime-max 18.39 14.07
> waittime-total 675713.23 465926.04
> acq-bounces 43330446 21364165
> acquisitions 138524248 66041988
> holdtime-min 0.43 0.45
> holdtime-max 148.13 88.31
> holdtime-total 54133607.05 25395513.12
>
> >From this result, acquisitions is changed as
> - 138524248 => 66041988
> Almost half.
>
Yes, precisely! That is why I thought it was a great result.
> Then,
> - "read" should be half of all counter access.
> or
> - did you enabped swap cgroup in "after" test ?
>
> BTW, if this result is against "Root" cgroup, no reclaim by memcg
> will happen after your no-ROOT-LRU patch.
>
The configuration was the same for both runs. I'll rerun and see why
that is.
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 3:44 ` Andrew Morton
2009-06-25 4:39 ` KAMEZAWA Hiroyuki
@ 2009-06-25 5:01 ` Balbir Singh
1 sibling, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2009-06-25 5:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, nishimura, menage, xemul, linux-mm, lizf
* Andrew Morton <akpm@linux-foundation.org> [2009-06-24 20:44:26]:
> On Thu, 25 Jun 2009 08:57:17 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > We do a read everytime before we charge.
>
> See, a good way to fix that is to not do it. Instead of
>
> if (under_limit())
> charge_some_more(amount);
> else
> goto fail;
>
> one can do
>
> if (try_to_charge_some_more(amount) < 0)
> goto fail;
>
> which will halve the locking frequency. Which may not be as beneficial
> as avoiding the locking altogether on the read side, dunno.
>
My bad, we do it all under one lock. We do a read within the charge
lock. I should get some tea or coffee before responding to emails in
the morning.
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 3:27 ` Balbir Singh
2009-06-25 3:44 ` Andrew Morton
@ 2009-06-25 4:37 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-25 4:37 UTC (permalink / raw)
To: balbir; +Cc: Andrew Morton, nishimura, menage, xemul, linux-mm, lizf
On Thu, 25 Jun 2009 08:57:17 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > What kind of workload can be much improved ?
> > IIUC, in general, using seq_lock to frequently modified counter just makes
> > it slow.
>
> Why do you think so? I've been looking primarily at do_gettimeofday().
IIUC, modifications to xtime are _not_ frequent.
> Yes, frequent updates can hurt readers in the worst case.
You don't understand my point: the write side of a seqlock is itself
heavy. I have no interest in the read side.
What needs to be faster is here.
==
929 while (1) {
930 int ret;
931 bool noswap = false;
932
933 ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
934 if (likely(!ret)) {
935 if (!do_swap_account)
936 break;
937 ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
938 &fail_res);
939 if (likely(!ret))
940 break;
941 /* mem+swap counter fails */
942 res_counter_uncharge(&mem->res, PAGE_SIZE);
943 noswap = true;
944 mem_over_limit = mem_cgroup_from_res_counter(fail_res,
945 memsw);
946 } else
947 /* mem counter fails */
948 mem_over_limit = mem_cgroup_from_res_counter(fail_res,
949
==
And using a seqlock will add more overhead here.
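For reference, on kernels of this vintage write_seqlock()/write_sequnlock()
expand to roughly the following (simplified, lockdep omitted): the plain
spinlock plus two sequence updates and write barriers, which is where the
extra write-side cost comes from.

static inline void write_seqlock(seqlock_t *sl)
{
	spin_lock(&sl->lock);
	++sl->sequence;		/* odd count: update in progress */
	smp_wmb();
}

static inline void write_sequnlock(seqlock_t *sl)
{
	smp_wmb();
	sl->sequence++;		/* even again: update complete */
	spin_unlock(&sl->lock);
}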
> I've been
> meaning to experiment with percpu counters as well, but we'll need to
> decide what is the tolerance limit, since we can have a batch value
> fuzziness, before all CPUs see that the limit is exceeded, but it
> might be worth experimenting.
>
A per-cpu counter is a choice, but choosing the "batch" value is very
difficult if we never allow the limit to be exceeded. And if the batch is
too small, a percpu counter is slower than the current one.
And if hierarchy is used, the jitter from batching will be very large in
parent nodes.
Thanks,
-Kame
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-24 23:10 ` Andrew Morton
2009-06-24 23:53 ` KAMEZAWA Hiroyuki
@ 2009-06-25 3:04 ` Balbir Singh
2009-06-25 3:40 ` Andrew Morton
1 sibling, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2009-06-25 3:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: kamezawa.hiroyu, nishimura, menage, xemul, linux-mm, lizf
* Andrew Morton <akpm@linux-foundation.org> [2009-06-24 16:10:28]:
> On Wed, 24 Jun 2009 22:35:16 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > Hi, All,
> >
> > I've been experimenting with reduction of resource counter locking
> > overhead. My benchmarks show a marginal improvement, /proc/lock_stat
> > however shows that the lock contention time and held time reduce
> > by quite an amount after this patch.
>
> That looks sane.
>
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > class name con-bounces contentions
> > waittime-min waittime-max waittime-total acq-bounces
> > acquisitions holdtime-min holdtime-max holdtime-total
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > &counter->lock: 1534627 1575341
> > 0.57 18.39 675713.23 43330446 138524248
> > 0.43 148.13 54133607.05
> > --------------
> > &counter->lock 809559
> > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
> > &counter->lock 765782
> > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > --------------
> > &counter->lock 653284
> > [<ffffffff81081045>] res_counter_uncharge+0x2c/0x6d
> > &counter->lock 922057
> > [<ffffffff810810c5>] res_counter_charge+0x3f/0xed
>
> Please turn off the wordwrapping before sending the signed-off version.
>
I'll need to see what caused the problem here. Thanks for the heads-up.
> > static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> > {
> > bool ret;
> > - unsigned long flags;
> > + unsigned long flags, seq;
> >
> > - spin_lock_irqsave(&cnt->lock, flags);
> > - ret = res_counter_limit_check_locked(cnt);
> > - spin_unlock_irqrestore(&cnt->lock, flags);
> > + do {
> > + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> > + ret = res_counter_limit_check_locked(cnt);
> > + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> > return ret;
> > }
>
> This change makes the inlining of these functions even more
> inappropriate than it already was.
>
> This function should be static in memcontrol.c anyway?
We wanted to modularize resource counters and keep the code isolated
from memcontrol.c; hence it continues to live outside.
>
> Which function is calling mem_cgroup_check_under_limit() so much?
> __mem_cgroup_try_charge()? If so, I'm a bit surprised because
> inefficiencies of this nature in page reclaim rarely are demonstrable -
> reclaim just doesn't get called much. Perhaps this is a sign that
> reclaim is scanning the same pages over and over again and is being
> inefficient at a higher level?
>
We do a check every time before we charge. To answer the reclaim part:
I am currently seeing some interesting data; even with no groups
created, I see the memcg reclaim_stats for root being quite high, even
though we are not reclaiming from root.
I have yet to get to the root cause of the issue.
> Do we really need to call mem_cgroup_hierarchical_reclaim() as
> frequently as we apparently are doing?
>
All our reclaim is now hierarchical; was there anything specific you
saw?
--
Balbir
* Re: [RFC] Reduce the resource counter lock overhead
2009-06-25 3:04 ` Balbir Singh
@ 2009-06-25 3:40 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2009-06-25 3:40 UTC (permalink / raw)
To: balbir; +Cc: kamezawa.hiroyu, nishimura, menage, xemul, linux-mm, lizf
On Thu, 25 Jun 2009 08:34:46 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Andrew Morton <akpm@linux-foundation.org> [2009-06-24 16:10:28]:
>
> > On Wed, 24 Jun 2009 22:35:16 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> ...
>
> > > static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> > > {
> > > bool ret;
> > > - unsigned long flags;
> > > + unsigned long flags, seq;
> > >
> > > - spin_lock_irqsave(&cnt->lock, flags);
> > > - ret = res_counter_limit_check_locked(cnt);
> > > - spin_unlock_irqrestore(&cnt->lock, flags);
> > > + do {
> > > + seq = read_seqbegin_irqsave(&cnt->lock, flags);
> > > + ret = res_counter_limit_check_locked(cnt);
> > > + } while (read_seqretry_irqrestore(&cnt->lock, seq, flags));
> > > return ret;
> > > }
> >
> > This change makes the inlining of these functions even more
> > inappropriate than it already was.
> >
> > This function should be static in memcontrol.c anyway?
>
> We wanted to modularize resource counters and keep the code isolated
> from memcontrol.c, hence it continues to live outside
That doesn't mean that it has to be inlined. That function is really,
really big, especially with lockdep enabled.
> >
> > Which function is calling mem_cgroup_check_under_limit() so much?
> > __mem_cgroup_try_charge()? If so, I'm a bit surprised because
> > inefficiencies of this nature in page reclaim rarely are demonstrable -
> > reclaim just doesn't get called much. Perhaps this is a sign that
> > reclaim is scanning the same pages over and over again and is being
> > inefficient at a higher level?
> >
>
> We do a check everytime before we charge. To answer the other part of
> reclaim, I am currently seeing some interesting data, even with no
> groups created, I see memcg reclaim_stats set to root to be quite
> high, even though we are not reclaiming from root.
> I am yet to get to the root cause of the issue
>
>
> > Do we really need to call mem_cgroup_hierarchical_reclaim() as
> > frequently as we apparently are doing?
> >
>
> All our reclaim is now hierarchical, was there anything specific you
> saw?
My point is that when one sees a function high in the profiles,
speeding up that function isn't the only fix. Another (often superior)
fix is to call that function less frequently. Or perhaps to cache its
result in some fashion.
Have you established that this function is being called at the minimum
possible frequency? Is the frequency at which it is being called
reasonable and expected?
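One hypothetical way to "cache its result": keep an under_limit flag that
is recomputed only on the write side (charge/uncharge/set_limit), so
readers test a plain word instead of taking the counter lock. Neither the
field nor the helpers below exist in the real res_counter; this is only a
sketch of the idea.

static inline void res_counter_update_under_limit(struct res_counter *cnt)
{
	/* called with cnt->lock held for writing */
	cnt->under_limit = (cnt->usage < cnt->limit);
}

static inline bool res_counter_check_under_limit_cached(struct res_counter *cnt)
{
	/* racy but cheap; a stale answer only delays or hastens reclaim */
	return cnt->under_limit;
}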
Thread overview: 14+ messages
2009-06-24 17:05 [RFC] Reduce the resource counter lock overhead Balbir Singh
2009-06-24 19:40 ` Paul Menage
2009-06-24 23:10 ` Andrew Morton
2009-06-24 23:53 ` KAMEZAWA Hiroyuki
2009-06-25 3:27 ` Balbir Singh
2009-06-25 3:44 ` Andrew Morton
2009-06-25 4:39 ` KAMEZAWA Hiroyuki
2009-06-25 5:40 ` Balbir Singh
2009-06-25 6:30 ` KAMEZAWA Hiroyuki
2009-06-25 16:16 ` Balbir Singh
2009-06-25 5:01 ` Balbir Singh
2009-06-25 4:37 ` KAMEZAWA Hiroyuki
2009-06-25 3:04 ` Balbir Singh
2009-06-25 3:40 ` Andrew Morton