* [PATCH] memcg: make mem_cgroup_read_stat() unsigned
@ 2015-09-22 22:16 Greg Thelen
2015-09-22 22:24 ` Andrew Morton
2015-09-25 15:25 ` Michal Hocko
0 siblings, 2 replies; 8+ messages in thread
From: Greg Thelen @ 2015-09-22 22:16 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner, Michal Hocko
Cc: cgroups, linux-mm, linux-kernel, Greg Thelen
mem_cgroup_read_stat() returns a page count by summing per cpu page
counters. The summing is racy wrt. updates, so a transient negative sum
is possible. Callers don't want negative values:
- mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
- oom reports and memory.stat shouldn't show confusing negative usage.
- tree_usage() already avoids negatives.
Avoid returning negative page counts from mem_cgroup_read_stat() and
convert it to unsigned.
Signed-off-by: Greg Thelen <gthelen@google.com>
---
mm/memcontrol.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ddaeba34e09..2633e9be4a99 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -644,12 +644,14 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
}
/*
+ * Return page count for single (non recursive) @memcg.
+ *
* Implementation Note: reading percpu statistics for memcg.
*
* Both of vmstat[] and percpu_counter has threshold and do periodic
* synchronization to implement "quick" read. There are trade-off between
* reading cost and precision of value. Then, we may have a chance to implement
- * a periodic synchronizion of counter in memcg's counter.
+ * a periodic synchronization of counter in memcg's counter.
*
* But this _read() function is used for user interface now. The user accounts
* memory usage by memory cgroup and he _always_ requires exact value because
@@ -659,17 +661,24 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
*
* If there are kernel internal actions which can make use of some not-exact
* value, and reading all cpu value can be performance bottleneck in some
- * common workload, threashold and synchonization as vmstat[] should be
+ * common workload, threashold and synchronization as vmstat[] should be
* implemented.
*/
-static long mem_cgroup_read_stat(struct mem_cgroup *memcg,
- enum mem_cgroup_stat_index idx)
+static unsigned long
+mem_cgroup_read_stat(struct mem_cgroup *memcg, enum mem_cgroup_stat_index idx)
{
long val = 0;
int cpu;
+ /* Per-cpu values can be negative, use a signed accumulator */
for_each_possible_cpu(cpu)
val += per_cpu(memcg->stat->count[idx], cpu);
+ /*
+ * Summing races with updates, so val may be negative. Avoid exposing
+ * transient negative values.
+ */
+ if (val < 0)
+ val = 0;
return val;
}
@@ -1254,7 +1263,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
- pr_cont(" %s:%ldKB", mem_cgroup_stat_names[i],
+ pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
K(mem_cgroup_read_stat(iter, i)));
}
@@ -2819,14 +2828,11 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
enum mem_cgroup_stat_index idx)
{
struct mem_cgroup *iter;
- long val = 0;
+ unsigned long val = 0;
- /* Per-cpu values can be negative, use a signed accumulator */
for_each_mem_cgroup_tree(iter, memcg)
val += mem_cgroup_read_stat(iter, idx);
- if (val < 0) /* race ? */
- val = 0;
return val;
}
@@ -3169,7 +3175,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
- seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
+ seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
}
@@ -3194,13 +3200,13 @@ static int memcg_stat_show(struct seq_file *m, void *v)
(u64)memsw * PAGE_SIZE);
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
- long long val = 0;
+ unsigned long long val = 0;
if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
continue;
for_each_mem_cgroup_tree(mi, memcg)
val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
- seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
+ seq_printf(m, "total_%s %llu\n", mem_cgroup_stat_names[i], val);
}
for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
--
2.6.0.rc0.131.gf624c3d
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-22 22:16 [PATCH] memcg: make mem_cgroup_read_stat() unsigned Greg Thelen
@ 2015-09-22 22:24 ` Andrew Morton
2015-09-25 15:25 ` Michal Hocko
1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2015-09-22 22:24 UTC (permalink / raw)
To: Greg Thelen
Cc: Johannes Weiner, Michal Hocko, cgroups, linux-mm, linux-kernel
On Tue, 22 Sep 2015 15:16:32 -0700 Greg Thelen <gthelen@google.com> wrote:
> mem_cgroup_read_stat() returns a page count by summing per cpu page
> counters. The summing is racy wrt. updates, so a transient negative sum
> is possible. Callers don't want negative values:
> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
> - oom reports and memory.stat shouldn't show confusing negative usage.
> - tree_usage() already avoids negatives.
>
> Avoid returning negative page counts from mem_cgroup_read_stat() and
> convert it to unsigned.
Someone please remind me why this code doesn't use the existing
percpu_counter library which solved this problem years ago.
> for_each_possible_cpu(cpu)
and which doesn't iterate across offlined CPUs.
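For reference, a minimal sketch of what a percpu_counter-based read could look
like (the memcg_nr_dirty counter and helper name are hypothetical, not from
this thread's patch; percpu_counter_init() during memcg setup is assumed):

#include <linux/percpu_counter.h>

/* Hypothetical: one generic percpu_counter per stat item. */
static struct percpu_counter memcg_nr_dirty;

static unsigned long memcg_read_nr_dirty(void)
{
	/*
	 * percpu_counter_read_positive() clamps a transiently negative
	 * shared count to 0, and offlined CPUs need not be walked because
	 * the library folds their deltas into the shared count at CPU
	 * hot-unplug.
	 */
	return percpu_counter_read_positive(&memcg_nr_dirty);
}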
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-22 22:16 [PATCH] memcg: make mem_cgroup_read_stat() unsigned Greg Thelen
2015-09-22 22:24 ` Andrew Morton
@ 2015-09-25 15:25 ` Michal Hocko
2015-09-25 16:17 ` Greg Thelen
1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2015-09-25 15:25 UTC (permalink / raw)
To: Greg Thelen
Cc: Andrew Morton, Johannes Weiner, cgroups, linux-mm, linux-kernel
On Tue 22-09-15 15:16:32, Greg Thelen wrote:
> mem_cgroup_read_stat() returns a page count by summing per cpu page
> counters. The summing is racy wrt. updates, so a transient negative sum
> is possible. Callers don't want negative values:
> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
OK, this can confuse dirty throttling AFAIU
> - oom reports and memory.stat shouldn't show confusing negative usage.
I guess this is not earth shattering.
> - tree_usage() already avoids negatives.
>
> Avoid returning negative page counts from mem_cgroup_read_stat() and
> convert it to unsigned.
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
I guess we want that for stable 4.2 because of the dirty throttling
part. Longterm we should use generic per-cpu counter.
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> mm/memcontrol.c | 30 ++++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ddaeba34e09..2633e9be4a99 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -644,12 +644,14 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
> }
>
> /*
> + * Return page count for single (non recursive) @memcg.
> + *
> * Implementation Note: reading percpu statistics for memcg.
> *
> * Both of vmstat[] and percpu_counter has threshold and do periodic
> * synchronization to implement "quick" read. There are trade-off between
> * reading cost and precision of value. Then, we may have a chance to implement
> - * a periodic synchronizion of counter in memcg's counter.
> + * a periodic synchronization of counter in memcg's counter.
> *
> * But this _read() function is used for user interface now. The user accounts
> * memory usage by memory cgroup and he _always_ requires exact value because
> @@ -659,17 +661,24 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
> *
> * If there are kernel internal actions which can make use of some not-exact
> * value, and reading all cpu value can be performance bottleneck in some
> - * common workload, threashold and synchonization as vmstat[] should be
> + * common workload, threashold and synchronization as vmstat[] should be
> * implemented.
> */
> -static long mem_cgroup_read_stat(struct mem_cgroup *memcg,
> - enum mem_cgroup_stat_index idx)
> +static unsigned long
> +mem_cgroup_read_stat(struct mem_cgroup *memcg, enum mem_cgroup_stat_index idx)
> {
> long val = 0;
> int cpu;
>
> + /* Per-cpu values can be negative, use a signed accumulator */
> for_each_possible_cpu(cpu)
> val += per_cpu(memcg->stat->count[idx], cpu);
> + /*
> + * Summing races with updates, so val may be negative. Avoid exposing
> + * transient negative values.
> + */
> + if (val < 0)
> + val = 0;
> return val;
> }
>
> @@ -1254,7 +1263,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> continue;
> - pr_cont(" %s:%ldKB", mem_cgroup_stat_names[i],
> + pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
> K(mem_cgroup_read_stat(iter, i)));
> }
>
> @@ -2819,14 +2828,11 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
> enum mem_cgroup_stat_index idx)
> {
> struct mem_cgroup *iter;
> - long val = 0;
> + unsigned long val = 0;
>
> - /* Per-cpu values can be negative, use a signed accumulator */
> for_each_mem_cgroup_tree(iter, memcg)
> val += mem_cgroup_read_stat(iter, idx);
>
> - if (val < 0) /* race ? */
> - val = 0;
> return val;
> }
>
> @@ -3169,7 +3175,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> continue;
> - seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
> + seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
> mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> }
>
> @@ -3194,13 +3200,13 @@ static int memcg_stat_show(struct seq_file *m, void *v)
> (u64)memsw * PAGE_SIZE);
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> - long long val = 0;
> + unsigned long long val = 0;
>
> if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> continue;
> for_each_mem_cgroup_tree(mi, memcg)
> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> - seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
> + seq_printf(m, "total_%s %llu\n", mem_cgroup_stat_names[i], val);
> }
>
> for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> --
> 2.6.0.rc0.131.gf624c3d
--
Michal Hocko
SUSE Labs
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-25 15:25 ` Michal Hocko
@ 2015-09-25 16:17 ` Greg Thelen
0 siblings, 0 replies; 8+ messages in thread
From: Greg Thelen @ 2015-09-25 16:17 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, cgroups, linux-mm, linux-kernel, tj
Michal Hocko wrote:
> On Tue 22-09-15 15:16:32, Greg Thelen wrote:
>> mem_cgroup_read_stat() returns a page count by summing per cpu page
>> counters. The summing is racy wrt. updates, so a transient negative sum
>> is possible. Callers don't want negative values:
>> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
>
> OK, this can confuse dirty throttling AFAIU
>
>> - oom reports and memory.stat shouldn't show confusing negative usage.
>
> I guess this is not earth shattering.
>
>> - tree_usage() already avoids negatives.
>>
>> Avoid returning negative page counts from mem_cgroup_read_stat() and
>> convert it to unsigned.
>>
>> Signed-off-by: Greg Thelen <gthelen@google.com>
>
> I guess we want that for stable 4.2 because of the dirty throttling
> part. Longterm we should use generic per-cpu counter.
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
Correct, this is not an earth shattering patch. The patch only filters
out negative memcg stat values from mem_cgroup_read_stat() callers.
Negative values should only be temporary due to stat update races. So
I'm not sure it's worth sending it to stable. I've heard no reports of
it troubling anyone. The worst case without this patch is that memcg
temporarily burps up a negative dirty and/or writeback count, which
causes balance_dirty_pages() to sleep for an (at most) 200ms nap
(MAX_PAUSE). Cc'ing Tejun in case there are more serious consequences to
balance_dirty_pages() occasionally seeing a massive (underflowed) dirty
or writeback count.
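As an aside, the unclamped worst case is easy to see in a few lines of
userspace C (illustration only, nothing here is kernel code):

#include <stdio.h>

int main(void)
{
	/* A transient negative sum, as the racy per-cpu summing can produce. */
	long racy_sum = -3;

	/*
	 * Callers such as mem_cgroup_wb_stats() hold the result in unsigned
	 * variables, so without the clamp the value wraps to a huge count.
	 */
	unsigned long nr_dirty = (unsigned long)racy_sum;

	printf("clamped: %lu pages, unclamped: %lu pages\n",
	       racy_sum < 0 ? 0UL : (unsigned long)racy_sum, nr_dirty);
	return 0;
}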
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-23 7:21 ` Greg Thelen
@ 2015-09-25 15:20 ` Michal Hocko
0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2015-09-25 15:20 UTC (permalink / raw)
To: Greg Thelen
Cc: Andrew Morton, Johannes Weiner, Cgroups, linux-mm, linux-kernel
On Wed 23-09-15 00:21:33, Greg Thelen wrote:
>
> Andrew Morton wrote:
>
> > On Tue, 22 Sep 2015 17:42:13 -0700 Greg Thelen <gthelen@google.com> wrote:
[...]
> >> I assume it's pretty straightforward to create generic
> >> percpu_counter_array routines which memcg could use. Possibly something
> >> like this could be made general enough to satisfy vmstat as well, but
> >> that's less clear.
> >>
> >> [1] http://www.spinics.net/lists/cgroups/msg06216.html
> >> [2] https://lkml.org/lkml/2014/9/11/1057
> >
> > That all sounds rather bogus to me. __percpu_counter_add() doesn't
> > modify struct percpu_counter at all except for when the cpu-local
> > counter overflows the configured batch size. And for the memcg
> > application I suspect we can set the batch size to INT_MAX...
>
> Nod. The memory usage will be a bit larger, but the code reuse is
> attractive. I dusted off Vladimir's
> https://lkml.org/lkml/2014/9/11/710. Next step is to benchmark it
> before posting.
I am definitely in favor of using generic per-cpu counters.
--
Michal Hocko
SUSE Labs
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-23 4:03 ` Andrew Morton
@ 2015-09-23 7:21 ` Greg Thelen
2015-09-25 15:20 ` Michal Hocko
0 siblings, 1 reply; 8+ messages in thread
From: Greg Thelen @ 2015-09-23 7:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Cgroups, linux-mm, linux-kernel
Andrew Morton wrote:
> On Tue, 22 Sep 2015 17:42:13 -0700 Greg Thelen <gthelen@google.com> wrote:
>
>> Andrew Morton wrote:
>>
>> > On Tue, 22 Sep 2015 15:16:32 -0700 Greg Thelen <gthelen@google.com> wrote:
>> >
>> >> mem_cgroup_read_stat() returns a page count by summing per cpu page
>> >> counters. The summing is racy wrt. updates, so a transient negative sum
>> >> is possible. Callers don't want negative values:
>> >> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
>> >> - oom reports and memory.stat shouldn't show confusing negative usage.
>> >> - tree_usage() already avoids negatives.
>> >>
>> >> Avoid returning negative page counts from mem_cgroup_read_stat() and
>> >> convert it to unsigned.
>> >
>> > Someone please remind me why this code doesn't use the existing
>> > percpu_counter library which solved this problem years ago.
>> >
>> >> for_each_possible_cpu(cpu)
>> >
>> > and which doesn't iterate across offlined CPUs.
>>
>> I found [1] and [2] discussing memory layout differences between:
>> a) existing memcg hand rolled per cpu arrays of counters
>> vs
>> b) array of generic percpu_counter
>> The current approach was claimed to have lower memory overhead and
>> better cache behavior.
>>
>> I assume it's pretty straightforward to create generic
>> percpu_counter_array routines which memcg could use. Possibly something
>> like this could be made general enough to satisfy vmstat as well, but
>> that's less clear.
>>
>> [1] http://www.spinics.net/lists/cgroups/msg06216.html
>> [2] https://lkml.org/lkml/2014/9/11/1057
>
> That all sounds rather bogus to me. __percpu_counter_add() doesn't
> modify struct percpu_counter at all except for when the cpu-local
> counter overflows the configured batch size. And for the memcg
> application I suspect we can set the batch size to INT_MAX...
Nod. The memory usage will be a bit larger, but the code reuse is
attractive. I dusted off Vladimir's
https://lkml.org/lkml/2014/9/11/710. Next step is to benchmark it
before posting.
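A rough sketch of the shape being discussed (assumed for illustration, not
Vladimir's actual patch; the memcg_stat_add/memcg_stat_read names are made
up):

#include <linux/kernel.h>
#include <linux/percpu_counter.h>

static struct percpu_counter memcg_stat[MEM_CGROUP_STAT_NSTATS];

static void memcg_stat_add(enum mem_cgroup_stat_index idx, long nr)
{
	/*
	 * With an INT_MAX batch the shared count is almost never written,
	 * so updates stay cpu-local.
	 */
	__percpu_counter_add(&memcg_stat[idx], nr, INT_MAX);
}

static unsigned long memcg_stat_read(enum mem_cgroup_stat_index idx)
{
	/*
	 * The shared count is stale with such a large batch, so sum the
	 * per-cpu deltas; the _positive variant clamps transient negatives
	 * just like mem_cgroup_read_stat() now does.
	 */
	return percpu_counter_sum_positive(&memcg_stat[idx]);
}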
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
2015-09-23 0:42 Greg Thelen
@ 2015-09-23 4:03 ` Andrew Morton
2015-09-23 7:21 ` Greg Thelen
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2015-09-23 4:03 UTC (permalink / raw)
To: Greg Thelen
Cc: Johannes Weiner, Michal Hocko, Cgroups, linux-mm, linux-kernel
On Tue, 22 Sep 2015 17:42:13 -0700 Greg Thelen <gthelen@google.com> wrote:
> Andrew Morton wrote:
>
> > On Tue, 22 Sep 2015 15:16:32 -0700 Greg Thelen <gthelen@google.com> wrote:
> >
> >> mem_cgroup_read_stat() returns a page count by summing per cpu page
> >> counters. The summing is racy wrt. updates, so a transient negative sum
> >> is possible. Callers don't want negative values:
> >> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
> >> - oom reports and memory.stat shouldn't show confusing negative usage.
> >> - tree_usage() already avoids negatives.
> >>
> >> Avoid returning negative page counts from mem_cgroup_read_stat() and
> >> convert it to unsigned.
> >
> > Someone please remind me why this code doesn't use the existing
> > percpu_counter library which solved this problem years ago.
> >
> >> for_each_possible_cpu(cpu)
> >
> > and which doesn't iterate across offlined CPUs.
>
> I found [1] and [2] discussing memory layout differences between:
> a) existing memcg hand rolled per cpu arrays of counters
> vs
> b) array of generic percpu_counter
> The current approach was claimed to have lower memory overhead and
> better cache behavior.
>
> I assume it's pretty straightforward to create generic
> percpu_counter_array routines which memcg could use. Possibly something
> like this could be made general enough to satisfy vmstat as well, but
> that's less clear.
>
> [1] http://www.spinics.net/lists/cgroups/msg06216.html
> [2] https://lkml.org/lkml/2014/9/11/1057
That all sounds rather bogus to me. __percpu_counter_add() doesn't
modify struct percpu_counter at all except for when the cpu-local
counter overflows the configured batch size. And for the memcg
application I suspect we can set the batch size to INT_MAX...
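The behaviour described here is the fast path of __percpu_counter_add() in
lib/percpu_counter.c; simplified from memory (preempt/irq handling omitted),
it is roughly:

void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count = __this_cpu_read(*fbc->counters) + amount;

	if (count >= batch || count <= -batch) {
		/* Slow path: fold into the shared count under fbc->lock. */
		raw_spin_lock(&fbc->lock);
		fbc->count += count;
		__this_cpu_sub(*fbc->counters, count - amount);
		raw_spin_unlock(&fbc->lock);
	} else {
		/* Fast path: purely cpu-local, no shared cacheline written. */
		this_cpu_add(*fbc->counters, amount);
	}
}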
* Re: [PATCH] memcg: make mem_cgroup_read_stat() unsigned
@ 2015-09-23 0:42 Greg Thelen
2015-09-23 4:03 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Greg Thelen @ 2015-09-23 0:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Cgroups, linux-mm, linux-kernel
Andrew Morton wrote:
> On Tue, 22 Sep 2015 15:16:32 -0700 Greg Thelen <gthelen@google.com> wrote:
>
>> mem_cgroup_read_stat() returns a page count by summing per cpu page
>> counters. The summing is racy wrt. updates, so a transient negative sum
>> is possible. Callers don't want negative values:
>> - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
>> - oom reports and memory.stat shouldn't show confusing negative usage.
>> - tree_usage() already avoids negatives.
>>
>> Avoid returning negative page counts from mem_cgroup_read_stat() and
>> convert it to unsigned.
>
> Someone please remind me why this code doesn't use the existing
> percpu_counter library which solved this problem years ago.
>
>> for_each_possible_cpu(cpu)
>
> and which doesn't iterate across offlined CPUs.
I found [1] and [2] discussing memory layout differences between:
a) existing memcg hand rolled per cpu arrays of counters
vs
b) array of generic percpu_counter
The current approach was claimed to have lower memory overhead and
better cache behavior.
I assume it's pretty straightforward to create generic
percpu_counter_array routines which memcg could use. Possibly something
like this could be made general enough to satisfy vmstat as well, but
that's less clear.
[1] http://www.spinics.net/lists/cgroups/msg06216.html
[2] https://lkml.org/lkml/2014/9/11/1057
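For context on the layout argument in [1] and [2], the two shapes being
compared look roughly like this (struct names partly assumed for
illustration):

#include <linux/percpu_counter.h>

/*
 * (a) hand rolled: one per-cpu array of plain longs, a single __percpu
 * allocation, with adjacent counters sharing cachelines.
 */
struct mem_cgroup_stat_cpu {
	long count[MEM_CGROUP_STAT_NSTATS];
};

/*
 * (b) generic: an array of percpu_counters, each carrying its own
 * spinlock, shared s64 and separate __percpu allocation (the extra
 * footprint the earlier threads objected to).
 */
struct mem_cgroup_stat_generic {
	struct percpu_counter count[MEM_CGROUP_STAT_NSTATS];
};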
end of thread, other threads:[~2015-09-25 16:17 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
2015-09-22 22:16 [PATCH] memcg: make mem_cgroup_read_stat() unsigned Greg Thelen
2015-09-22 22:24 ` Andrew Morton
2015-09-25 15:25 ` Michal Hocko
2015-09-25 16:17 ` Greg Thelen
2015-09-23 0:42 Greg Thelen
2015-09-23 4:03 ` Andrew Morton
2015-09-23 7:21 ` Greg Thelen
2015-09-25 15:20 ` Michal Hocko