* [PATCH 0/2] memcg make use of new percpu implementations
@ 2009-11-06 8:52 KAMEZAWA Hiroyuki
2009-11-06 8:54 ` [PATCH 1/2] memcg : rename index to short name KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-06 8:52 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, balbir, nishimura, cl, akpm
Hi,
The recent updates to dynamic percpu allocation look good, so I have rewritten
memcg's poor implementation of its percpu status counters.
(The current one is not NUMA-aware ...)
Thanks for the great work.
This time I added Christoph to CC because I'm not fully sure my usage of
__this_cpu_xxx is correct ... I'd be glad if you could check the usage when you have time.
Patch 1/2 is just a cleanup (preparation for patch 2/2).
Patch 2/2 does the percpu conversion.
Tested on my 8-cpu box; it works well.
The patches are against the latest mmotm.
Thanks,
-Kame
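
For reference, the pattern the series moves to looks roughly like the sketch
below. The demo_* names are illustrative stand-ins rather than the actual memcg
structures; only alloc_percpu(), free_percpu() and the this_cpu_*() ops are real
kernel interfaces.
==
#include <linux/percpu.h>
#include <linux/types.h>
#include <linux/errno.h>

/* illustrative stand-in for struct mem_cgroup_stat_cpu */
struct demo_stat_cpu {
	s64 count[4];
};

/* illustrative stand-in for the structure owning the counters */
struct demo_counter {
	struct demo_stat_cpu *cpustat;	/* handed out by alloc_percpu() */
};

static int demo_counter_init(struct demo_counter *d)
{
	/* one zeroed copy per possible cpu, placed near that cpu's node */
	d->cpustat = alloc_percpu(struct demo_stat_cpu);
	if (!d->cpustat)
		return -ENOMEM;
	return 0;
}

static void demo_counter_free(struct demo_counter *d)
{
	free_percpu(d->cpustat);
}
==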
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/2] memcg : rename index to short name 2009-11-06 8:52 [PATCH 0/2] memcg make use of new percpu implementations KAMEZAWA Hiroyuki @ 2009-11-06 8:54 ` KAMEZAWA Hiroyuki 2009-11-06 8:55 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces KAMEZAWA Hiroyuki 2009-11-09 7:04 ` [PATCH 0/2] memcg make use of new percpu implementations Balbir Singh 2 siblings, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-06 8:54 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, nishimura, cl, akpm From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Before touching percpu counters on memcg, rename status to be short name. Current too long name is my mistake, sorry. By this, a change around percpu statistics can be in a line. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 70 ++++++++++++++++++++++++++------------------------------ 1 file changed, 33 insertions(+), 37 deletions(-) Index: mmotm-2.6.32-Nov2/mm/memcontrol.c =================================================================== --- mmotm-2.6.32-Nov2.orig/mm/memcontrol.c +++ mmotm-2.6.32-Nov2/mm/memcontrol.c @@ -65,19 +65,19 @@ enum mem_cgroup_stat_index { /* * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss. */ - MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */ - MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */ - MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */ - MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */ - MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */ - MEM_CGROUP_STAT_EVENTS, /* sum of pagein + pageout for internal use */ - MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */ + MEMCG_NR_CACHE, /* # of pages charged as cache */ + MEMCG_NR_RSS, /* # of pages charged as anon rss */ + MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */ + MEMCG_PGPGIN, /* # of pages paged in */ + MEMCG_PGPGOUT, /* # of pages paged out */ + MEMCG_EVENTS, /* sum of pagein + pageout for internal use */ + MEMCG_NR_SWAP, /* # of pages, swapped out */ - MEM_CGROUP_STAT_NSTATS, + MEMCG_STAT_NSTATS, }; struct mem_cgroup_stat_cpu { - s64 count[MEM_CGROUP_STAT_NSTATS]; + s64 count[MEMCG_STAT_NSTATS]; } ____cacheline_aligned_in_smp; struct mem_cgroup_stat { @@ -121,8 +121,8 @@ static s64 mem_cgroup_local_usage(struct { s64 ret; - ret = mem_cgroup_read_stat(stat, MEM_CGROUP_STAT_CACHE); - ret += mem_cgroup_read_stat(stat, MEM_CGROUP_STAT_RSS); + ret = mem_cgroup_read_stat(stat, MEMCG_NR_CACHE); + ret += mem_cgroup_read_stat(stat, MEMCG_NR_RSS); return ret; } @@ -376,9 +376,9 @@ static bool mem_cgroup_soft_limit_check( cpu = get_cpu(); cpustat = &mem->stat.cpustat[cpu]; - val = __mem_cgroup_stat_read_local(cpustat, MEM_CGROUP_STAT_EVENTS); + val = __mem_cgroup_stat_read_local(cpustat, MEMCG_EVENTS); if (unlikely(val > SOFTLIMIT_EVENTS_THRESH)) { - __mem_cgroup_stat_reset_safe(cpustat, MEM_CGROUP_STAT_EVENTS); + __mem_cgroup_stat_reset_safe(cpustat, MEMCG_EVENTS); ret = true; } put_cpu(); @@ -486,7 +486,7 @@ static void mem_cgroup_swap_statistics(s int cpu = get_cpu(); cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_SWAPOUT, val); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_SWAP, val); put_cpu(); } @@ -501,17 +501,15 @@ static void mem_cgroup_charge_statistics cpustat = &stat->cpustat[cpu]; if (PageCgroupCache(pc)) - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_CACHE, val); + 
__mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_CACHE, val); else - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS, val); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_RSS, val); if (charge) - __mem_cgroup_stat_add_safe(cpustat, - MEM_CGROUP_STAT_PGPGIN_COUNT, 1); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGIN, 1); else - __mem_cgroup_stat_add_safe(cpustat, - MEM_CGROUP_STAT_PGPGOUT_COUNT, 1); - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_EVENTS, 1); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGOUT, 1); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_EVENTS, 1); put_cpu(); } @@ -1254,7 +1252,7 @@ void mem_cgroup_update_file_mapped(struc stat = &mem->stat; cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED, val); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, val); done: unlock_page_cgroup(pc); } @@ -1656,14 +1654,12 @@ static int mem_cgroup_move_account(struc /* Update mapped_file data for mem_cgroup "from" */ stat = &from->stat; cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED, - -1); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, -1); /* Update mapped_file data for mem_cgroup "to" */ stat = &to->stat; cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED, - 1); + __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, 1); } if (do_swap_account && !mem_cgroup_is_root(from)) @@ -2746,10 +2742,10 @@ static u64 mem_cgroup_read(struct cgroup case _MEM: if (name == RES_USAGE && mem_cgroup_is_root(mem)) { mem_cgroup_get_recursive_idx_stat(mem, - MEM_CGROUP_STAT_CACHE, &idx_val); + MEMCG_NR_CACHE, &idx_val); val = idx_val; mem_cgroup_get_recursive_idx_stat(mem, - MEM_CGROUP_STAT_RSS, &idx_val); + MEMCG_NR_RSS, &idx_val); val += idx_val; val <<= PAGE_SHIFT; } else @@ -2758,13 +2754,13 @@ static u64 mem_cgroup_read(struct cgroup case _MEMSWAP: if (name == RES_USAGE && mem_cgroup_is_root(mem)) { mem_cgroup_get_recursive_idx_stat(mem, - MEM_CGROUP_STAT_CACHE, &idx_val); + MEMCG_NR_CACHE, &idx_val); val = idx_val; mem_cgroup_get_recursive_idx_stat(mem, - MEM_CGROUP_STAT_RSS, &idx_val); + MEMCG_NR_RSS, &idx_val); val += idx_val; mem_cgroup_get_recursive_idx_stat(mem, - MEM_CGROUP_STAT_SWAPOUT, &idx_val); + MEMCG_NR_SWAP, &idx_val); val <<= PAGE_SHIFT; } else val = res_counter_read_u64(&mem->memsw, name); @@ -2924,18 +2920,18 @@ static int mem_cgroup_get_local_stat(str s64 val; /* per cpu stat */ - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_CACHE); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_CACHE); s->stat[MCS_CACHE] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_RSS); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_RSS); s->stat[MCS_RSS] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_FILE_MAPPED); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_FILE_MAPPED); s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGIN_COUNT); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGIN); s->stat[MCS_PGPGIN] += val; - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGOUT_COUNT); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGOUT); s->stat[MCS_PGPGOUT] += val; if (do_swap_account) { - val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_SWAPOUT); + val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_SWAP); s->stat[MCS_SWAP] += val * PAGE_SIZE; } -- To unsubscribe, send a message with 
'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] memcg : rewrite percpu countings with new interfaces 2009-11-06 8:52 [PATCH 0/2] memcg make use of new percpu implementations KAMEZAWA Hiroyuki 2009-11-06 8:54 ` [PATCH 1/2] memcg : rename index to short name KAMEZAWA Hiroyuki @ 2009-11-06 8:55 ` KAMEZAWA Hiroyuki 2009-11-06 17:27 ` Christoph Lameter ` (2 more replies) 2009-11-09 7:04 ` [PATCH 0/2] memcg make use of new percpu implementations Balbir Singh 2 siblings, 3 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-06 8:55 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, nishimura, cl, akpm From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Now, alloc_percpu() alloc good dynamic allocations and Recent updates on percpu.h gives us following kind of ops - __this_cpu_add() etc... This is designed to be a help for reduce code size in hot-path and very useful to handle percpu area. Thanks for great works. This patch rewrite memcg's (not-good) percpu status with new percpu support macros. This decreases code size and instruction size. By this, this area is now NUMA-aware and may have performance benefit. I got good result in parallel pagefault test. (my host is 8cpu/2socket) before== Performance counter stats for './runpause.sh' (5 runs): 474070.055912 task-clock-msecs # 7.881 CPUs ( +- 0.013% ) 35829310 page-faults # 0.076 M/sec ( +- 0.217% ) 3803016722 cache-references # 8.022 M/sec ( +- 0.215% ) (scaled from 100.00%) 1104083123 cache-misses # 2.329 M/sec ( +- 0.961% ) (scaled from 100.00%) 60.154982314 seconds time elapsed ( +- 0.018% ) after== Performance counter stats for './runpause.sh' (5 runs): 474919.429670 task-clock-msecs # 7.896 CPUs ( +- 0.013% ) 36520440 page-faults # 0.077 M/sec ( +- 1.854% ) 3109834751 cache-references # 6.548 M/sec ( +- 0.276% ) 1053275160 cache-misses # 2.218 M/sec ( +- 0.036% ) 60.146585280 seconds time elapsed ( +- 0.019% ) This test is affected by cpu-utilization but I think more improvements will be found in bigger system. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 168 +++++++++++++++++++------------------------------------- 1 file changed, 58 insertions(+), 110 deletions(-) Index: mmotm-2.6.32-Nov2/mm/memcontrol.c =================================================================== --- mmotm-2.6.32-Nov2.orig/mm/memcontrol.c +++ mmotm-2.6.32-Nov2/mm/memcontrol.c @@ -39,6 +39,7 @@ #include <linux/mm_inline.h> #include <linux/page_cgroup.h> #include <linux/cpu.h> +#include <linux/percpu.h> #include "internal.h" #include <asm/uaccess.h> @@ -78,54 +79,8 @@ enum mem_cgroup_stat_index { struct mem_cgroup_stat_cpu { s64 count[MEMCG_STAT_NSTATS]; -} ____cacheline_aligned_in_smp; - -struct mem_cgroup_stat { - struct mem_cgroup_stat_cpu cpustat[0]; }; -static inline void -__mem_cgroup_stat_reset_safe(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx) -{ - stat->count[idx] = 0; -} - -static inline s64 -__mem_cgroup_stat_read_local(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx) -{ - return stat->count[idx]; -} - -/* - * For accounting under irq disable, no need for increment preempt count. 
- */ -static inline void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx, int val) -{ - stat->count[idx] += val; -} - -static s64 mem_cgroup_read_stat(struct mem_cgroup_stat *stat, - enum mem_cgroup_stat_index idx) -{ - int cpu; - s64 ret = 0; - for_each_possible_cpu(cpu) - ret += stat->cpustat[cpu].count[idx]; - return ret; -} - -static s64 mem_cgroup_local_usage(struct mem_cgroup_stat *stat) -{ - s64 ret; - - ret = mem_cgroup_read_stat(stat, MEMCG_NR_CACHE); - ret += mem_cgroup_read_stat(stat, MEMCG_NR_RSS); - return ret; -} - /* * per-zone information in memory controller. */ @@ -226,10 +181,7 @@ struct mem_cgroup { /* set when res.limit == memsw.limit */ bool memsw_is_minimum; - /* - * statistics. This must be placed at the end of memcg. - */ - struct mem_cgroup_stat stat; + struct mem_cgroup_stat_cpu *cpustat; }; /* @@ -370,18 +322,13 @@ mem_cgroup_remove_exceeded(struct mem_cg static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem) { bool ret = false; - int cpu; s64 val; - struct mem_cgroup_stat_cpu *cpustat; - cpu = get_cpu(); - cpustat = &mem->stat.cpustat[cpu]; - val = __mem_cgroup_stat_read_local(cpustat, MEMCG_EVENTS); + val = __this_cpu_read(mem->cpustat->count[MEMCG_EVENTS]); if (unlikely(val > SOFTLIMIT_EVENTS_THRESH)) { - __mem_cgroup_stat_reset_safe(cpustat, MEMCG_EVENTS); + __this_cpu_write(mem->cpustat->count[MEMCG_EVENTS], 0); ret = true; } - put_cpu(); return ret; } @@ -477,17 +424,35 @@ mem_cgroup_largest_soft_limit_node(struc return mz; } +static s64 mem_cgroup_read_stat(struct mem_cgroup *mem, + enum mem_cgroup_stat_index idx) +{ + struct mem_cgroup_stat_cpu *cstat; + int cpu; + s64 ret = 0; + + for_each_possible_cpu(cpu) { + cstat = per_cpu_ptr(mem->cpustat, cpu); + ret += cstat->count[idx]; + } + return ret; +} + +static s64 mem_cgroup_local_usage(struct mem_cgroup *mem) +{ + s64 ret; + + ret = mem_cgroup_read_stat(mem, MEMCG_NR_CACHE); + ret += mem_cgroup_read_stat(mem, MEMCG_NR_RSS); + return ret; +} + static void mem_cgroup_swap_statistics(struct mem_cgroup *mem, bool charge) { int val = (charge) ? 1 : -1; - struct mem_cgroup_stat *stat = &mem->stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu = get_cpu(); - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_SWAP, val); - put_cpu(); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_SWAP], val); } static void mem_cgroup_charge_statistics(struct mem_cgroup *mem, @@ -495,22 +460,17 @@ static void mem_cgroup_charge_statistics bool charge) { int val = (charge) ? 
1 : -1; - struct mem_cgroup_stat *stat = &mem->stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu = get_cpu(); - cpustat = &stat->cpustat[cpu]; if (PageCgroupCache(pc)) - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_CACHE, val); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_CACHE], val); else - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_RSS, val); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_RSS], val); if (charge) - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGIN, 1); + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGIN]); else - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGOUT, 1); - __mem_cgroup_stat_add_safe(cpustat, MEMCG_EVENTS, 1); - put_cpu(); + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGOUT]); + __this_cpu_inc(mem->cpustat->count[MEMCG_EVENTS]); } static unsigned long mem_cgroup_get_local_zonestat(struct mem_cgroup *mem, @@ -1162,7 +1122,7 @@ static int mem_cgroup_hierarchical_recla } } } - if (!mem_cgroup_local_usage(&victim->stat)) { + if (!mem_cgroup_local_usage(victim)) { /* this cgroup's local usage == 0 */ css_put(&victim->css); continue; @@ -1228,9 +1188,6 @@ static void record_last_oom(struct mem_c void mem_cgroup_update_file_mapped(struct page *page, int val) { struct mem_cgroup *mem; - struct mem_cgroup_stat *stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu; struct page_cgroup *pc; pc = lookup_page_cgroup(page); @@ -1245,14 +1202,7 @@ void mem_cgroup_update_file_mapped(struc if (!PageCgroupUsed(pc)) goto done; - /* - * Preemption is already disabled, we don't need get_cpu() - */ - cpu = smp_processor_id(); - stat = &mem->stat; - cpustat = &stat->cpustat[cpu]; - - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, val); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_FILE_MAPPED], val); done: unlock_page_cgroup(pc); } @@ -1623,9 +1573,6 @@ static int mem_cgroup_move_account(struc int nid, zid; int ret = -EBUSY; struct page *page; - int cpu; - struct mem_cgroup_stat *stat; - struct mem_cgroup_stat_cpu *cpustat; VM_BUG_ON(from == to); VM_BUG_ON(PageLRU(pc->page)); @@ -1650,16 +1597,11 @@ static int mem_cgroup_move_account(struc page = pc->page; if (page_mapped(page) && !PageAnon(page)) { - cpu = smp_processor_id(); /* Update mapped_file data for mem_cgroup "from" */ - stat = &from->stat; - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, -1); + __this_cpu_dec(from->cpustat->count[MEMCG_NR_FILE_MAPPED]); /* Update mapped_file data for mem_cgroup "to" */ - stat = &to->stat; - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, 1); + __this_cpu_inc(to->cpustat->count[MEMCG_NR_FILE_MAPPED]); } if (do_swap_account && !mem_cgroup_is_root(from)) @@ -2715,7 +2657,7 @@ static int mem_cgroup_get_idx_stat(struct mem_cgroup *mem, void *data) { struct mem_cgroup_idx_data *d = data; - d->val += mem_cgroup_read_stat(&mem->stat, d->idx); + d->val += mem_cgroup_read_stat(mem, d->idx); return 0; } @@ -2920,18 +2862,18 @@ static int mem_cgroup_get_local_stat(str s64 val; /* per cpu stat */ - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_CACHE); + val = mem_cgroup_read_stat(mem, MEMCG_NR_CACHE); s->stat[MCS_CACHE] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_RSS); + val = mem_cgroup_read_stat(mem, MEMCG_NR_RSS); s->stat[MCS_RSS] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_FILE_MAPPED); + val = mem_cgroup_read_stat(mem, MEMCG_NR_FILE_MAPPED); s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGIN); + 
val = mem_cgroup_read_stat(mem, MEMCG_PGPGIN); s->stat[MCS_PGPGIN] += val; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGOUT); + val = mem_cgroup_read_stat(mem, MEMCG_PGPGOUT); s->stat[MCS_PGPGOUT] += val; if (do_swap_account) { - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_SWAP); + val = mem_cgroup_read_stat(mem, MEMCG_NR_SWAP); s->stat[MCS_SWAP] += val * PAGE_SIZE; } @@ -3190,17 +3132,12 @@ static void free_mem_cgroup_per_zone_inf kfree(mem->info.nodeinfo[node]); } -static int mem_cgroup_size(void) -{ - int cpustat_size = nr_cpu_ids * sizeof(struct mem_cgroup_stat_cpu); - return sizeof(struct mem_cgroup) + cpustat_size; -} - static struct mem_cgroup *mem_cgroup_alloc(void) { struct mem_cgroup *mem; - int size = mem_cgroup_size(); + int size = sizeof(struct mem_cgroup); + /* Can be very big if MAX_NUMNODES is big */ if (size < PAGE_SIZE) mem = kmalloc(size, GFP_KERNEL); else @@ -3208,6 +3145,15 @@ static struct mem_cgroup *mem_cgroup_all if (mem) memset(mem, 0, size); + mem->cpustat = alloc_percpu(struct mem_cgroup_stat_cpu); + if (!mem->cpustat) { + if (size < PAGE_SIZE) + kfree(mem); + else + vfree(mem); + mem = NULL; + } + return mem; } @@ -3232,7 +3178,9 @@ static void __mem_cgroup_free(struct mem for_each_node_state(node, N_POSSIBLE) free_mem_cgroup_per_zone_info(mem, node); - if (mem_cgroup_size() < PAGE_SIZE) + free_percpu(mem->cpustat); + + if (sizeof(struct mem_cgroup) < PAGE_SIZE) kfree(mem); else vfree(mem); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
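
Stripped of the memcg-specific plumbing, the hot-path change above boils down to
replacing the get_cpu()/array-index/put_cpu() sequence with a single percpu
operation. A rough before/after sketch with illustrative demo_* names (see
Christoph's review below for when the __this_cpu_*() forms are allowed):
==
#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/types.h>

struct demo_stat_cpu { s64 count[4]; };			/* illustrative */

/* old layout: one slot per cpu in a plain array */
struct demo_old { struct demo_stat_cpu cpustat[NR_CPUS]; };
/* new layout: a pointer returned by alloc_percpu() */
struct demo_new { struct demo_stat_cpu *cpustat; };

static void demo_add_old(struct demo_old *d, int idx, s64 val)
{
	int cpu = get_cpu();			/* disables preemption */

	d->cpustat[cpu].count[idx] += val;	/* explicit indexing */
	put_cpu();
}

static void demo_add_new(struct demo_new *d, int idx, s64 val)
{
	/* resolves to the local cpu's copy; a %gs-relative add on x86 */
	this_cpu_add(d->cpustat->count[idx], val);
}
==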
* Re: [PATCH 2/2] memcg : rewrite percpu countings with new interfaces 2009-11-06 8:55 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces KAMEZAWA Hiroyuki @ 2009-11-06 17:27 ` Christoph Lameter 2009-11-06 18:44 ` KAMEZAWA Hiroyuki 2009-11-09 6:44 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces v2 KAMEZAWA Hiroyuki 2009-11-09 7:07 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces Balbir Singh 2 siblings, 1 reply; 10+ messages in thread From: Christoph Lameter @ 2009-11-06 17:27 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, nishimura, akpm On Fri, 6 Nov 2009, KAMEZAWA Hiroyuki wrote: > @@ -370,18 +322,13 @@ mem_cgroup_remove_exceeded(struct mem_cg > static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem) > { > bool ret = false; > - int cpu; > s64 val; > - struct mem_cgroup_stat_cpu *cpustat; > > - cpu = get_cpu(); > - cpustat = &mem->stat.cpustat[cpu]; > - val = __mem_cgroup_stat_read_local(cpustat, MEMCG_EVENTS); > + val = __this_cpu_read(mem->cpustat->count[MEMCG_EVENTS]); > if (unlikely(val > SOFTLIMIT_EVENTS_THRESH)) { > - __mem_cgroup_stat_reset_safe(cpustat, MEMCG_EVENTS); > + __this_cpu_write(mem->cpustat->count[MEMCG_EVENTS], 0); > ret = true; > } > - put_cpu(); > return ret; If you want to use the __this_cpu_xx versions then you need to manage preempt on your own. You need to keep preempt_disable/enable here because otherwise the per cpu variable zeroed may be on a different cpu than the per cpu variable where you got the value from. > +static s64 mem_cgroup_read_stat(struct mem_cgroup *mem, > + enum mem_cgroup_stat_index idx) > +{ > + struct mem_cgroup_stat_cpu *cstat; > + int cpu; > + s64 ret = 0; > + > + for_each_possible_cpu(cpu) { > + cstat = per_cpu_ptr(mem->cpustat, cpu); > + ret += cstat->count[idx]; > + } == ret += per_cpu(mem->cpustat->cstat->count[idx], cpu) > static void mem_cgroup_swap_statistics(struct mem_cgroup *mem, > bool charge) > { > int val = (charge) ? 1 : -1; > - struct mem_cgroup_stat *stat = &mem->stat; > - struct mem_cgroup_stat_cpu *cpustat; > - int cpu = get_cpu(); > > - cpustat = &stat->cpustat[cpu]; > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_SWAP, val); > - put_cpu(); > + __this_cpu_add(mem->cpustat->count[MEMCG_NR_SWAP], val); > } You do not disable preempt on your own so you have to use this_cpu_add() There is no difference between __this_cpu_add and this_cpu_add on x86 but they will differ on platforms that do not have atomic per cpu instructions. The fallback for this_cpu_add is to protect the add with preempt_disable()/enable. The fallback fro __this_cpu_add is just to rely on the caller to ensure that preempt is disabled somehow. > @@ -495,22 +460,17 @@ static void mem_cgroup_charge_statistics > bool charge) > { > int val = (charge) ? 
1 : -1; > - struct mem_cgroup_stat *stat = &mem->stat; > - struct mem_cgroup_stat_cpu *cpustat; > - int cpu = get_cpu(); > > - cpustat = &stat->cpustat[cpu]; > if (PageCgroupCache(pc)) > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_CACHE, val); > + __this_cpu_add(mem->cpustat->count[MEMCG_NR_CACHE], val); Remove __ > else > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_RSS, val); > + __this_cpu_add(mem->cpustat->count[MEMCG_NR_RSS], val); Remove __ > > if (charge) > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGIN, 1); > + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGIN]); Remove __ > else > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGOUT, 1); > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_EVENTS, 1); > - put_cpu(); > + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGOUT]); > + __this_cpu_inc(mem->cpustat->count[MEMCG_EVENTS]); Remove __ > - /* > - * Preemption is already disabled, we don't need get_cpu() > - */ > - cpu = smp_processor_id(); > - stat = &mem->stat; > - cpustat = &stat->cpustat[cpu]; > - > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, val); > + __this_cpu_add(mem->cpustat->count[MEMCG_NR_FILE_MAPPED], val); Remove __ > @@ -1650,16 +1597,11 @@ static int mem_cgroup_move_account(struc > > page = pc->page; > if (page_mapped(page) && !PageAnon(page)) { > - cpu = smp_processor_id(); > /* Update mapped_file data for mem_cgroup "from" */ > - stat = &from->stat; > - cpustat = &stat->cpustat[cpu]; > - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, -1); > + __this_cpu_dec(from->cpustat->count[MEMCG_NR_FILE_MAPPED]); You can keep it here since the context already has preempt disabled it seems. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
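
To restate the rule above: the plain this_cpu_*() ops handle preemption
themselves and are safe from preemptible context, while the __this_cpu_*() ops
assume the caller already runs with preemption (or interrupts) disabled, so
several of them can be batched under one preempt_disable()/preempt_enable()
pair; that is essentially the form the v2 patch below adopts. A sketch of the
two correct forms, with illustrative demo_* names:
==
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/types.h>

struct demo_stat_cpu { s64 count[4]; };			/* illustrative */
struct demo_counter { struct demo_stat_cpu *cpustat; };

/* a single update from preemptible context: let this_cpu_add() cope */
static void demo_update_one(struct demo_counter *d, int idx, s64 val)
{
	this_cpu_add(d->cpustat->count[idx], val);
}

/* several related updates: disable preemption once, use __this_cpu_*() */
static void demo_update_batch(struct demo_counter *d, bool charge, s64 val)
{
	preempt_disable();
	if (charge)
		__this_cpu_add(d->cpustat->count[0], val);
	else
		__this_cpu_add(d->cpustat->count[1], val);
	__this_cpu_inc(d->cpustat->count[2]);	/* e.g. an event counter */
	preempt_enable();
}
==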
* Re: [PATCH 2/2] memcg : rewrite percpu countings with new interfaces 2009-11-06 17:27 ` Christoph Lameter @ 2009-11-06 18:44 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-06 18:44 UTC (permalink / raw) To: Christoph Lameter Cc: KAMEZAWA Hiroyuki, linux-kernel, linux-mm, balbir, nishimura, akpm Christoph Lameter wrote: > On Fri, 6 Nov 2009, KAMEZAWA Hiroyuki wrote: >> - __mem_cgroup_stat_reset_safe(cpustat, MEMCG_EVENTS); >> + __this_cpu_write(mem->cpustat->count[MEMCG_EVENTS], 0); >> ret = true; >> } >> - put_cpu(); >> return ret; > > If you want to use the __this_cpu_xx versions then you need to manage > preempt on your own. > Ah, I see. I understand I haven't understood. > You need to keep preempt_disable/enable here because otherwise the per > cpu variable zeroed may be on a different cpu than the per cpu variable > where you got the value from. > Thank you. I think I can do well in the next version. >> +static s64 mem_cgroup_read_stat(struct mem_cgroup *mem, >> + enum mem_cgroup_stat_index idx) >> +{ >> + struct mem_cgroup_stat_cpu *cstat; >> + int cpu; >> + s64 ret = 0; >> + >> + for_each_possible_cpu(cpu) { >> + cstat = per_cpu_ptr(mem->cpustat, cpu); >> + ret += cstat->count[idx]; >> + } > > == ret += per_cpu(mem->cpustat->cstat->count[idx], cpu) > Hmm, Hmm. Will use that. >> static void mem_cgroup_swap_statistics(struct mem_cgroup *mem, >> bool charge) >> { >> int val = (charge) ? 1 : -1; >> - struct mem_cgroup_stat *stat = &mem->stat; >> - struct mem_cgroup_stat_cpu *cpustat; >> - int cpu = get_cpu(); >> >> - cpustat = &stat->cpustat[cpu]; >> - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_SWAP, val); >> - put_cpu(); >> + __this_cpu_add(mem->cpustat->count[MEMCG_NR_SWAP], val); >> } > > You do not disable preempt on your own so you have to use > > this_cpu_add() > > There is no difference between __this_cpu_add and this_cpu_add on x86 but > they will differ on platforms that do not have atomic per cpu > instructions. The fallback for this_cpu_add is to protect the add with > preempt_disable()/enable. The fallback fro __this_cpu_add is just to rely > on the caller to ensure that preempt is disabled somehow. > Ok. >> - /* >> - * Preemption is already disabled, we don't need get_cpu() >> - */ >> - cpu = smp_processor_id(); >> - stat = &mem->stat; >> - cpustat = &stat->cpustat[cpu]; >> - >> - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, val); >> + __this_cpu_add(mem->cpustat->count[MEMCG_NR_FILE_MAPPED], val); > > Remove __ > > >> @@ -1650,16 +1597,11 @@ static int mem_cgroup_move_account(struc >> >> page = pc->page; >> if (page_mapped(page) && !PageAnon(page)) { >> - cpu = smp_processor_id(); >> /* Update mapped_file data for mem_cgroup "from" */ >> - stat = &from->stat; >> - cpustat = &stat->cpustat[cpu]; >> - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, -1); >> + __this_cpu_dec(from->cpustat->count[MEMCG_NR_FILE_MAPPED]); > > You can keep it here since the context already has preempt disabled it > seems. > Thank you for kindly review. Regards, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] memcg : rewrite percpu countings with new interfaces v2 2009-11-06 8:55 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces KAMEZAWA Hiroyuki 2009-11-06 17:27 ` Christoph Lameter @ 2009-11-09 6:44 ` KAMEZAWA Hiroyuki 2009-11-09 7:07 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces Balbir Singh 2 siblings, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-09 6:44 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, balbir, nishimura, cl, akpm Thank you Chirsotph for review. I think this version works well. == From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Now, alloc_percpu() alloc good dynamic allocations and Recent updates on percpu.h gives us following kind of ops - this_cpu_add() etc... This is designed to be a help for reduce code size in hot-path and very useful to handle percpu area. Thanks for great works. This patch rewrite memcg's (not-good) percpu status with new percpu support macros. This decreases code size and instruction size. By this, this area is now NUMA-aware and may have performance benefit. Changelog: 2009/11/09 - fixed misusage of __this_cpu_xxx Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 172 +++++++++++++++++++------------------------------------- 1 file changed, 61 insertions(+), 111 deletions(-) Index: mmotm-2.6.32-Nov2/mm/memcontrol.c =================================================================== --- mmotm-2.6.32-Nov2.orig/mm/memcontrol.c +++ mmotm-2.6.32-Nov2/mm/memcontrol.c @@ -39,6 +39,7 @@ #include <linux/mm_inline.h> #include <linux/page_cgroup.h> #include <linux/cpu.h> +#include <linux/percpu.h> #include "internal.h" #include <asm/uaccess.h> @@ -78,54 +79,8 @@ enum mem_cgroup_stat_index { struct mem_cgroup_stat_cpu { s64 count[MEMCG_STAT_NSTATS]; -} ____cacheline_aligned_in_smp; - -struct mem_cgroup_stat { - struct mem_cgroup_stat_cpu cpustat[0]; }; -static inline void -__mem_cgroup_stat_reset_safe(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx) -{ - stat->count[idx] = 0; -} - -static inline s64 -__mem_cgroup_stat_read_local(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx) -{ - return stat->count[idx]; -} - -/* - * For accounting under irq disable, no need for increment preempt count. - */ -static inline void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat_cpu *stat, - enum mem_cgroup_stat_index idx, int val) -{ - stat->count[idx] += val; -} - -static s64 mem_cgroup_read_stat(struct mem_cgroup_stat *stat, - enum mem_cgroup_stat_index idx) -{ - int cpu; - s64 ret = 0; - for_each_possible_cpu(cpu) - ret += stat->cpustat[cpu].count[idx]; - return ret; -} - -static s64 mem_cgroup_local_usage(struct mem_cgroup_stat *stat) -{ - s64 ret; - - ret = mem_cgroup_read_stat(stat, MEMCG_NR_CACHE); - ret += mem_cgroup_read_stat(stat, MEMCG_NR_RSS); - return ret; -} - /* * per-zone information in memory controller. */ @@ -226,10 +181,7 @@ struct mem_cgroup { /* set when res.limit == memsw.limit */ bool memsw_is_minimum; - /* - * statistics. This must be placed at the end of memcg. 
- */ - struct mem_cgroup_stat stat; + struct mem_cgroup_stat_cpu *cpustat; }; /* @@ -370,18 +322,13 @@ mem_cgroup_remove_exceeded(struct mem_cg static bool mem_cgroup_soft_limit_check(struct mem_cgroup *mem) { bool ret = false; - int cpu; s64 val; - struct mem_cgroup_stat_cpu *cpustat; - cpu = get_cpu(); - cpustat = &mem->stat.cpustat[cpu]; - val = __mem_cgroup_stat_read_local(cpustat, MEMCG_EVENTS); + val = this_cpu_read(mem->cpustat->count[MEMCG_EVENTS]); if (unlikely(val > SOFTLIMIT_EVENTS_THRESH)) { - __mem_cgroup_stat_reset_safe(cpustat, MEMCG_EVENTS); + this_cpu_write(mem->cpustat->count[MEMCG_EVENTS], 0); ret = true; } - put_cpu(); return ret; } @@ -477,17 +424,34 @@ mem_cgroup_largest_soft_limit_node(struc return mz; } +static s64 mem_cgroup_read_stat(struct mem_cgroup *mem, + enum mem_cgroup_stat_index idx) +{ + struct mem_cgroup_stat_cpu *cstat; + int cpu; + s64 ret = 0; + + for_each_possible_cpu(cpu) + ret += per_cpu_ptr(mem->cpustat->count[idx], cpu); + + return ret; +} + +static s64 mem_cgroup_local_usage(struct mem_cgroup *mem) +{ + s64 ret; + + ret = mem_cgroup_read_stat(mem, MEMCG_NR_CACHE); + ret += mem_cgroup_read_stat(mem, MEMCG_NR_RSS); + return ret; +} + static void mem_cgroup_swap_statistics(struct mem_cgroup *mem, bool charge) { int val = (charge) ? 1 : -1; - struct mem_cgroup_stat *stat = &mem->stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu = get_cpu(); - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_SWAP, val); - put_cpu(); + this_cpu_add(mem->cpustat->count[MEMCG_NR_SWAP], val); } static void mem_cgroup_charge_statistics(struct mem_cgroup *mem, @@ -495,22 +459,19 @@ static void mem_cgroup_charge_statistics bool charge) { int val = (charge) ? 1 : -1; - struct mem_cgroup_stat *stat = &mem->stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu = get_cpu(); - cpustat = &stat->cpustat[cpu]; + preempt_disable(); if (PageCgroupCache(pc)) - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_CACHE, val); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_CACHE], val); else - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_RSS, val); + __this_cpu_add(mem->cpustat->count[MEMCG_NR_RSS], val); if (charge) - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGIN, 1); + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGIN]); else - __mem_cgroup_stat_add_safe(cpustat, MEMCG_PGPGOUT, 1); - __mem_cgroup_stat_add_safe(cpustat, MEMCG_EVENTS, 1); - put_cpu(); + __this_cpu_inc(mem->cpustat->count[MEMCG_PGPGOUT]); + __this_cpu_inc(mem->cpustat->count[MEMCG_EVENTS]); + preempt_enable(); } static unsigned long mem_cgroup_get_local_zonestat(struct mem_cgroup *mem, @@ -1162,7 +1123,7 @@ static int mem_cgroup_hierarchical_recla } } } - if (!mem_cgroup_local_usage(&victim->stat)) { + if (!mem_cgroup_local_usage(victim)) { /* this cgroup's local usage == 0 */ css_put(&victim->css); continue; @@ -1228,9 +1189,6 @@ static void record_last_oom(struct mem_c void mem_cgroup_update_file_mapped(struct page *page, int val) { struct mem_cgroup *mem; - struct mem_cgroup_stat *stat; - struct mem_cgroup_stat_cpu *cpustat; - int cpu; struct page_cgroup *pc; pc = lookup_page_cgroup(page); @@ -1245,14 +1203,7 @@ void mem_cgroup_update_file_mapped(struc if (!PageCgroupUsed(pc)) goto done; - /* - * Preemption is already disabled, we don't need get_cpu() - */ - cpu = smp_processor_id(); - stat = &mem->stat; - cpustat = &stat->cpustat[cpu]; - - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, val); + this_cpu_add(mem->cpustat->count[MEMCG_NR_FILE_MAPPED], val); done: 
unlock_page_cgroup(pc); } @@ -1623,9 +1574,6 @@ static int mem_cgroup_move_account(struc int nid, zid; int ret = -EBUSY; struct page *page; - int cpu; - struct mem_cgroup_stat *stat; - struct mem_cgroup_stat_cpu *cpustat; VM_BUG_ON(from == to); VM_BUG_ON(PageLRU(pc->page)); @@ -1650,16 +1598,12 @@ static int mem_cgroup_move_account(struc page = pc->page; if (page_mapped(page) && !PageAnon(page)) { - cpu = smp_processor_id(); + preempt_disable(); /* Update mapped_file data for mem_cgroup "from" */ - stat = &from->stat; - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, -1); - + __this_cpu_dec(from->cpustat->count[MEMCG_NR_FILE_MAPPED]); /* Update mapped_file data for mem_cgroup "to" */ - stat = &to->stat; - cpustat = &stat->cpustat[cpu]; - __mem_cgroup_stat_add_safe(cpustat, MEMCG_NR_FILE_MAPPED, 1); + __this_cpu_inc(to->cpustat->count[MEMCG_NR_FILE_MAPPED]); + preempt_enable(); } if (do_swap_account && !mem_cgroup_is_root(from)) @@ -2715,7 +2659,7 @@ static int mem_cgroup_get_idx_stat(struct mem_cgroup *mem, void *data) { struct mem_cgroup_idx_data *d = data; - d->val += mem_cgroup_read_stat(&mem->stat, d->idx); + d->val += mem_cgroup_read_stat(mem, d->idx); return 0; } @@ -2920,18 +2864,18 @@ static int mem_cgroup_get_local_stat(str s64 val; /* per cpu stat */ - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_CACHE); + val = mem_cgroup_read_stat(mem, MEMCG_NR_CACHE); s->stat[MCS_CACHE] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_RSS); + val = mem_cgroup_read_stat(mem, MEMCG_NR_RSS); s->stat[MCS_RSS] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_FILE_MAPPED); + val = mem_cgroup_read_stat(mem, MEMCG_NR_FILE_MAPPED); s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGIN); + val = mem_cgroup_read_stat(mem, MEMCG_PGPGIN); s->stat[MCS_PGPGIN] += val; - val = mem_cgroup_read_stat(&mem->stat, MEMCG_PGPGOUT); + val = mem_cgroup_read_stat(mem, MEMCG_PGPGOUT); s->stat[MCS_PGPGOUT] += val; if (do_swap_account) { - val = mem_cgroup_read_stat(&mem->stat, MEMCG_NR_SWAP); + val = mem_cgroup_read_stat(mem, MEMCG_NR_SWAP); s->stat[MCS_SWAP] += val * PAGE_SIZE; } @@ -3190,17 +3134,12 @@ static void free_mem_cgroup_per_zone_inf kfree(mem->info.nodeinfo[node]); } -static int mem_cgroup_size(void) -{ - int cpustat_size = nr_cpu_ids * sizeof(struct mem_cgroup_stat_cpu); - return sizeof(struct mem_cgroup) + cpustat_size; -} - static struct mem_cgroup *mem_cgroup_alloc(void) { struct mem_cgroup *mem; - int size = mem_cgroup_size(); + int size = sizeof(struct mem_cgroup); + /* Can be very big if MAX_NUMNODES is big */ if (size < PAGE_SIZE) mem = kmalloc(size, GFP_KERNEL); else @@ -3208,6 +3147,15 @@ static struct mem_cgroup *mem_cgroup_all if (mem) memset(mem, 0, size); + mem->cpustat = alloc_percpu(struct mem_cgroup_stat_cpu); + if (!mem->cpustat) { + if (size < PAGE_SIZE) + kfree(mem); + else + vfree(mem); + mem = NULL; + } + return mem; } @@ -3232,7 +3180,9 @@ static void __mem_cgroup_free(struct mem for_each_node_state(node, N_POSSIBLE) free_mem_cgroup_per_zone_info(mem, node); - if (mem_cgroup_size() < PAGE_SIZE) + free_percpu(mem->cpustat); + + if (sizeof(struct mem_cgroup) < PAGE_SIZE) kfree(mem); else vfree(mem); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
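
On the read side the totals are still obtained by walking every possible cpu and
summing the local slots, so the result is approximate while updates are in
flight. One common form of that loop is sketched below with illustrative demo_*
names; note that per_cpu_ptr() returns a pointer into the given cpu's copy,
which is then dereferenced:
==
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/types.h>

struct demo_stat_cpu { s64 count[4]; };			/* illustrative */
struct demo_counter { struct demo_stat_cpu *cpustat; };

/* sum one statistics slot over all possible cpus; no locking, so fuzzy */
static s64 demo_read_stat(struct demo_counter *d, int idx)
{
	s64 sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += per_cpu_ptr(d->cpustat, cpu)->count[idx];
	return sum;
}
==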
* Re: [PATCH 2/2] memcg : rewrite percpu countings with new interfaces 2009-11-06 8:55 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces KAMEZAWA Hiroyuki 2009-11-06 17:27 ` Christoph Lameter 2009-11-09 6:44 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces v2 KAMEZAWA Hiroyuki @ 2009-11-09 7:07 ` Balbir Singh 2009-11-09 8:36 ` KAMEZAWA Hiroyuki 2 siblings, 1 reply; 10+ messages in thread From: Balbir Singh @ 2009-11-09 7:07 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, nishimura, cl, akpm * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-11-06 17:55:45]: > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Now, alloc_percpu() alloc good dynamic allocations and > Recent updates on percpu.h gives us following kind of ops > - __this_cpu_add() etc... > This is designed to be a help for reduce code size in hot-path > and very useful to handle percpu area. Thanks for great works. > > This patch rewrite memcg's (not-good) percpu status with new > percpu support macros. This decreases code size and instruction > size. By this, this area is now NUMA-aware and may have performance > benefit. > > I got good result in parallel pagefault test. (my host is 8cpu/2socket) > > before== > Performance counter stats for './runpause.sh' (5 runs): > > 474070.055912 task-clock-msecs # 7.881 CPUs ( +- 0.013% ) > 35829310 page-faults # 0.076 M/sec ( +- 0.217% ) > 3803016722 cache-references # 8.022 M/sec ( +- 0.215% ) (scaled from 100.00%) > 1104083123 cache-misses # 2.329 M/sec ( +- 0.961% ) (scaled from 100.00%) > > 60.154982314 seconds time elapsed ( +- 0.018% ) > > after== > Performance counter stats for './runpause.sh' (5 runs): > > 474919.429670 task-clock-msecs # 7.896 CPUs ( +- 0.013% ) > 36520440 page-faults # 0.077 M/sec ( +- 1.854% ) > 3109834751 cache-references # 6.548 M/sec ( +- 0.276% ) > 1053275160 cache-misses # 2.218 M/sec ( +- 0.036% ) > > 60.146585280 seconds time elapsed ( +- 0.019% ) > > This test is affected by cpu-utilization but I think more improvements > will be found in bigger system. > Hi, Kamezawa-San, Could you please post the IPC results as well? -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] memcg : rewrite percpu countings with new interfaces 2009-11-09 7:07 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces Balbir Singh @ 2009-11-09 8:36 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-09 8:36 UTC (permalink / raw) To: balbir; +Cc: linux-kernel, linux-mm, nishimura, cl, akpm On Mon, 9 Nov 2009 12:37:37 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > after== > > Performance counter stats for './runpause.sh' (5 runs): > > > > 474919.429670 task-clock-msecs # 7.896 CPUs ( +- 0.013% ) > > 36520440 page-faults # 0.077 M/sec ( +- 1.854% ) > > 3109834751 cache-references # 6.548 M/sec ( +- 0.276% ) > > 1053275160 cache-misses # 2.218 M/sec ( +- 0.036% ) > > > > 60.146585280 seconds time elapsed ( +- 0.019% ) > > > > This test is affected by cpu-utilization but I think more improvements > > will be found in bigger system. > > > > Hi, Kamezawa-San, > > Could you please post the IPC results as well? > Because PREEMPT=n, no differnce between v1/v2, basically. Here. == Performance counter stats for './runpause.sh' (5 runs): 475884.969949 task-clock-msecs # 7.913 CPUs ( +- 0.005% ) 36592060 page-faults # 0.077 M/sec ( +- 0.301% ) 3037784893 cache-references # 6.383 M/sec ( +- 0.361% ) (scaled from 99.71%) 1130761297 cache-misses # 2.376 M/sec ( +- 0.244% ) (scaled from 98.24%) 60.136803969 seconds time elapsed ( +- 0.006% ) == But this program is highly affected by cpu utilization etc... Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] memcg make use of new percpu implementations 2009-11-06 8:52 [PATCH 0/2] memcg make use of new percpu implementations KAMEZAWA Hiroyuki 2009-11-06 8:54 ` [PATCH 1/2] memcg : rename index to short name KAMEZAWA Hiroyuki 2009-11-06 8:55 ` [PATCH 2/2] memcg : rewrite percpu countings with new interfaces KAMEZAWA Hiroyuki @ 2009-11-09 7:04 ` Balbir Singh 2009-11-09 7:34 ` KAMEZAWA Hiroyuki 2 siblings, 1 reply; 10+ messages in thread From: Balbir Singh @ 2009-11-09 7:04 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm, nishimura, cl, akpm * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-11-06 17:52:42]: > Hi, > > Recent updates on dynamic percpu allocation looks good and I tries to rewrite > memcg's poor implementation of percpu status counter. > (It's not NUMA-aware ...) > Thanks for great works. > > For this time. I added Christoph to CC because I'm not fully sure my usage of > __this_cpu_xxx is correct...I'm glad if you check the usage when you have time. > > > Patch 1/2 is just clean up (prepare for patch 2/2) > Patch 2/2 is for percpu. > > Tested on my 8cpu box and works well. > Pathcesa are against the latest mmotm. How do the test results look? DO you see a significant boost? BTW, I've been experimenting a bit with the earlier percpu counter patches, I might post an iteration once I have some good results. -- Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] memcg make use of new percpu implementations 2009-11-09 7:04 ` [PATCH 0/2] memcg make use of new percpu implementations Balbir Singh @ 2009-11-09 7:34 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-11-09 7:34 UTC (permalink / raw) To: balbir; +Cc: linux-kernel, linux-mm, nishimura, cl, akpm On Mon, 9 Nov 2009 12:34:21 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-11-06 17:52:42]: > > > Hi, > > > > Recent updates on dynamic percpu allocation looks good and I tries to rewrite > > memcg's poor implementation of percpu status counter. > > (It's not NUMA-aware ...) > > Thanks for great works. > > > > For this time. I added Christoph to CC because I'm not fully sure my usage of > > __this_cpu_xxx is correct...I'm glad if you check the usage when you have time. > > > > > > Patch 1/2 is just clean up (prepare for patch 2/2) > > Patch 2/2 is for percpu. > > > > Tested on my 8cpu box and works well. > > Pathcesa are against the latest mmotm. > > How do the test results look? DO you see a significant boost? Because my test enviroment is just an SMP (not NUMA), improvement will not be siginificant (visible), I think. But It's good to make use of recent updates of per_cpu implementation. We can make use of offset calculation methods of them and one-instruction-access macros. This is code size (PREEMPT=y) [Before patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22403 3420 4132 29955 7503 mm/memcontrol.o [After patch] [kamezawa@bluextal mmotm-2.6.32-Nov2]$ size mm/memcontrol.o text data bss dec hex filename 22188 3420 4132 29740 742c mm/memcontrol.o Then, text size is surely reduced. One example is mem_cgroup_swap_statistics(). This function is inlined after this patch. this code: mem_cgroup_uncharge_swap(), modifies percpu counter. == memcg = mem_cgroup_lookup(id); if (memcg) { /* * We uncharge this because swap is freed. * This memcg can be obsolete one. We avoid calling css_tryget */ if (!mem_cgroup_is_root(memcg)) res_counter_uncharge(&memcg->memsw, PAGE_SIZE); mem_cgroup_swap_statistics(memcg, false); mem_cgroup_put(memcg); } == [before patch] mem_cgroup_swap_statistics() is not inlined and uses 0x69 bytes. 0000000000001d30 <mem_cgroup_swap_statistics>: 1d30: push %rbp 1d31: mov %rsp,%rbp 1d34: push %r12 <snip> 1d88: callq 1d8d <mem_cgroup_swap_statistics+0x5d> 1d8d: nopl (%rax) 1d90: jmp 1d83 <mem_cgroup_swap_statistics+0x53> 1d92: nopw %cs:0x0(%rax,%rax,1) 1d99: [After patch] mem_cgroup_uncharge_swap()'s inlined code. 3b67: cmp 0x0(%rip),%rax # 3b6e <mem_cgroup_uncharge_swap+0 xbe> 3b6e: je 3b81 <mem_cgroup_uncharge_swap+0xd1> <=== check mem_cgroup_is_root 3b70: lea 0x90(%rax),%rdi 3b77: mov $0x1000,%esi 3b7c: callq 3b81 <mem_cgroup_uncharge_swap+0xd1> <=== calling res_counter_uncahrge() 3b81: 48 89 df mov %rbx,%rdi 3b84: 48 8b 83 70 01 00 00 mov 0x170(%rbx),%rax <=== get offset of mem->cpustat 3b8b: 65 48 83 40 30 ff addq $0xffffffffffffffff,%gs:0x30(%rax) mem->cpustat.count[index]--; 3b91: e8 6a e0 ff ff callq 1c00 <mem_cgroup_put> This uses 2 instruction. Then, code size reduction is enough large, I think. >BTW, I've been experimenting a bit with the earlier percpu counter patches, > I might post an iteration once I have some good results. > Thank you, it's helpful. Regards, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread