Oops... it seems unreachable; changing Glauber's email address...

On Fri, Jul 12, 2013 at 8:59 PM, Sha Zhengju wrote:
> Add cc to Glauber
>
>
> On Thu, Jul 11, 2013 at 10:56 PM, Michal Hocko wrote:
> > On Sat 06-07-13 01:33:43, Sha Zhengju wrote:
> >> From: Sha Zhengju
> >>
> >> If memcg is enabled and no non-root memcg exists, all allocated
> >> pages belong to root_mem_cgroup and will go through the root memcg
> >> statistics routines. So, to reduce overhead after adding memcg
> >> dirty/writeback accounting in hot paths, we use a jump label to
> >> patch mem_cgroup_{begin,end}_update_page_stat() in or out depending
> >> on whether memcg is in use.
> >
> > I do not think this is enough. How much do you save? One atomic read.
> > This doesn't seem like a killer.
> >
> > I hoped we could simply not account at all and move counters to the root
> > cgroup once the label gets enabled.
>
> I have thought of this approach before, but it would probably run into
> another issue: e.g. each zone has a percpu stock named ->pageset to
> optimize the increment and decrement operations, and I haven't figured out
> a simpler and cheaper way to handle those stock numbers when moving global
> counters to the root cgroup. Maybe we can just leave them and afford the
> approximation?
>
> Glauber has already done a lot of work here; in his previous patchset he
> also tried to move some global stats to root
> (http://comments.gmane.org/gmane.linux.kernel.cgroups/6291). May I steal
> some of your ideas here, Glauber? :P
>
> >
> > Besides that, the current patch is racy. Consider what happens when:
> >
> > mem_cgroup_begin_update_page_stat
> >                                     arm_inuse_keys
> >                                             mem_cgroup_move_account
> > mem_cgroup_move_account_page_stat
> > mem_cgroup_end_update_page_stat
> >
> > The race window is small of course but it is there. I guess we need
> > rcu_read_lock at least.
>
> Yes, you're right. I'm afraid we need to take care of the race in the next
> updates as well.
> But mem_cgroup_begin/end_update_page_stat() already take the RCU read
> lock, so maybe all we need here is a synchronize_rcu() after changing
> memcg_inuse_key?
>
> >
> >> If no non-root memcg comes to life, we do not need to acquire the moving
> >> locks, so patch them out.
> >>
> >> cc: Michal Hocko
> >> cc: Greg Thelen
> >> cc: KAMEZAWA Hiroyuki
> >> cc: Andrew Morton
> >> cc: Fengguang Wu
> >> cc: Mel Gorman
> >> ---
> >>  include/linux/memcontrol.h |   15 +++++++++++++++
> >>  mm/memcontrol.c            |   23 ++++++++++++++++++++++-
> >>  2 files changed, 37 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> >> index ccd35d8..0483e1a 100644
> >> --- a/include/linux/memcontrol.h
> >> +++ b/include/linux/memcontrol.h
> >> @@ -55,6 +55,13 @@ struct mem_cgroup_reclaim_cookie {
> >>  };
> >>
> >>  #ifdef CONFIG_MEMCG
> >> +
> >> +extern struct static_key memcg_inuse_key;
> >> +static inline bool mem_cgroup_in_use(void)
> >> +{
> >> +	return static_key_false(&memcg_inuse_key);
> >> +}
> >> +
> >>  /*
> >>   * All "charge" functions with gfp_mask should use GFP_KERNEL or
> >>   * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
> >> @@ -159,6 +166,8 @@ static inline void mem_cgroup_begin_update_page_stat(struct page *page,
> >>  {
> >>  	if (mem_cgroup_disabled())
> >>  		return;
> >> +	if (!mem_cgroup_in_use())
> >> +		return;
> >>  	rcu_read_lock();
> >>  	*locked = false;
> >>  	if (atomic_read(&memcg_moving))
> >> @@ -172,6 +181,8 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
> >>  {
> >>  	if (mem_cgroup_disabled())
> >>  		return;
> >> +	if (!mem_cgroup_in_use())
> >> +		return;
> >>  	if (*locked)
> >>  		__mem_cgroup_end_update_page_stat(page, flags);
> >>  	rcu_read_unlock();
> >> @@ -215,6 +226,10 @@ void mem_cgroup_print_bad_page(struct page *page);
> >>  #endif
> >>  #else /* CONFIG_MEMCG */
> >>  struct mem_cgroup;
> >> +static inline bool mem_cgroup_in_use(void)
> >> +{
> >> +	return false;
> >> +}
> >>
> >>  static inline int mem_cgroup_newpage_charge(struct page *page,
> >>  					struct mm_struct *mm, gfp_t gfp_mask)
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 9126abc..a85f7c5 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -463,6 +463,13 @@ enum res_type {
> >>  #define MEM_CGROUP_RECLAIM_SHRINK_BIT	0x1
> >>  #define MEM_CGROUP_RECLAIM_SHRINK	(1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
> >>
> >> +/* static_key used for marking memcg in use or not. We use this jump label to
> >> + * patch some memcg page stat accounting code in or out.
> >> + * The key will be increased when non-root memcg is created, and be decreased
> >> + * when memcg is destroyed.
> >> + */
> >> +struct static_key memcg_inuse_key;
> >> +
> >>  /*
> >>   * The memcg_create_mutex will be held whenever a new cgroup is created.
> >>   * As a consequence, any change that needs to protect against new child cgroups
> >> @@ -630,10 +637,22 @@ static void disarm_kmem_keys(struct mem_cgroup *memcg)
> >>  }
> >>  #endif /* CONFIG_MEMCG_KMEM */
> >>
> >> +static void disarm_inuse_keys(struct mem_cgroup *memcg)
> >> +{
> >> +	if (!mem_cgroup_is_root(memcg))
> >> +		static_key_slow_dec(&memcg_inuse_key);
> >> +}
> >> +
> >> +static void arm_inuse_keys(void)
> >> +{
> >> +	static_key_slow_inc(&memcg_inuse_key);
> >> +}
> >> +
> >>  static void disarm_static_keys(struct mem_cgroup *memcg)
> >>  {
> >>  	disarm_sock_keys(memcg);
> >>  	disarm_kmem_keys(memcg);
> >> +	disarm_inuse_keys(memcg);
> >>  }
> >>
> >>  static void drain_all_stock_async(struct mem_cgroup *memcg);
> >> @@ -2298,7 +2317,6 @@ void mem_cgroup_update_page_stat(struct page *page,
> >>  {
> >>  	struct mem_cgroup *memcg;
> >>  	struct page_cgroup *pc = lookup_page_cgroup(page);
> >> -	unsigned long uninitialized_var(flags);
> >>
> >>  	if (mem_cgroup_disabled())
> >>  		return;
> >> @@ -6293,6 +6311,9 @@ mem_cgroup_css_online(struct cgroup *cont)
> >>  	}
> >>
> >>  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> >> +	if (!error)
> >> +		arm_inuse_keys();
> >> +
> >>  	mutex_unlock(&memcg_create_mutex);
> >>  	return error;
> >>  }
> >> --
> >> 1.7.9.5
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
>
>
> --
> Thanks,
> Sha

--
Thanks,
Sha
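The race Michal describes, and the synchronize_rcu() remedy suggested in the reply, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel fix: every name below (inuse_key, rcu_readers, struct stat_txn, the model_* functions) is hypothetical, the atomic counter is a crude stand-in for RCU (it waits for all readers rather than only pre-existing ones, and real rcu_read_lock() does not touch a shared counter), and the move lock itself is omitted. The assumed ordering is: the reader enters its read-side section before sampling the key, and the writer must not proceed to mem_cgroup_move_account() until a grace period has elapsed after arming the key.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model only; these are NOT kernel APIs.
 * inuse_key   models memcg_inuse_key (zero-initialized: disarmed).
 * rcu_readers counts readers inside a modeled read-side section.
 */
static atomic_bool inuse_key;
static atomic_int rcu_readers;

/* Per-call state, so the end path reuses what begin observed. */
struct stat_txn {
	bool key_seen;  /* key state sampled by begin */
	bool locked;    /* models the *locked flag from the patch */
};

static void model_rcu_read_lock(void)
{
	atomic_fetch_add(&rcu_readers, 1);
}

static void model_rcu_read_unlock(void)
{
	int prev = atomic_fetch_sub(&rcu_readers, 1);

	assert(prev > 0);  /* an unbalanced unlock would be a bug */
	(void)prev;        /* silence unused warning under NDEBUG */
}

/* A real synchronize_rcu() blocks until pre-existing readers finish;
 * this model only reports whether no reader is inside a section. */
static bool model_grace_period_done(void)
{
	return atomic_load(&rcu_readers) == 0;
}

/* Reader: enter the read-side section BEFORE sampling the key. */
static void model_begin_update(struct stat_txn *t)
{
	model_rcu_read_lock();
	t->locked = false;
	t->key_seen = atomic_load(&inuse_key);
	/* With key_seen set, the real code would test memcg_moving and
	 * possibly take the move lock (setting locked); omitted here. */
}

/* Reader: decide from what begin saw; never re-sample the key here. */
static void model_end_update(struct stat_txn *t)
{
	if (t->key_seen && t->locked) {
		/* the real code would drop the move lock here */
		t->locked = false;
	}
	model_rcu_read_unlock();
}

/* Writer: arm the key, as arm_inuse_keys() does in the patch. */
static void model_arm_key(void)
{
	atomic_store(&inuse_key, true);
}
```

Replaying Michal's interleaving (begin, then arm, then the move-account check, then end) shows model_grace_period_done() returning false while the early reader is still inside its section, so the writer would have to wait; after model_end_update() it returns true. The model also reflects one detail of the posted diff: mem_cgroup_begin_update_page_stat() returns before rcu_read_lock() and before setting *locked, while mem_cgroup_end_update_page_stat() tests the key again. If the key is armed between the two calls, the end path would read an unset *locked and call rcu_read_unlock() without a matching lock. That is why the sketch samples the key once, inside the read-side section, and carries the result to the end path; a synchronize_rcu() after arming can only wait for readers that are actually inside a read-side section.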