From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx105.postini.com [74.125.245.105])
	by kanga.kvack.org (Postfix) with SMTP id 7FEA86B0031
	for ; Fri, 12 Jul 2013 09:13:58 -0400 (EDT)
Received: by mail-bk0-f54.google.com with SMTP id it16so3792054bkc.27
	for ; Fri, 12 Jul 2013 06:13:56 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <1373044710-27371-1-git-send-email-handai.szj@taobao.com>
	<1373045623-27712-1-git-send-email-handai.szj@taobao.com>
	<20130711145625.GK21667@dhcp22.suse.cz>
Date: Fri, 12 Jul 2013 21:13:56 +0800
Message-ID:
Subject: Re: [PATCH V4 5/6] memcg: patch mem_cgroup_{begin,end}_update_page_stat()
	out if only root memcg exists
From: Sha Zhengju
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: "linux-mm@kvack.org" , Cgroups , Greg Thelen , KAMEZAWA Hiroyuki ,
	Andrew Morton , Wu Fengguang , Mel Gorman , glommer@gmail.com,
	Sha Zhengju

Ooops.... that address seems unreachable; changing Glauber's email...

On Fri, Jul 12, 2013 at 8:59 PM, Sha Zhengju <handai.szj@gmail.com> wrote:
> Add cc to Glauber
>
>
> On Thu, Jul 11, 2013 at 10:56 PM, Michal Hocko <mhocko@suse.cz> wrote:
> > On Sat 06-07-13 01:33:43, Sha Zhengju wrote:
> >> From: Sha Zhengju <handai.szj@taobao.com>
> >>
> >> If memcg is enabled and no non-root memcg exists, all allocated
> >> pages belong to root_mem_cgroup and will go through the root memcg
> >> statistics routines. So in order to reduce the overhead after adding
> >> memcg dirty/writeback accounting in hot paths, we use a jump label to
> >> patch mem_cgroup_{begin,end}_update_page_stat() in or out when not
> >> used.
> >
> > I do not think this is enough. How much do you save? One atomic read.
> > This doesn't seem like a killer.
> >
> > I hoped we could simply not account at all and move counters to the root
> > cgroup once the label gets enabled.
>
> I have thought of this approach before, but it would probably run into
> another issue: each zone has a percpu stock named ->pageset to optimize
> the increment and decrement operations, and I haven't figured out a
> simpler or cheaper way to handle those stocked numbers if we move the
> global counters to the root cgroup. Maybe we can just leave them alone
> and accept the approximation?
>
> Glauber has already done a lot of work here; in his previous patchset he
> also tried to move some global stats to root
> (http://comments.gmane.org/gmane.linux.kernel.cgroups/6291). May I steal
> some of your ideas here, Glauber? :P
>
> > Besides that, the current patch is racy. Consider what happens when:
> >
> > mem_cgroup_begin_update_page_stat
> >                                         arm_inuse_keys
> >                                                 mem_cgroup_move_account
> > mem_cgroup_move_account_page_stat
> > mem_cgroup_end_update_page_stat
> >
> > The race window is small of course but it is there. I guess we need
> > rcu_read_lock at least.
>
> Yes, you're right. I'm afraid we need to take care of the race in the
> next update as well. But mem_cgroup_begin/end_update_page_stat() already
> take the rcu read lock, so maybe we only need a synchronize_rcu() after
> changing memcg_inuse_key?
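>
> A minimal sketch of the idea (arm_inuse_keys() and memcg_inuse_key are
> from this patch; the synchronize_rcu() placement is only my untested
> proposal):
>
>     static void arm_inuse_keys(void)
>     {
>             static_key_slow_inc(&memcg_inuse_key);
>             /*
>              * Proposal: wait for every
>              * mem_cgroup_begin/end_update_page_stat() critical
>              * section that may have sampled the old key value to
>              * finish before any task starts moving pages between
>              * memcgs.
>              */
>             synchronize_rcu();
>     }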
>
> >
> >> If no non-root memcg comes to life, we do not need to acquire the
> >> moving locks, so patch them out.
> >>
> >> cc: Michal Hocko <mhocko@suse.cz>
> >> cc: Greg Thelen <gthelen@google.com>
> >> cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> cc: Andrew Morton <akpm@linux-foundation.org>
> >> cc: Fengguang Wu <fengguang.wu@intel.com>
> >> cc: Mel Gorman <mgorman@suse.de>
> >> ---
> >>  include/linux/memcontrol.h |   15 +++++++++++++++
> >>  mm/memcontrol.c            |   23 ++++++++++++++++++++++-
> >>  2 files changed, 37 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> >> index ccd35d8..0483e1a 100644
> >> --- a/include/linux/memcontrol.h
> >> +++ b/include/linux/memcontrol.h
> >> @@ -55,6 +55,13 @@ struct mem_cgroup_reclaim_cookie {
> >>  };
> >>
> >>  #ifdef CONFIG_MEMCG
> >> +
> >> +extern struct static_key memcg_inuse_key;
> >> +static inline bool mem_cgroup_in_use(void)
> >> +{
> >> +     return static_key_false(&memcg_inuse_key);
> >> +}
> >> +
> >>  /*
> >>   * All "charge" functions with gfp_mask should use GFP_KERNEL or
> >>   * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
> >> @@ -159,6 +166,8 @@ static inline void mem_cgroup_begin_update_page_stat(struct page *page,
> >>  {
> >>       if (mem_cgroup_disabled())
> >>               return;
> >> +     if (!mem_cgroup_in_use())
> >> +             return;
> >>       rcu_read_lock();
> >>       *locked = false;
> >>       if (atomic_read(&memcg_moving))
> >> @@ -172,6 +181,8 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
> >>  {
> >>       if (mem_cgroup_disabled())
> >>               return;
> >> +     if (!mem_cgroup_in_use())
> >> +             return;
> >>       if (*locked)
> >>               __mem_cgroup_end_update_page_stat(page, flags);
> >>       rcu_read_unlock();
> >> @@ -215,6 +226,10 @@ void mem_cgroup_print_bad_page(struct page *page);
> >>  #endif
> >>  #else /* CONFIG_MEMCG */
> >>  struct mem_cgroup;
> >> +static inline bool mem_cgroup_in_use(void)
> >> +{
> >> +     return false;
> >> +}
> >>
> >>  static inline int mem_cgroup_newpage_charge(struct page *page,
> >>                                      struct mm_struct *mm, gfp_t gfp_mask)
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 9126abc..a85f7c5 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -463,6 +463,13 @@ enum res_type {
> >>  #define MEM_CGROUP_RECLAIM_SHRINK_BIT        0x1
> >>  #define MEM_CGROUP_RECLAIM_SHRINK    (1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
> >>
> >> +/* static_key used for marking memcg in use or not. We use this jump label to
> >> + * patch some memcg page stat accounting code in or out.
> >> + * The key will be increased when non-root memcg is created, and be decreased
> >> + * when memcg is destroyed.
> >> + */
> >> +struct static_key memcg_inuse_key;
> >> +
> >>  /*
> >>   * The memcg_create_mutex will be held whenever a new cgroup is created.
> >>   * As a consequence, any change that needs to protect against new child cgroups
> >> @@ -630,10 +637,22 @@ static void disarm_kmem_keys(struct mem_cgroup *memcg)
> >>  }
> >>  #endif /* CONFIG_MEMCG_KMEM */
> >>
> >> +static void disarm_inuse_keys(struct mem_cgroup *memcg)
> >> +{
> >> +     if (!mem_cgroup_is_root(memcg))
> >> +             static_key_slow_dec(&memcg_inuse_key);
> >> +}
> >> +
> >> +static void arm_inuse_keys(void)
> >> +{
> >> +     static_key_slow_inc(&memcg_inuse_key);
> >> +}
> >> +
> >>  static void disarm_static_keys(struct mem_cgroup *memcg)
> >>  {
> >>       disarm_sock_keys(memcg);
> >>       disarm_kmem_keys(memcg);
> >> +     disarm_inuse_keys(memcg);
> >>  }
> >>
> >>  static void drain_all_stock_async(struct mem_cgroup *memcg);
> >> @@ -2298,7 +2317,6 @@ void mem_cgroup_update_page_stat(struct page *page,
> >>  {
> >>       struct mem_cgroup *memcg;
> >>       struct page_cgroup *pc = lookup_page_cgroup(page);
> >> -     unsigned long uninitialized_var(flags);
> >>
> >>       if (mem_cgroup_disabled())
> >>               return;
> >> @@ -6293,6 +6311,9 @@ mem_cgroup_css_online(struct cgroup *cont)
> >>       }
> >>
> >>       error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> >> +     if (!error)
> >> +             arm_inuse_keys();
> >> +
> >>       mutex_unlock(&memcg_create_mutex);
> >>       return error;
> >>  }
> >> --
> >> 1.7.9.5
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Thanks,
> Sha

--
Thanks,
Sha