From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Thu, 28 Aug 2008 19:44:54 +0900 From: KAMEZAWA Hiroyuki Subject: Re: [RFC][PATCH 5/14] memcg: free page_cgroup by RCU Message-Id: <20080828194454.3fa6d0d0.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <48B678C2.8010807@linux.vnet.ibm.com> References: <20080822202720.b7977aab.kamezawa.hiroyu@jp.fujitsu.com> <20080822203457.d62e394d.kamezawa.hiroyu@jp.fujitsu.com> <48B678C2.8010807@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: balbir@linux.vnet.ibm.com Cc: "linux-mm@kvack.org" , "nishimura@mxp.nes.nec.co.jp" List-ID: On Thu, 28 Aug 2008 15:36:58 +0530 Balbir Singh wrote: > KAMEZAWA Hiroyuki wrote: > > Freeing page_cgroup by RCU. > > > > This makes access to page->page_cgroup as RCU-safe. > > > > In addition to freeing page_cgroup via RCU, we'll also need to use > rcu_assign_pointer() and rcu_dereference() to make the access RCU safe. > Yes. but it is not necessary until lockess patches. it's guarded by lock/unlock_page_cgroup(). So, I didin't mention it here. But ok, I'll write. > Oh! I just see that the next set of patches do the correct thing, could you > please write the change log correctly indicate that this patch release > page->page_cgroup via RCU. > Sorry, changelog will be fixed. > > Signed-off-by: KAMEZAWA Hiroyuki > > --- > > mm/memcontrol.c | 44 ++++++++++++++++++++++++++++++++++++-------- > > 1 file changed, 36 insertions(+), 8 deletions(-) > > > > Index: mmtom-2.6.27-rc3+/mm/memcontrol.c > > =================================================================== > > --- mmtom-2.6.27-rc3+.orig/mm/memcontrol.c > > +++ mmtom-2.6.27-rc3+/mm/memcontrol.c > > @@ -588,19 +588,23 @@ unsigned long mem_cgroup_isolate_pages(u > > * Free obsolete page_cgroups which is linked to per-cpu drop list. > > */ > > > > -static void __free_obsolete_page_cgroup(void) > > +struct page_cgroup_rcu_work { > > + struct rcu_head head; > > + struct page_cgroup *list; > > +}; > > + > > +static void __free_obsolete_page_cgroup_cb(struct rcu_head *head) > > { > > struct mem_cgroup *memcg; > > struct page_cgroup *pc, *next; > > struct mem_cgroup_per_zone *mz, *page_mz; > > - struct mem_cgroup_sink_list *mcsl; > > + struct page_cgroup_rcu_work *work; > > unsigned long flags; > > > > - mcsl = &get_cpu_var(memcg_sink_list); > > - next = mcsl->next; > > - mcsl->next = NULL; > > - mcsl->count = 0; > > - put_cpu_var(memcg_sink_list); > > + > > + work = container_of(head, struct page_cgroup_rcu_work, head); > > + next = work->list; > > What do we do with next here? I must be missing it, but where is the page_cgroup > released? > Maybe chasing diff is a bit complicated here. After this line, linked list of page_cgroup from 'next' will be freed one by one. (But this behavior may be changed in the next version.) > > + kfree(work); > > > > mz = NULL; > > > > @@ -627,6 +631,26 @@ static void __free_obsolete_page_cgroup( > > local_irq_restore(flags); > > } > > > > +static int __free_obsolete_page_cgroup(void) > > +{ > > + struct page_cgroup_rcu_work *work; > > + struct mem_cgroup_sink_list *mcsl; > > + > > + work = kmalloc(sizeof(*work), GFP_ATOMIC); > > + if (!work) > > + return -ENOMEM; > > + INIT_RCU_HEAD(&work->head); > > + > > + mcsl = &get_cpu_var(memcg_sink_list); > > + work->list = mcsl->next; > > + mcsl->next = NULL; > > + mcsl->count = 0; > > + put_cpu_var(memcg_sink_list); > > + > > + call_rcu(&work->head, __free_obsolete_page_cgroup_cb); > > + return 0; > > +} > > + > > I don't like this approach, seems complex, you allocate more memory in > GFP_ATOMIC context, so that free can be called from RCU context. > We should assume kmalloc() can return NULL. But I understand you don't like this. allow me consider more. > > static void free_obsolete_page_cgroup(struct page_cgroup *pc) > > { > > int count; > > @@ -649,13 +673,17 @@ static DEFINE_MUTEX(memcg_force_drain_mu > > > > static void mem_cgroup_local_force_drain(struct work_struct *work) > > { > > - __free_obsolete_page_cgroup(); > > + int ret; > > + do { > > + ret = __free_obsolete_page_cgroup(); > > We keep repeating till we get 0? > yes. this returns 0 or -ENOMEM. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org