* [PATCH 0/6 v4] memcg: page cgroup diet
@ 2012-02-14 3:04 KAMEZAWA Hiroyuki
2012-02-14 3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14 3:04 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins
Here is v4. I removed RFC and fixed a fatal bug in v3.
This patch series is a preparation for removing page_cgroup->flags. To remove
it, we need to reduce the number of flags on page_cgroup. In this set, PCG_MOVE_LOCK and
PCG_FILE_MAPPED are removed.
After this, only 3 bits remain.
==
enum {
/* flags for mem_cgroup */
PCG_LOCK, /* Lock for pc->mem_cgroup and following bits. */
PCG_USED, /* this object is in use. */
PCG_MIGRATION, /* under page migration */
__NR_PCG_FLAGS,
};
==
Next, I'll try to merge these 3 bits into the lower 3 bits of the pointer to
memcg. Then, page_cgroup can be 8 (4) bytes per page.
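The plan above relies on pointer alignment: a word-aligned pointer always has zero low bits, so the three remaining PCG_* flags can ride in bits 0-2 of the mem_cgroup pointer. A minimal userspace sketch of that packing, assuming 8-byte-aligned objects (the helper names and mask are illustrative, not the kernel's API):

```c
#include <assert.h>
#include <stdint.h>

enum {
	PCG_LOCK      = 1 << 0,	/* lock for pc->mem_cgroup and flags */
	PCG_USED      = 1 << 1,	/* object is in use */
	PCG_MIGRATION = 1 << 2,	/* under page migration */
};
#define PCG_FLAG_MASK	0x7UL	/* low 3 bits of an aligned pointer */

/* Pack the flags into the low bits of a word-aligned pointer. */
static uintptr_t pack_pc(void *memcg, unsigned long flags)
{
	assert(((uintptr_t)memcg & PCG_FLAG_MASK) == 0);  /* must be aligned */
	assert((flags & ~PCG_FLAG_MASK) == 0);            /* only 3 flag bits */
	return (uintptr_t)memcg | flags;
}

static void *pc_memcg(uintptr_t v)
{
	return (void *)(v & ~PCG_FLAG_MASK);
}

static unsigned long pc_flags(uintptr_t v)
{
	return v & PCG_FLAG_MASK;
}
```

With this layout, struct page_cgroup shrinks to a single pointer-sized word, which is the "8(4) bytes per page" the cover letter is aiming at.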
Major changes since v3:
- replaced move_lock_page_cgroup() with move_lock_mem_cgroup(), which
  passes a pointer to memcg rather than page_cgroup.
This applies on linux-next (Feb 13) and passed several tests.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki @ 2012-02-14 3:06 ` KAMEZAWA Hiroyuki 2012-02-14 7:21 ` Greg Thelen 2012-02-14 3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki ` (4 subsequent siblings) 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:06 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins This is just a cleanup. == From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Date: Thu, 2 Feb 2012 12:05:41 +0900 Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) >From the log, I guess EXPORT was for preparing dirty accounting. But _now_, we don't need to export this. Remove this for now. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ab315ab..4c2b759 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1897,7 +1897,6 @@ out: move_unlock_page_cgroup(pc, &flags); rcu_read_unlock(); } -EXPORT_SYMBOL(mem_cgroup_update_page_stat); /* * size of first charge trial. "32" comes from vmscan.c's magic value. -- 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) 2012-02-14 3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki @ 2012-02-14 7:21 ` Greg Thelen 0 siblings, 0 replies; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:21 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:06 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > This is just a cleanup. > == > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Thu, 2 Feb 2012 12:05:41 +0900 > Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) > > From the log, I guess EXPORT was for preparing dirty accounting. > But _now_, we don't need to export this. Remove this for now. > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Looks good to me. Reviewed-by: Greg Thelen <gthelen@google.com> > --- > mm/memcontrol.c | 1 - > 1 files changed, 0 insertions(+), 1 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ab315ab..4c2b759 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1897,7 +1897,6 @@ out: > move_unlock_page_cgroup(pc, &flags); > rcu_read_unlock(); > } > -EXPORT_SYMBOL(mem_cgroup_update_page_stat); > > /* > * size of first charge trial. "32" comes from vmscan.c's magic value. > -- > 1.7.4.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/6 v4] memcg: simplify move_account() check 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki 2012-02-14 3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki @ 2012-02-14 3:07 ` KAMEZAWA Hiroyuki 2012-02-14 7:21 ` Greg Thelen 2012-02-14 3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki ` (3 subsequent siblings) 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:07 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6 v4] memcg: simplify move_account() check 2012-02-14 3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki @ 2012-02-14 7:21 ` Greg Thelen 2012-02-14 8:44 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:21 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:07 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From 9cdb3b63dc8d08cc2220c54c80438c13433a0d12 Mon Sep 17 00:00:00 2001 > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Thu, 2 Feb 2012 10:02:39 +0900 > Subject: [PATCH 2/6] memcg: simplify move_account() check. > > In memcg, for avoiding take-lock-irq-off at accessing page_cgroup, > a logic, flag + rcu_read_lock(), is used. This works as following > > CPU-A CPU-B > rcu_read_lock() > set flag > if(flag is set) > take heavy lock > do job. > synchronize_rcu() rcu_read_unlock() I assume that CPU-A will take heavy lock after synchronize_rcu() when updating variables read by CPU-B. > memcontrol.c | 65 ++++++++++++++++++++++------------------------------------- > 1 file changed, 25 insertions(+), 40 deletions(-) > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Greg Thelen <gthelen@google.com> > --- > mm/memcontrol.c | 70 +++++++++++++++++++++++------------------------------- > 1 files changed, 30 insertions(+), 40 deletions(-) > @@ -2089,11 +2082,8 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb, > struct memcg_stock_pcp *stock; > struct mem_cgroup *iter; > > - if ((action == CPU_ONLINE)) { > - for_each_mem_cgroup(iter) > - synchronize_mem_cgroup_on_move(iter, cpu); > + if ((action == CPU_ONLINE)) Extra parenthesis. I recommend: + if (action == CPU_ONLINE) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6 v4] memcg: simplify move_account() check
  2012-02-14  7:21   ` Greg Thelen
@ 2012-02-14  8:44     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14 8:44 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins

On Mon, 13 Feb 2012 23:21:34 -0800
Greg Thelen <gthelen@google.com> wrote:

> On Mon, Feb 13, 2012 at 7:07 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From 9cdb3b63dc8d08cc2220c54c80438c13433a0d12 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 2 Feb 2012 10:02:39 +0900
> > Subject: [PATCH 2/6] memcg: simplify move_account() check.
> >
> > In memcg, for avoiding take-lock-irq-off at accessing page_cgroup,
> > a logic, flag + rcu_read_lock(), is used. This works as following
> >
> >     CPU-A                      CPU-B
> >                                rcu_read_lock()
> >     set flag
> >                                if (flag is set)
> >                                    take heavy lock
> >                                do job.
> >     synchronize_rcu()          rcu_read_unlock()
>
> I assume that CPU-A will take heavy lock after synchronize_rcu() when
> updating variables read by CPU-B.
>
Ah, yes. I should have written that.

> >  memcontrol.c | 65 ++++++++++++++++++++++-------------------------------------
> >  1 file changed, 25 insertions(+), 40 deletions(-)
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Acked-by: Greg Thelen <gthelen@google.com>
>
Thank you!

-Kame

^ permalink raw reply [flat|nested] 16+ messages in thread
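The protocol discussed in this subthread — updaters take the heavy lock only while some mover has raised a flag, and the mover uses synchronize_rcu() so every updater either sees the flag or has already left its critical section — can be modeled in userspace. This sketch substitutes C11 atomics and a spinlock for the kernel's RCU and move_lock; all names here are illustrative, not the kernel's:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int moving;			/* "a task is moving pages" */
static atomic_flag move_lock = ATOMIC_FLAG_INIT;/* the "heavy" lock */

/* Updater-side fast path: take the heavy lock only while a move is in
 * flight.  Returns whether the lock was taken so the caller can unlock. */
static bool stat_update_begin(void)
{
	/* In the kernel this runs under rcu_read_lock(); the mover's
	 * synchronize_rcu() after setting the flag guarantees no updater
	 * is still running here with a stale (unset) view of the flag. */
	if (!atomic_load_explicit(&moving, memory_order_acquire))
		return false;		/* common case: no mover, no lock */
	while (atomic_flag_test_and_set(&move_lock))
		;			/* spin until the heavy lock is ours */
	return true;
}

static void stat_update_end(bool locked)
{
	if (locked)
		atomic_flag_clear(&move_lock);
}
```

The point of the pattern is that the fast path costs one atomic load; the heavy lock is only touched during the rare task-move window.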
* [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup. 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki 2012-02-14 3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki 2012-02-14 3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki @ 2012-02-14 3:13 ` KAMEZAWA Hiroyuki 2012-02-14 7:21 ` Greg Thelen 2012-02-14 3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki ` (2 subsequent siblings) 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:13 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup. 2012-02-14 3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki @ 2012-02-14 7:21 ` Greg Thelen 0 siblings, 0 replies; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:21 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:13 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From ffd1b013fe294a80c12e3f30e85386135a6fb284 Mon Sep 17 00:00:00 2001 > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Thu, 2 Feb 2012 11:49:59 +0900 > Subject: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup. > > PCG_MOVE_LOCK is used for bit spinlock to avoid race between overwriting > pc->mem_cgroup and page statistics accounting per memcg. > This lock helps to avoid the race but the race is very rare because moving > tasks between cgroup is not a usual job. > So, it seems using 1bit per page is too costly. > > This patch changes this lock as per-memcg spinlock and removes PCG_MOVE_LOCK. > > If smaller lock is required, we'll be able to add some hashes but > I'd like to start from this. > > Changelog: > - fixed to pass memcg as an argument rather than page_cgroup. > and renamed from move_lock_page_cgroup() to move_lock_mem_cgroup() > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Seems good. Thanks. Acked-by: Greg Thelen <gthelen@google.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/6 v4] memcg: use new logic for page stat accounting 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki ` (2 preceding siblings ...) 2012-02-14 3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki @ 2012-02-14 3:14 ` KAMEZAWA Hiroyuki 2012-02-14 7:22 ` Greg Thelen 2012-02-14 3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki 2012-02-14 3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:14 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/6 v4] memcg: use new logic for page stat accounting 2012-02-14 3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki @ 2012-02-14 7:22 ` Greg Thelen 2012-02-14 8:43 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:22 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:14 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From ad2905362ef58a44d96a325193ab384739418050 Mon Sep 17 00:00:00 2001 > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Thu, 2 Feb 2012 11:49:59 +0900 > Subject: [PATCH 4/6] memcg: use new logic for page stat accounting. > > Now, page-stat-per-memcg is recorded into per page_cgroup flag by > duplicating page's status into the flag. The reason is that memcg > has a feature to move a page from a group to another group and we > have race between "move" and "page stat accounting", > > Under current logic, assume CPU-A and CPU-B. CPU-A does "move" > and CPU-B does "page stat accounting". > > When CPU-A goes 1st, > > CPU-A CPU-B > update "struct page" info. > move_lock_mem_cgroup(memcg) > see flags pc->flags? > copy page stat to new group > overwrite pc->mem_cgroup. > move_unlock_mem_cgroup(memcg) > move_lock_mem_cgroup(mem) > set pc->flags > update page stat accounting > move_unlock_mem_cgroup(mem) > > stat accounting is guarded by move_lock_mem_cgroup() and "move" > logic (CPU-A) doesn't see changes in "struct page" information. > > But it's costly to have the same information both in 'struct page' and > 'struct page_cgroup'. And, there is a potential problem. > > For example, assume we have PG_dirty accounting in memcg. > PG_..is a flag for struct page. > PCG_ is a flag for struct page_cgroup. > (This is just an example. The same problem can be found in any > kind of page stat accounting.) 
> > CPU-A CPU-B > TestSet PG_dirty > (delay) TestClear PG_dirty_ PG_dirty > if (TestClear(PCG_dirty)) > memcg->nr_dirty-- > if (TestSet(PCG_dirty)) > memcg->nr_dirty++ > > @@ -141,6 +141,31 @@ static inline bool mem_cgroup_disabled(void) > return false; > } > > +void __mem_cgroup_begin_update_page_stat(struct page *page, > + bool *lock, unsigned long *flags); > + > +static inline void mem_cgroup_begin_update_page_stat(struct page *page, > + bool *lock, unsigned long *flags) > +{ > + if (mem_cgroup_disabled()) > + return; > + rcu_read_lock(); > + *lock = false; This seems like a strange place to set *lock=false. I think it's clearer if __mem_cgroup_begin_update_page_stat() is the only routine that sets or clears *lock. But I do see that in patch 6/6 'memcg: fix performance of mem_cgroup_begin_update_page_stat()' this position is required. > + return __mem_cgroup_begin_update_page_stat(page, lock, flags); > +} > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ecf8856..30afea5 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1877,32 +1877,54 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask) > * If there is, we take a lock. > */ > > +void __mem_cgroup_begin_update_page_stat(struct page *page, > + bool *lock, unsigned long *flags) > +{ > + struct mem_cgroup *memcg; > + struct page_cgroup *pc; > + > + pc = lookup_page_cgroup(page); > +again: > + memcg = pc->mem_cgroup; > + if (unlikely(!memcg || !PageCgroupUsed(pc))) > + return; > + if (!mem_cgroup_stealed(memcg)) > + return; > + > + move_lock_mem_cgroup(memcg, flags); > + if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) { > + move_unlock_mem_cgroup(memcg, flags); > + goto again; > + } > + *lock = true; > +} > + > +void __mem_cgroup_end_update_page_stat(struct page *page, > + bool *lock, unsigned long *flags) 'lock' looks like an unused parameter. If so, then remove it. 
> +{ > + struct page_cgroup *pc = lookup_page_cgroup(page); > + > + /* > + * It's guaranteed that pc->mem_cgroup never changes while > + * lock is held Please continue comment describing what provides this guarantee. I assume it is because rcu_read_lock() is held by mem_cgroup_begin_update_page_stat(). Maybe it's best to to just make small reference to the locking protocol description in mem_cgroup_start_move(). > + */ > + move_unlock_mem_cgroup(pc->mem_cgroup, flags); > +} > + > + I think it would be useful to add a small comment here declaring that all callers of this routine must be in a mem_cgroup_begin_update_page_stat(), mem_cgroup_end_update_page_stat() critical section to keep pc->mem_cgroup stable. > void mem_cgroup_update_page_stat(struct page *page, > enum mem_cgroup_page_stat_item idx, int val) > { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/6 v4] memcg: use new logic for page stat accounting
  2012-02-14  7:22     ` Greg Thelen
@ 2012-02-14  8:43       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14 8:43 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins

On Mon, 13 Feb 2012 23:22:22 -0800
Greg Thelen <gthelen@google.com> wrote:

> On Mon, Feb 13, 2012 at 7:14 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From ad2905362ef58a44d96a325193ab384739418050 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 2 Feb 2012 11:49:59 +0900
> > Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.
> >
> > Now, page-stat-per-memcg is recorded into per page_cgroup flag by
> > duplicating page's status into the flag. The reason is that memcg
> > has a feature to move a page from a group to another group and we
> > have race between "move" and "page stat accounting",
> >
> > Under current logic, assume CPU-A and CPU-B. CPU-A does "move"
> > and CPU-B does "page stat accounting".
> >
> > When CPU-A goes 1st,
> >
> >     CPU-A                          CPU-B
> >                                    update "struct page" info.
> >     move_lock_mem_cgroup(memcg)
> >     see flags
> pc->flags?
yes.
> >     copy page stat to new group
> >     overwrite pc->mem_cgroup.
> >     move_unlock_mem_cgroup(memcg)
> >                                    move_lock_mem_cgroup(mem)
> >                                    set pc->flags
> >                                    update page stat accounting
> >                                    move_unlock_mem_cgroup(mem)
> >
> > stat accounting is guarded by move_lock_mem_cgroup() and "move"
> > logic (CPU-A) doesn't see changes in "struct page" information.
> >
> > But it's costly to have the same information both in 'struct page' and
> > 'struct page_cgroup'. And, there is a potential problem.
> >
> > For example, assume we have PG_dirty accounting in memcg.
> > PG_.. is a flag for struct page.
> > PCG_ is a flag for struct page_cgroup.
> > (This is just an example. The same problem can be found in any
> >  kind of page stat accounting.)
> >
> >       CPU-A                        CPU-B
> >    TestSet PG_dirty
> >    (delay)                      TestClear PG_dirty
> >                                 if (TestClear(PCG_dirty))
> >                                      memcg->nr_dirty--
> >    if (TestSet(PCG_dirty))
> >        memcg->nr_dirty++
> >
> > @@ -141,6 +141,31 @@ static inline bool mem_cgroup_disabled(void)
> >         return false;
> >  }
> >
> > +void __mem_cgroup_begin_update_page_stat(struct page *page,
> > +                                       bool *lock, unsigned long *flags);
> > +
> > +static inline void mem_cgroup_begin_update_page_stat(struct page *page,
> > +                                       bool *lock, unsigned long *flags)
> > +{
> > +       if (mem_cgroup_disabled())
> > +               return;
> > +       rcu_read_lock();
> > +       *lock = false;
>
> This seems like a strange place to set *lock=false. I think it's
> clearer if __mem_cgroup_begin_update_page_stat() is the only routine
> that sets or clears *lock. But I do see that in patch 6/6 'memcg: fix
> performance of mem_cgroup_begin_update_page_stat()' this position is
> required.
>
Ah, yes. Hmm, it would have been better to move this into the body of
the function.

> > +       return __mem_cgroup_begin_update_page_stat(page, lock, flags);
> > +}
>
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index ecf8856..30afea5 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1877,32 +1877,54 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
> >   * If there is, we take a lock.
> >   */
> >
> > +void __mem_cgroup_begin_update_page_stat(struct page *page,
> > +                               bool *lock, unsigned long *flags)
> > +{
> > +       struct mem_cgroup *memcg;
> > +       struct page_cgroup *pc;
> > +
> > +       pc = lookup_page_cgroup(page);
> > +again:
> > +       memcg = pc->mem_cgroup;
> > +       if (unlikely(!memcg || !PageCgroupUsed(pc)))
> > +               return;
> > +       if (!mem_cgroup_stealed(memcg))
> > +               return;
> > +
> > +       move_lock_mem_cgroup(memcg, flags);
> > +       if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
> > +               move_unlock_mem_cgroup(memcg, flags);
> > +               goto again;
> > +       }
> > +       *lock = true;
> > +}
> > +
> > +void __mem_cgroup_end_update_page_stat(struct page *page,
> > +                               bool *lock, unsigned long *flags)
>
> 'lock' looks like an unused parameter. If so, then remove it.
>
Ok.

> > +{
> > +       struct page_cgroup *pc = lookup_page_cgroup(page);
> > +
> > +       /*
> > +        * It's guaranteed that pc->mem_cgroup never changes while
> > +        * lock is held
>
> Please continue comment describing what provides this guarantee. I
> assume it is because rcu_read_lock() is held by
> mem_cgroup_begin_update_page_stat(). Maybe it's best to just make a
> small reference to the locking protocol description in
> mem_cgroup_start_move().
>
Ok, I will update this.

> > +        */
> > +       move_unlock_mem_cgroup(pc->mem_cgroup, flags);
> > +}
> > +
> > +
>
> I think it would be useful to add a small comment here declaring that
> all callers of this routine must be in a
> mem_cgroup_begin_update_page_stat(), mem_cgroup_end_update_page_stat()
> critical section to keep pc->mem_cgroup stable.
>
Sure, will do. Thank you for review.

-Kame

> >  void mem_cgroup_update_page_stat(struct page *page,
> >                                 enum mem_cgroup_page_stat_item idx, int val)
> >  {

^ permalink raw reply [flat|nested] 16+ messages in thread
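The retry loop in __mem_cgroup_begin_update_page_stat() quoted in this subthread is a check-lock-recheck pattern: read the pointer, take the lock that protects it, then verify the pointer did not change in between. A self-contained single-threaded sketch of just that pattern (struct and function names are simplified stand-ins, not the kernel code):

```c
#include <stddef.h>

struct group { int locked; };
struct pc { struct group *grp; };

/* Lock the group a pc currently belongs to.  After taking the lock,
 * recheck pc->grp in case a concurrent "move" retargeted it meanwhile;
 * on a race, drop the stale lock and retry. */
static struct group *lock_pc_group(struct pc *pc)
{
	struct group *g;
again:
	g = pc->grp;
	if (!g)
		return NULL;		/* page not charged to any group */
	g->locked = 1;			/* stand-in for move_lock_mem_cgroup() */
	if (g != pc->grp) {		/* raced with a move: unlock, retry */
		g->locked = 0;
		goto again;
	}
	return g;			/* pc->grp is now stable until unlock */
}
```

Once the recheck passes, pc->grp cannot change until the lock is released, which is exactly the stability guarantee the end_update side relies on.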
* [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki ` (3 preceding siblings ...) 2012-02-14 3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki @ 2012-02-14 3:15 ` KAMEZAWA Hiroyuki 2012-02-14 7:22 ` Greg Thelen 2012-02-14 3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:15 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED 2012-02-14 3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki @ 2012-02-14 7:22 ` Greg Thelen 0 siblings, 0 replies; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:22 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:15 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From 96407a510d5212179a4768f28591b35d5c17d403 Mon Sep 17 00:00:00 2001 > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Thu, 2 Feb 2012 15:02:18 +0900 > Subject: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED > > with new lock scheme for updating memcg's page stat, we don't need > a flag PCG_FILE_MAPPED which was duplicated information of page_mapped(). > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Thanks! Acked-by: Greg Thelen <gthelen@google.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() 2012-02-14 3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki ` (4 preceding siblings ...) 2012-02-14 3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki @ 2012-02-14 3:16 ` KAMEZAWA Hiroyuki 2012-02-14 7:22 ` Greg Thelen 5 siblings, 1 reply; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 3:16 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() 2012-02-14 3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki @ 2012-02-14 7:22 ` Greg Thelen 2012-02-14 11:08 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 16+ messages in thread From: Greg Thelen @ 2012-02-14 7:22 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, Feb 13, 2012 at 7:16 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From 3377fd7b6e23a5d2a368c078eae27e2b49c4f4aa Mon Sep 17 00:00:00 2001 > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Mon, 6 Feb 2012 12:14:47 +0900 > Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() > > mem_cgroup_begin_update_page_stat() should be very fast because > it's called very frequently. Now, it needs to look up page_cgroup > and its memcg....this is slow. > > This patch adds a global variable to check "a memcg is moving or not". s/a memcg/any memcg/ > By this, the caller doesn't need to visit page_cgroup and memcg. s/By/With/ > Here is a test result. A test program makes page faults onto a file, > MAP_SHARED and makes each page's page_mapcount(page) > 1, and free > the range by madvise() and page fault again. This program causes > 26214400 times of page fault onto a file(size was 1G.) and shows > shows the cost of mem_cgroup_begin_update_page_stat(). Out of curiosity, what is the performance of the mmap program before this series? > Before this patch for mem_cgroup_begin_update_page_stat() > [kamezawa@bluextal test]$ time ./mmap 1G > > real 0m21.765s > user 0m5.999s > sys 0m15.434s > > 27.46% mmap mmap [.] 
reader > 21.15% mmap [kernel.kallsyms] [k] page_fault > 9.17% mmap [kernel.kallsyms] [k] filemap_fault > 2.96% mmap [kernel.kallsyms] [k] __do_fault > 2.83% mmap [kernel.kallsyms] [k] __mem_cgroup_begin_update_page_stat > > After this patch > [root@bluextal test]# time ./mmap 1G > > real 0m21.373s > user 0m6.113s > sys 0m15.016s > > In usual path, calls to __mem_cgroup_begin_update_page_stat() goes away. > > Note: we may be able to remove this optimization in future if > we can get pointer to memcg directly from struct page. > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Greg Thelen <gthelen@google.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() 2012-02-14 7:22 ` Greg Thelen @ 2012-02-14 11:08 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 16+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-14 11:08 UTC (permalink / raw) To: Greg Thelen Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins On Mon, 13 Feb 2012 23:22:50 -0800 Greg Thelen <gthelen@google.com> wrote: > On Mon, Feb 13, 2012 at 7:16 PM, KAMEZAWA Hiroyuki > <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > From 3377fd7b6e23a5d2a368c078eae27e2b49c4f4aa Mon Sep 17 00:00:00 2001 > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Date: Mon, 6 Feb 2012 12:14:47 +0900 > > Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat() > > > > mem_cgroup_begin_update_page_stat() should be very fast because > > it's called very frequently. Now, it needs to look up page_cgroup > > and its memcg....this is slow. > > > > This patch adds a global variable to check "a memcg is moving or not". > > s/a memcg/any memcg/ > yes. > > By this, the caller doesn't need to visit page_cgroup and memcg. > > s/By/With/ > ok. > > Here is a test result. A test program makes page faults onto a file, > > MAP_SHARED and makes each page's page_mapcount(page) > 1, and free > > the range by madvise() and page fault again. A This program causes > > 26214400 times of page fault onto a file(size was 1G.) and shows > > shows the cost of mem_cgroup_begin_update_page_stat(). > > Out of curiosity, what is the performance of the mmap program before > this series? > Score of 3 runs underlinux-next. == [root@bluextal test]# time ./mmap 1G real 0m21.041s user 0m6.146s sys 0m14.625s [root@bluextal test]# time ./mmap 1G real 0m21.063s user 0m6.019s sys 0m14.776s [root@bluextal test]# time ./mmap 1G real 0m21.178s user 0m6.000s sys 0m14.849s == My program is attached. This program is for checking cost of updating FILE_MAPPED. 
I guess sys_time scores' error rate will be 0.2-0.3 sec.

Thanks,
-Kame
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

void reader(int fd, int size)
{
	int i, off, x = 0;
	char *addr;

	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	for (i = 0; i < 100; i++) {
		for (off = 0; off < size; off += 4096) {
			x += *(addr + off);
		}
		madvise(addr, size, MADV_DONTNEED);
	}
}

int main(int argc, char *argv[])
{
	int fd;
	char *addr, *c;
	unsigned long size;
	struct stat statbuf;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("cannot open file");
		return 1;
	}
	if (fstat(fd, &statbuf)) {
		perror("fstat failed");
		return 1;
	}
	size = statbuf.st_size;
	/* mmap in 2 place. */
	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	mlock(addr, size);
	reader(fd, size);
}

^ permalink raw reply [flat|nested] 16+ messages in thread