linux-mm.kvack.org archive mirror
* [PATCH 0/6 v4] memcg: page cgroup diet
@ 2012-02-14  3:04 KAMEZAWA Hiroyuki
  2012-02-14  3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:04 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, hannes, Michal Hocko, akpm, Ying Han, Hugh Dickins


Here is v4. I removed the RFC tag and fixed a fatal bug in v3.

This patch series is a preparation for removing page_cgroup->flags. To remove
it, we need to reduce the number of flags on page_cgroup. In this set,
PCG_MOVE_LOCK and PCG_FILE_MAPPED are removed.

After this, we only have 3 bits.
==
enum {
        /* flags for mem_cgroup */
        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
        PCG_USED, /* this object is in use. */
        PCG_MIGRATION, /* under page migration */
        __NR_PCG_FLAGS,
};
==

Next, I'll try to merge these 3 bits into the lower 3 bits of the pointer
to memcg. Then we can make page_cgroup 8 (4) bytes per page.


Major changes since v3:
 - replaced move_lock_page_cgroup() with move_lock_mem_cgroup(), which
   takes a pointer to the memcg rather than the page_cgroup.

This is based on linux-next (Feb 13) and has passed several tests.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
@ 2012-02-14  3:06 ` KAMEZAWA Hiroyuki
  2012-02-14  7:21   ` Greg Thelen
  2012-02-14  3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins


This is just a cleanup.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 2 Feb 2012 12:05:41 +0900
Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)

From the log, I guess the EXPORT was in preparation for dirty accounting.
But we don't need to export this _now_. Remove it for now.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab315ab..4c2b759 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1897,7 +1897,6 @@ out:
 		move_unlock_page_cgroup(pc, &flags);
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(mem_cgroup_update_page_stat);
 
 /*
  * size of first charge trial. "32" comes from vmscan.c's magic value.
-- 
1.7.4.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/6 v4] memcg: simplify move_account() check
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
  2012-02-14  3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
@ 2012-02-14  3:07 ` KAMEZAWA Hiroyuki
  2012-02-14  7:21   ` Greg Thelen
  2012-02-14  3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
  2012-02-14  3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
  2012-02-14  3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki
@ 2012-02-14  3:13 ` KAMEZAWA Hiroyuki
  2012-02-14  7:21   ` Greg Thelen
  2012-02-14  3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/6 v4] memcg: use new logic for page stat accounting
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2012-02-14  3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
@ 2012-02-14  3:14 ` KAMEZAWA Hiroyuki
  2012-02-14  7:22   ` Greg Thelen
  2012-02-14  3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
  2012-02-14  3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2012-02-14  3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
@ 2012-02-14  3:15 ` KAMEZAWA Hiroyuki
  2012-02-14  7:22   ` Greg Thelen
  2012-02-14  3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat()
  2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2012-02-14  3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
@ 2012-02-14  3:16 ` KAMEZAWA Hiroyuki
  2012-02-14  7:22   ` Greg Thelen
  5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  3:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  2012-02-14  3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
@ 2012-02-14  7:21   ` Greg Thelen
  0 siblings, 0 replies; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:06 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> This is just a cleanup.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 12:05:41 +0900
> Subject: [PATCH 1/6] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
>
> From the log, I guess EXPORT was for preparing dirty accounting.
> But _now_, we don't need to export this. Remove this for now.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Looks good to me.

Reviewed-by: Greg Thelen <gthelen@google.com>

> ---
>  mm/memcontrol.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ab315ab..4c2b759 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1897,7 +1897,6 @@ out:
>                move_unlock_page_cgroup(pc, &flags);
>        rcu_read_unlock();
>  }
> -EXPORT_SYMBOL(mem_cgroup_update_page_stat);
>
>  /*
>  * size of first charge trial. "32" comes from vmscan.c's magic value.
> --
> 1.7.4.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6 v4] memcg: simplify move_account() check
  2012-02-14  3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki
@ 2012-02-14  7:21   ` Greg Thelen
  2012-02-14  8:44     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:07 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From 9cdb3b63dc8d08cc2220c54c80438c13433a0d12 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 10:02:39 +0900
> Subject: [PATCH 2/6] memcg: simplify move_account() check.
>
> In memcg, for avoiding take-lock-irq-off at accessing page_cgroup,
> a logic, flag + rcu_read_lock(), is used. This works as following
>
>     CPU-A                     CPU-B
>                             rcu_read_lock()
>    set flag
>                             if(flag is set)
>                                   take heavy lock
>                             do job.
>    synchronize_rcu()        rcu_read_unlock()

I assume that CPU-A will take heavy lock after synchronize_rcu() when
updating variables read by CPU-B.

>  memcontrol.c |   65 ++++++++++++++++++++++-------------------------------------
>  1 file changed, 25 insertions(+), 40 deletions(-)
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Greg Thelen <gthelen@google.com>

> ---
>  mm/memcontrol.c |   70 +++++++++++++++++++++++-------------------------------
>  1 files changed, 30 insertions(+), 40 deletions(-)

> @@ -2089,11 +2082,8 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb,
>        struct memcg_stock_pcp *stock;
>        struct mem_cgroup *iter;
>
> -       if ((action == CPU_ONLINE)) {
> -               for_each_mem_cgroup(iter)
> -                       synchronize_mem_cgroup_on_move(iter, cpu);
> +       if ((action == CPU_ONLINE))

Extra parenthesis.  I recommend:
+       if (action == CPU_ONLINE)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.
  2012-02-14  3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
@ 2012-02-14  7:21   ` Greg Thelen
  0 siblings, 0 replies; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:13 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From ffd1b013fe294a80c12e3f30e85386135a6fb284 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 11:49:59 +0900
> Subject: [PATCH 3/6] memcg: remove PCG_MOVE_LOCK flag from page_cgroup.
>
> PCG_MOVE_LOCK is used for bit spinlock to avoid race between overwriting
> pc->mem_cgroup and page statistics accounting per memcg.
> This lock helps to avoid the race but the race is very rare because moving
> tasks between cgroup is not a usual job.
> So, it seems using 1bit per page is too costly.
>
> This patch changes this lock as per-memcg spinlock and removes PCG_MOVE_LOCK.
>
> If smaller lock is required, we'll be able to add some hashes but
> I'd like to start from this.
>
> Changelog:
>  - fixed to pass memcg as an argument rather than page_cgroup.
>    and renamed from move_lock_page_cgroup() to move_lock_mem_cgroup()
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Seems good.  Thanks.

Acked-by: Greg Thelen <gthelen@google.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6 v4] memcg: use new logic for page stat accounting
  2012-02-14  3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
@ 2012-02-14  7:22   ` Greg Thelen
  2012-02-14  8:43     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:14 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From ad2905362ef58a44d96a325193ab384739418050 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 11:49:59 +0900
> Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.
>
> Now, page-stat-per-memcg is recorded into per page_cgroup flag by
> duplicating page's status into the flag. The reason is that memcg
> has a feature to move a page from a group to another group and we
> have race between "move" and "page stat accounting",
>
> Under current logic, assume CPU-A and CPU-B. CPU-A does "move"
> and CPU-B does "page stat accounting".
>
> When CPU-A goes 1st,
>
>            CPU-A                           CPU-B
>                                    update "struct page" info.
>    move_lock_mem_cgroup(memcg)
>    see flags

pc->flags?

>    copy page stat to new group
>    overwrite pc->mem_cgroup.
>    move_unlock_mem_cgroup(memcg)
>                                    move_lock_mem_cgroup(mem)
>                                    set pc->flags
>                                    update page stat accounting
>                                    move_unlock_mem_cgroup(mem)
>
> stat accounting is guarded by move_lock_mem_cgroup() and "move"
> logic (CPU-A) doesn't see changes in "struct page" information.
>
> But it's costly to have the same information both in 'struct page' and
> 'struct page_cgroup'. And, there is a potential problem.
>
> For example, assume we have PG_dirty accounting in memcg.
> PG_..is a flag for struct page.
> PCG_ is a flag for struct page_cgroup.
> (This is just an example. The same problem can be found in any
>  kind of page stat accounting.)
>
>          CPU-A                               CPU-B
>      TestSet PG_dirty
>      (delay)                        TestClear PG_dirty_

PG_dirty

>                                     if (TestClear(PCG_dirty))
>                                          memcg->nr_dirty--
>      if (TestSet(PCG_dirty))
>          memcg->nr_dirty++
>

> @@ -141,6 +141,31 @@ static inline bool mem_cgroup_disabled(void)
>        return false;
>  }
>
> +void __mem_cgroup_begin_update_page_stat(struct page *page,
> +                                       bool *lock, unsigned long *flags);
> +
> +static inline void mem_cgroup_begin_update_page_stat(struct page *page,
> +                                       bool *lock, unsigned long *flags)
> +{
> +       if (mem_cgroup_disabled())
> +               return;
> +       rcu_read_lock();
> +       *lock = false;

This seems like a strange place to set *lock=false.  I think it's
clearer if __mem_cgroup_begin_update_page_stat() is the only routine
that sets or clears *lock.  But I do see that in patch 6/6 'memcg: fix
performance of mem_cgroup_begin_update_page_stat()' this position is
required.

> +       return __mem_cgroup_begin_update_page_stat(page, lock, flags);
> +}

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ecf8856..30afea5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1877,32 +1877,54 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
>  * If there is, we take a lock.
>  */
>
> +void __mem_cgroup_begin_update_page_stat(struct page *page,
> +                               bool *lock, unsigned long *flags)
> +{
> +       struct mem_cgroup *memcg;
> +       struct page_cgroup *pc;
> +
> +       pc = lookup_page_cgroup(page);
> +again:
> +       memcg = pc->mem_cgroup;
> +       if (unlikely(!memcg || !PageCgroupUsed(pc)))
> +               return;
> +       if (!mem_cgroup_stealed(memcg))
> +               return;
> +
> +       move_lock_mem_cgroup(memcg, flags);
> +       if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
> +               move_unlock_mem_cgroup(memcg, flags);
> +               goto again;
> +       }
> +       *lock = true;
> +}
> +
> +void __mem_cgroup_end_update_page_stat(struct page *page,
> +                               bool *lock, unsigned long *flags)

'lock' looks like an unused parameter.  If so, then remove it.

> +{
> +       struct page_cgroup *pc = lookup_page_cgroup(page);
> +
> +       /*
> +        * It's guaranteed that pc->mem_cgroup never changes while
> +        * lock is held

Please continue comment describing what provides this guarantee.  I
assume it is because rcu_read_lock() is held by
mem_cgroup_begin_update_page_stat().  Maybe it's best to just make a
small reference to the locking protocol description in
mem_cgroup_start_move().

> +        */
> +       move_unlock_mem_cgroup(pc->mem_cgroup, flags);
> +}
> +
> +

I think it would be useful to add a small comment here declaring that
all callers of this routine must be in a
mem_cgroup_begin_update_page_stat(), mem_cgroup_end_update_page_stat()
critical section to keep pc->mem_cgroup stable.

>  void mem_cgroup_update_page_stat(struct page *page,
>                                 enum mem_cgroup_page_stat_item idx, int val)
>  {


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED
  2012-02-14  3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
@ 2012-02-14  7:22   ` Greg Thelen
  0 siblings, 0 replies; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:15 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From 96407a510d5212179a4768f28591b35d5c17d403 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 2 Feb 2012 15:02:18 +0900
> Subject: [PATCH 5/6] memcg: remove PCG_FILE_MAPPED
>
> with new lock scheme for updating memcg's page stat, we don't need
> a flag PCG_FILE_MAPPED which was duplicated information of page_mapped().
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks!

Acked-by: Greg Thelen <gthelen@google.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat()
  2012-02-14  3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
@ 2012-02-14  7:22   ` Greg Thelen
  2012-02-14 11:08     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Thelen @ 2012-02-14  7:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, Feb 13, 2012 at 7:16 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From 3377fd7b6e23a5d2a368c078eae27e2b49c4f4aa Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Mon, 6 Feb 2012 12:14:47 +0900
> Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()
>
> mem_cgroup_begin_update_page_stat() should be very fast because
> it's called very frequently. Now, it needs to look up page_cgroup
> and its memcg....this is slow.
>
> This patch adds a global variable to check "a memcg is moving or not".

s/a memcg/any memcg/

> By this, the caller doesn't need to visit page_cgroup and memcg.

s/By/With/

> Here is a test result. A test program makes page faults onto a file,
> MAP_SHARED and makes each page's page_mapcount(page) > 1, and free
> the range by madvise() and page fault again.  This program causes
> 26214400 page faults onto a file (size was 1G) and shows
> the cost of mem_cgroup_begin_update_page_stat().

Out of curiosity, what is the performance of the mmap program before
this series?

> Before this patch for mem_cgroup_begin_update_page_stat()
> [kamezawa@bluextal test]$ time ./mmap 1G
>
> real    0m21.765s
> user    0m5.999s
> sys     0m15.434s
>
>    27.46%     mmap  mmap               [.] reader
>    21.15%     mmap  [kernel.kallsyms]  [k] page_fault
>     9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
>     2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
>     2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat
>
> After this patch
> [root@bluextal test]# time ./mmap 1G
>
> real    0m21.373s
> user    0m6.113s
> sys     0m15.016s
>
> In usual path, calls to __mem_cgroup_begin_update_page_stat() goes away.
>
> Note: we may be able to remove this optimization in future if
>      we can get pointer to memcg directly from struct page.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Greg Thelen <gthelen@google.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6 v4] memcg: use new logic for page stat accounting
  2012-02-14  7:22   ` Greg Thelen
@ 2012-02-14  8:43     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  8:43 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, 13 Feb 2012 23:22:22 -0800
Greg Thelen <gthelen@google.com> wrote:

> On Mon, Feb 13, 2012 at 7:14 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From ad2905362ef58a44d96a325193ab384739418050 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 2 Feb 2012 11:49:59 +0900
> > Subject: [PATCH 4/6] memcg: use new logic for page stat accounting.
> >
> > Now, page-stat-per-memcg is recorded into per page_cgroup flag by
> > duplicating page's status into the flag. The reason is that memcg
> > has a feature to move a page from a group to another group and we
> > have race between "move" and "page stat accounting",
> >
> > Under current logic, assume CPU-A and CPU-B. CPU-A does "move"
> > and CPU-B does "page stat accounting".
> >
> > When CPU-A goes 1st,
> >
> >            CPU-A                           CPU-B
> >                                    update "struct page" info.
> >    move_lock_mem_cgroup(memcg)
> >    see flags
> 
> pc->flags?
> 
yes.


> >    copy page stat to new group
> >    overwrite pc->mem_cgroup.
> >    move_unlock_mem_cgroup(memcg)
> >                                    move_lock_mem_cgroup(mem)
> >                                    set pc->flags
> >                                    update page stat accounting
> >                                    move_unlock_mem_cgroup(mem)
> >
> > stat accounting is guarded by move_lock_mem_cgroup() and "move"
> > logic (CPU-A) doesn't see changes in "struct page" information.
> >
> > But it's costly to have the same information both in 'struct page' and
> > 'struct page_cgroup'. And, there is a potential problem.
> >
> > For example, assume we have PG_dirty accounting in memcg.
> > PG_..is a flag for struct page.
> > PCG_ is a flag for struct page_cgroup.
> > (This is just an example. The same problem can be found in any
> >  kind of page stat accounting.)
> >
> >          CPU-A                               CPU-B
> >      TestSet PG_dirty
> >      (delay)                        TestClear PG_dirty_
> 
> PG_dirty
> 
> >                                     if (TestClear(PCG_dirty))
> >                                          memcg->nr_dirty--
> >      if (TestSet(PCG_dirty))
> >          memcg->nr_dirty++
> >
> 
> > @@ -141,6 +141,31 @@ static inline bool mem_cgroup_disabled(void)
> >        return false;
> >  }
> >
> > +void __mem_cgroup_begin_update_page_stat(struct page *page,
> > +                                       bool *lock, unsigned long *flags);
> > +
> > +static inline void mem_cgroup_begin_update_page_stat(struct page *page,
> > +                                       bool *lock, unsigned long *flags)
> > +{
> > +       if (mem_cgroup_disabled())
> > +               return;
> > +       rcu_read_lock();
> > +       *lock = false;
> 
> This seems like a strange place to set *lock=false.  I think it's
> clearer if __mem_cgroup_begin_update_page_stat() is the only routine
> that sets or clears *lock.  But I do see that in patch 6/6 'memcg: fix
> performance of mem_cgroup_begin_update_page_stat()' this position is
> required.
> 

Ah, yes. Hmm, it would be better to move this into the body of the function.



> > +       return __mem_cgroup_begin_update_page_stat(page, lock, flags);
> > +}
> 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index ecf8856..30afea5 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1877,32 +1877,54 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
> >  * If there is, we take a lock.
> >  */
> >
> > +void __mem_cgroup_begin_update_page_stat(struct page *page,
> > +                               bool *lock, unsigned long *flags)
> > +{
> > +       struct mem_cgroup *memcg;
> > +       struct page_cgroup *pc;
> > +
> > +       pc = lookup_page_cgroup(page);
> > +again:
> > +       memcg = pc->mem_cgroup;
> > +       if (unlikely(!memcg || !PageCgroupUsed(pc)))
> > +               return;
> > +       if (!mem_cgroup_stealed(memcg))
> > +               return;
> > +
> > +       move_lock_mem_cgroup(memcg, flags);
> > +       if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
> > +               move_unlock_mem_cgroup(memcg, flags);
> > +               goto again;
> > +       }
> > +       *lock = true;
> > +}
> > +
> > +void __mem_cgroup_end_update_page_stat(struct page *page,
> > +                               bool *lock, unsigned long *flags)
> 
> 'lock' looks like an unused parameter.  If so, then remove it.
> 

Ok.

> > +{
> > +       struct page_cgroup *pc = lookup_page_cgroup(page);
> > +
> > +       /*
> > +        * It's guaranteed that pc->mem_cgroup never changes while
> > +        * lock is held
> 
> Please continue comment describing what provides this guarantee.  I
> assume it is because rcu_read_lock() is held by
> mem_cgroup_begin_update_page_stat().  Maybe it's best to just make a
> small reference to the locking protocol description in
> mem_cgroup_start_move().
> 
Ok, I will update this.


> > +        */
> > +       move_unlock_mem_cgroup(pc->mem_cgroup, flags);
> > +}
> > +
> > +
> 
> I think it would be useful to add a small comment here declaring that
> all callers of this routine must be in a
> mem_cgroup_begin_update_page_stat(), mem_cgroup_end_update_page_stat()
> critical section to keep pc->mem_cgroup stable.
> 

Sure, will do.

Thank you for review.
-Kame


> >  void mem_cgroup_update_page_stat(struct page *page,
> >                                 enum mem_cgroup_page_stat_item idx, int val)
> >  {
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6 v4] memcg: simplify move_account() check
  2012-02-14  7:21   ` Greg Thelen
@ 2012-02-14  8:44     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14  8:44 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, 13 Feb 2012 23:21:34 -0800
Greg Thelen <gthelen@google.com> wrote:

> On Mon, Feb 13, 2012 at 7:07 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From 9cdb3b63dc8d08cc2220c54c80438c13433a0d12 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 2 Feb 2012 10:02:39 +0900
> > Subject: [PATCH 2/6] memcg: simplify move_account() check.
> >
> > In memcg, for avoiding take-lock-irq-off at accessing page_cgroup,
> > a logic, flag + rcu_read_lock(), is used. This works as following
> >
> >     CPU-A                     CPU-B
> >                             rcu_read_lock()
> >    set flag
> >                             if(flag is set)
> >                                   take heavy lock
> >                             do job.
> >    synchronize_rcu()        rcu_read_unlock()
> 
> I assume that CPU-A will take heavy lock after synchronize_rcu() when
> updating variables read by CPU-B.
> 
Ah, yes. I should have written that.

> >  memcontrol.c |   65 ++++++++++++++++++++++-------------------------------------
> >  1 file changed, 25 insertions(+), 40 deletions(-)
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Acked-by: Greg Thelen <gthelen@google.com>
> 

Thank you!

-Kame


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat()
  2012-02-14  7:22   ` Greg Thelen
@ 2012-02-14 11:08     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-02-14 11:08 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Ying Han,
	Hugh Dickins

On Mon, 13 Feb 2012 23:22:50 -0800
Greg Thelen <gthelen@google.com> wrote:

> On Mon, Feb 13, 2012 at 7:16 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > From 3377fd7b6e23a5d2a368c078eae27e2b49c4f4aa Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Mon, 6 Feb 2012 12:14:47 +0900
> > Subject: [PATCH 6/6] memcg: fix performance of mem_cgroup_begin_update_page_stat()
> >
> > mem_cgroup_begin_update_page_stat() should be very fast because
> > it's called very frequently. Currently, it needs to look up
> > page_cgroup and its memcg, which is slow.
> >
> > This patch adds a global variable to check "a memcg is moving or not".
> 
> s/a memcg/any memcg/
> 
yes.

> > By this, the caller doesn't need to visit page_cgroup and memcg.
> 
> s/By/With/
> 
ok.

> > Here is a test result. A test program page-faults on a MAP_SHARED
> > file mapping, makes each page's page_mapcount(page) > 1, frees the
> > range with madvise(), and faults the pages in again. The program
> > causes 26214400 page faults on a 1G file and shows the cost of
> > mem_cgroup_begin_update_page_stat().
> 
> Out of curiosity, what is the performance of the mmap program before
> this series?
> 

Scores of 3 runs under linux-next.
==
[root@bluextal test]# time ./mmap 1G

real    0m21.041s
user    0m6.146s
sys     0m14.625s
[root@bluextal test]# time ./mmap 1G

real    0m21.063s
user    0m6.019s
sys     0m14.776s
[root@bluextal test]# time ./mmap 1G

real    0m21.178s
user    0m6.000s
sys     0m14.849s
==

My program is attached; it measures the cost of updating FILE_MAPPED.
I guess the sys-time scores' error margin is about 0.2-0.3 sec.
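For context, timings like the ones above could be reproduced with something like the following. The file name "testfile" and the 16M size are illustrative stand-ins (the original run used a 1G file), and "mmap.c" refers to the program attached below.

```shell
# Create a backing file for the attached test program.
# 16M here keeps the example small; the original run used a 1G file.
dd if=/dev/zero of=testfile bs=1M count=16 status=none
stat -c %s testfile   # → 16777216 bytes
# The attached program would then be built and timed as:
#   gcc -O2 -o mmap mmap.c && time ./mmap testfile
```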

Thanks,
-Kame
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

void reader(int fd, int size)
{
	int i, off, x = 0;
	char *addr;

	/* Second mapping of the same file: together with the mapping in
	 * main(), this makes page_mapcount(page) > 1. */
	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap failed");
		return;
	}
	for (i = 0; i < 100; i++) {
		for (off = 0; off < size; off += 4096)
			x += *(addr + off);
		/* Drop the pages so the next pass faults them in again. */
		madvise(addr, size, MADV_DONTNEED);
	}
	(void)x;	/* silence unused-variable warning */
	munmap(addr, size);
}

int main(int argc, char *argv[])
{
	int fd;
	char *addr;
	unsigned long size;
	struct stat statbuf;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("cannot open file");
		return 1;
	}

	if (fstat(fd, &statbuf)) {
		perror("fstat failed");
		return 1;
	}
	size = statbuf.st_size;
	/* First mapping, mlock()ed so its pages stay mapped while
	 * reader() maps the file a second time. */
	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap failed");
		return 1;
	}
	mlock(addr, size);
	reader(fd, size);
	return 0;
}


end of thread, other threads:[~2012-02-14 11:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-14  3:04 [PATCH 0/6 v4] memcg: page cgroup diet KAMEZAWA Hiroyuki
2012-02-14  3:06 ` [PATCH 1/6 v4] memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) KAMEZAWA Hiroyuki
2012-02-14  7:21   ` Greg Thelen
2012-02-14  3:07 ` [PATCH 2/6 v4] memcg: simplify move_account() check KAMEZAWA Hiroyuki
2012-02-14  7:21   ` Greg Thelen
2012-02-14  8:44     ` KAMEZAWA Hiroyuki
2012-02-14  3:13 ` [PATCH 3/6 v4] memcg: remove PCG_MOVE_LOCK flag from page_cgroup KAMEZAWA Hiroyuki
2012-02-14  7:21   ` Greg Thelen
2012-02-14  3:14 ` [PATCH 4/6 v4] memcg: use new logic for page stat accounting KAMEZAWA Hiroyuki
2012-02-14  7:22   ` Greg Thelen
2012-02-14  8:43     ` KAMEZAWA Hiroyuki
2012-02-14  3:15 ` [PATCH 5/6 v4] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
2012-02-14  7:22   ` Greg Thelen
2012-02-14  3:16 ` [PATCH 6/6 v4] memcg: fix performance of mem_cgroup_begin_update_page_stat() KAMEZAWA Hiroyuki
2012-02-14  7:22   ` Greg Thelen
2012-02-14 11:08     ` KAMEZAWA Hiroyuki
