linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ying Han <yinghan@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hugh.dickins@tiscali.co.uk" <hugh.dickins@tiscali.co.uk>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	cgroups@vger.kernel.org,
	"bsingharora@gmail.com" <bsingharora@gmail.com>
Subject: Re: [RFC] [PATCH 4/7 v2] memcg: new scheme to update per-memcg page stat accounting.
Date: Thu, 26 Jan 2012 11:01:32 -0800	[thread overview]
Message-ID: <CALWz4izcvRGe2wBsthhhp3eW4rS=shW1wcZG3DW1=2skeaHmog@mail.gmail.com> (raw)
In-Reply-To: <20120113174138.ec7b64d9.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, Jan 13, 2012 at 12:41 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From 08a81022fa6f820a42aa5bf3a24ee07690dfff99 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 12 Jan 2012 18:13:32 +0900
> Subject: [PATCH 4/7] memcg: new scheme to update per-memcg page stat accounting.
>
> Now, page status accounting is done by a call mem_cgroup_update_page_stat()
nitpick, /by a call/by calling

> and this function set flags to page_cgroup->flags.
>
> This flag was required because the page's status and page <=> memcg
> relationship cannot be updated in atomic way.

I assume we are talking about the PCG_FILE_MAPPED flag, can we make it
specific here?

For example,
> Considering FILE_MAPPED,
>
>        CPU A                      CPU B
>                                pick up a page to be moved.
>    set page_mapcount()=0.
>                                move memcg' FILE_MAPPED stat --(*)
>                                overwrite pc->mem_cgroup
>    modify memcg's FILE_MAPPED-(**)
>
> If we don't have a flag on pc->flags, we'll not move 'FILE_MAPPED'
> account information in (*) and we'll decrease FILE_MAPPED in (**)
> from wrong cgroup. We'll see this kind of race at handling
> dirty, writeback...etc..bits. (And Dirty flag has another problem
> which cannot be handled by flag on page_cgroup.)
>
> I'd like to remove this flag because
>  - In recent discussions, removing pc->flags is our direction.
>  - This kind of duplication of flag/status is very bad and
>   it's better to use status in 'struct page'.
>
> This patch is for removing page_cgroup's special flag for
> page-state accounting and for using 'struct page's status itself.

I think this patch itself doesn't remove any pc flags. I believe it is
on the following patch, which removes the PCG_FILE_MAPPED flag.

>
> This patch adds an atomic update support of page statistics accounting
> in memcg. In short, it prevents a page from being moved from a memcg
> to another while updating page status by...
>
>        locked = mem_cgroup_begin_update_page_stat(page)
>        modify page
>        mem_cgroup_update_page_stat(page)
>        mem_cgroup_end_update_page_stat(page, locked)
>
> While begin_update_page_stat() ... end_update_page_stat(),
> the page_cgroup will never be moved to other memcg.

This is nice.

In general, the description needs some work and it isn't clear to me
what this patch does at the first glance.

--Ying

>
> In usual case, overhead is rcu_read_lock() and rcu_read_unlock(),
> lookup_page_cgroup().
>
> Note:
>  - I still now considering how to reduce overhead of this scheme.
>   Good idea is welcomed.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/memcontrol.h |   36 ++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c            |   46 ++++++++++++++++++++++++++-----------------
>  mm/rmap.c                  |   14 +++++++++++-
>  3 files changed, 76 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4d34356..976b58c 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -141,9 +141,35 @@ static inline bool mem_cgroup_disabled(void)
>        return false;
>  }
>
> +/*
> + * When we update page->flags,' we'll update some memcg's counter.
> + * Unlike vmstat, memcg has per-memcg stats and page-memcg relationship
> + * can be changed while 'struct page' information is updated.
> + * We need to prevent the race by
> + *     locked = mem_cgroup_begin_update_page_stat(page)
> + *     modify 'page'
> + *     mem_cgroup_update_page_stat(page, idx, val)
> + *     mem_cgroup_end_update_page_stat(page, locked);
> + */
> +bool __mem_cgroup_begin_update_page_stat(struct page *page);
> +static inline bool mem_cgroup_begin_update_page_stat(struct page *page)
> +{
> +       if (mem_cgroup_disabled())
> +               return false;
> +       return __mem_cgroup_begin_update_page_stat(page);
> +}
>  void mem_cgroup_update_page_stat(struct page *page,
>                                 enum mem_cgroup_page_stat_item idx,
>                                 int val);
> +void __mem_cgroup_end_update_page_stat(struct page *page, bool locked);
> +static inline void
> +mem_cgroup_end_update_page_stat(struct page *page, bool locked)
> +{
> +       if (mem_cgroup_disabled())
> +               return;
> +       __mem_cgroup_end_update_page_stat(page, locked);
> +}
> +
>
>  static inline void mem_cgroup_inc_page_stat(struct page *page,
>                                            enum mem_cgroup_page_stat_item idx)
> @@ -356,6 +382,16 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>  {
>  }
>
> +static inline bool mem_cgroup_begin_update_page_stat(struct page *page)
> +{
> +       return false;
> +}
> +
> +static inline void
> +mem_cgroup_end_update_page_stat(struct page *page, bool locked)
> +{
> +}
> +
>  static inline void mem_cgroup_inc_page_stat(struct page *page,
>                                            enum mem_cgroup_page_stat_item idx)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 61e276f..30ef810 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1912,29 +1912,43 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
>  * possibility of race condition. If there is, we take a lock.
>  */
>
> +bool __mem_cgroup_begin_update_page_stat(struct page *page)
> +{
> +       struct page_cgroup *pc = lookup_page_cgroup(page);
> +       bool locked = false;
> +       struct mem_cgroup *memcg;
> +
> +       rcu_read_lock();
> +       memcg = pc->mem_cgroup;
> +
> +       if (!memcg || !PageCgroupUsed(pc))
> +               goto out;
> +       if (mem_cgroup_stealed(memcg)) {
> +               mem_cgroup_account_move_rlock(page);
> +               locked = true;
> +       }
> +out:
> +       return locked;
> +}
> +
> +void __mem_cgroup_end_update_page_stat(struct page *page, bool locked)
> +{
> +       if (locked)
> +               mem_cgroup_account_move_runlock(page);
> +       rcu_read_unlock();
> +}
> +
>  void mem_cgroup_update_page_stat(struct page *page,
>                                 enum mem_cgroup_page_stat_item idx, int val)
>  {
> -       struct mem_cgroup *memcg;
>        struct page_cgroup *pc = lookup_page_cgroup(page);
> -       bool need_unlock = false;
> +       struct mem_cgroup *memcg = pc->mem_cgroup;
>
>        if (mem_cgroup_disabled())
>                return;
>
> -       rcu_read_lock();
> -       memcg = pc->mem_cgroup;
>        if (unlikely(!memcg || !PageCgroupUsed(pc)))
> -               goto out;
> -       /* pc->mem_cgroup is unstable ? */
> -       if (unlikely(mem_cgroup_stealed(memcg))) {
> -               /* take a lock against to access pc->mem_cgroup */
> -               mem_cgroup_account_move_rlock(page);
> -               need_unlock = true;
> -               memcg = pc->mem_cgroup;
> -               if (!memcg || !PageCgroupUsed(pc))
> -                       goto out;
> -       }
> +               return;
>
>        switch (idx) {
>        case MEMCG_NR_FILE_MAPPED:
> @@ -1950,10 +1964,6 @@ void mem_cgroup_update_page_stat(struct page *page,
>
>        this_cpu_add(memcg->stat->count[idx], val);
>
> -out:
> -       if (unlikely(need_unlock))
> -               mem_cgroup_account_move_runlock(page);
> -       rcu_read_unlock();
>        return;
>  }
>  EXPORT_SYMBOL(mem_cgroup_update_page_stat);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index aa547d4..def60d1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1150,10 +1150,13 @@ void page_add_new_anon_rmap(struct page *page,
>  */
>  void page_add_file_rmap(struct page *page)
>  {
> +       bool locked = mem_cgroup_begin_update_page_stat(page);
> +
>        if (atomic_inc_and_test(&page->_mapcount)) {
>                __inc_zone_page_state(page, NR_FILE_MAPPED);
>                mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
>        }
> +       mem_cgroup_end_update_page_stat(page, locked);
>  }
>
>  /**
> @@ -1164,10 +1167,14 @@ void page_add_file_rmap(struct page *page)
>  */
>  void page_remove_rmap(struct page *page)
>  {
> +       bool locked = false;
> +
> +       if (!PageAnon(page))
> +               locked = mem_cgroup_begin_update_page_stat(page);
> +
>        /* page still mapped by someone else? */
>        if (!atomic_add_negative(-1, &page->_mapcount))
> -               return;
> -
> +               goto out;
>        /*
>         * Now that the last pte has gone, s390 must transfer dirty
>         * flag from storage key to struct page.  We can usually skip
> @@ -1204,6 +1211,9 @@ void page_remove_rmap(struct page *page)
>         * Leaving it set also helps swapoff to reinstate ptes
>         * faster for those pages still in swapcache.
>         */
> +out:
> +       if (!PageAnon(page))
> +               mem_cgroup_end_update_page_stat(page, locked);
>  }
>
>  /*
> --
> 1.7.4.1
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-01-26 19:01 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-13  8:30 [RFC] [PATCH 0/7 v2] memcg: page_cgroup diet KAMEZAWA Hiroyuki
2012-01-13  8:32 ` [RFC] [PATCH 1/7 v2] memcg: remove unnecessary check in mem_cgroup_update_page_stat() KAMEZAWA Hiroyuki
2012-01-17 15:16   ` Michal Hocko
2012-01-17 23:55     ` KAMEZAWA Hiroyuki
2012-01-18 13:01       ` Michal Hocko
2012-01-19  2:18         ` KAMEZAWA Hiroyuki
2012-01-19 20:07         ` Ying Han
2012-01-20  0:48           ` KAMEZAWA Hiroyuki
2012-01-13  8:33 ` [RFC] [PATCH 2/7 v2] memcg: add memory barrier for checking account move KAMEZAWA Hiroyuki
2012-01-17 15:26   ` Michal Hocko
2012-01-18  0:06     ` KAMEZAWA Hiroyuki
2012-01-18 12:37       ` Michal Hocko
2012-01-19  2:17         ` KAMEZAWA Hiroyuki
2012-01-19  9:28           ` Michal Hocko
2012-01-19 23:57             ` KAMEZAWA Hiroyuki
2012-01-20 18:08           ` Ying Han
2012-01-23  9:04             ` Michal Hocko
2012-01-24  3:21               ` KAMEZAWA Hiroyuki
2012-01-24  8:49                 ` Michal Hocko
2012-01-24 19:04               ` Ying Han
2012-01-25 11:07                 ` Michal Hocko
2012-01-13  8:40 ` [RFC] [PATCH 3/7 v2] memcg: remove PCG_MOVE_LOCK flag from pc->flags KAMEZAWA Hiroyuki
2012-01-16 12:55   ` Kirill A. Shutemov
2012-01-17  0:22     ` KAMEZAWA Hiroyuki
2012-01-17 16:46   ` Michal Hocko
2012-01-18  0:12     ` KAMEZAWA Hiroyuki
2012-01-18 10:47       ` Michal Hocko
2012-01-18 23:53         ` KAMEZAWA Hiroyuki
2012-01-23 22:05           ` Ying Han
2012-01-24  4:59             ` KAMEZAWA Hiroyuki
2012-01-24  8:43             ` Michal Hocko
2012-01-25 23:07               ` Ying Han
2012-01-26  9:16                 ` Michal Hocko
2012-01-23 22:02   ` Ying Han
2012-01-24  4:47     ` KAMEZAWA Hiroyuki
2012-01-25 22:48       ` Ying Han
2012-01-13  8:41 ` [RFC] [PATCH 4/7 v2] memcg: new scheme to update per-memcg page stat accounting KAMEZAWA Hiroyuki
2012-01-18 16:45   ` Michal Hocko
2012-01-18 23:58     ` KAMEZAWA Hiroyuki
2012-01-26 19:01   ` Ying Han [this message]
2012-01-13  8:42 ` [RFC] [PATCH 5/7 v2] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
2012-01-19 14:07   ` Michal Hocko
2012-01-26 19:10     ` Ying Han
2012-01-13  8:43 ` [RFC] [PATCH 6/7 v2] memcg: remove PCG_CACHE KAMEZAWA Hiroyuki
2012-01-13  8:45 ` [RFC] [PATCH 7/7 v2] memcg: make mem_cgroup_begin_update_stat to use global pcpu KAMEZAWA Hiroyuki
2012-01-19 14:47   ` Michal Hocko
2012-01-20  2:19     ` KAMEZAWA Hiroyuki
2012-01-20  8:38       ` Michal Hocko
2012-01-20  8:40   ` Greg Thelen
2012-01-24  3:18     ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALWz4izcvRGe2wBsthhhp3eW4rS=shW1wcZG3DW1=2skeaHmog@mail.gmail.com' \
    --to=yinghan@google.com \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox