From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Ying Han <yinghan@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 8/10] mm/memcg: nest lru_lock inside page_cgroup lock
Date: Tue, 21 Feb 2012 18:48:29 +0900 [thread overview]
Message-ID: <20120221184829.78d523a8.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1202201535460.23274@eggly.anvils>
On Mon, 20 Feb 2012 15:36:55 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:
> Cut back on some of the overhead we've added, particularly the lruvec
> locking added to every __mem_cgroup_uncharge_common(), and the page
> cgroup locking in mem_cgroup_reset_uncharged_to_root().
>
> Our hands were tied by the lock ordering (page cgroup inside lruvec)
> defined by __mem_cgroup_commit_charge_lrucare(). There is no strong
> reason for why that nesting needs to be one way or the other, and if
> we invert it, then some optimizations become possible.
>
> So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare
> to __mem_cgroup_commit_charge() instead, using page_lock_lruvec() there
> inside lock_page_cgroup() in the lrucare case. (I'd prefer to work it
> out internally, than rely upon an lrucare argument: but that is hard -
> certainly PageLRU is not enough, racing with pages on pagevec about to
> be flushed to lru.) Use page_relock_lruvec() after setting mem_cgroup,
> before adding to the appropriate new lruvec: so that (if lock depends
> on memcg) old lock is held across change in ownership while off lru.
>
> Delete the lruvec locking on entry to __mem_cgroup_uncharge_common();
> but if the page being uncharged is not on lru, then we do need to
> reset its ownership, and must dance very carefully with mem_cgroup_
> reset_uncharged_to_root(), to make sure that when there's a race
> between uncharging and removing from lru, one side or the other
> will see it - smp_mb__after_clear_bit() at both ends.
>
> Avoid overhead of calls to mem_cgroup_reset_uncharged_to_root() from
> release_pages() and __page_cache_release(), by doing its work inside
> page_relock_lruvec() when the page_count is 0 i.e. the page is frozen
> from other references and about to be freed. That was not possible
> with the old lock ordering, since __mem_cgroup_uncharge_common()'s
> lock then changed ownership too soon.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> mm/memcontrol.c | 142 ++++++++++++++++++++++++----------------------
> mm/swap.c | 2
> 2 files changed, 75 insertions(+), 69 deletions(-)
>
> --- mmotm.orig/mm/memcontrol.c 2012-02-18 11:57:55.551524898 -0800
> +++ mmotm/mm/memcontrol.c 2012-02-18 11:58:02.451525062 -0800
> @@ -1059,6 +1059,14 @@ void page_relock_lruvec(struct page *pag
> */
> if (unlikely(!memcg))
> memcg = pc->mem_cgroup = root_mem_cgroup;
> + /*
> + * We must reset pc->mem_cgroup back to root before freeing
> + * a page: avoid additional callouts from hot paths by doing
> + * it here when we see the page is frozen (can safely be done
> + * before taking lru_lock because the page is frozen).
> + */
> + if (memcg != root_mem_cgroup && !page_count(page))
> + pc->mem_cgroup = root_mem_cgroup;
> mz = page_cgroup_zoneinfo(memcg, page);
> lruvec = &mz->lruvec;
> }
> @@ -1083,23 +1091,20 @@ void mem_cgroup_reset_uncharged_to_root(
> return;
>
> VM_BUG_ON(PageLRU(page));
> + /*
> + * Caller just did ClearPageLRU():
> + * make sure that __mem_cgroup_uncharge_common()
> + * can see that before we test PageCgroupUsed(pc).
> + */
> + smp_mb__after_clear_bit();
>
> /*
> * Once an uncharged page is isolated from the mem_cgroup's lru,
> * it no longer protects that mem_cgroup from rmdir: reset to root.
> - *
> - * __page_cache_release() and release_pages() may be called at
> - * interrupt time: we cannot lock_page_cgroup() then (we might
> - * have interrupted a section with page_cgroup already locked),
> - * nor do we need to since the page is frozen and about to be freed.
> */
> pc = lookup_page_cgroup(page);
> - if (page_count(page))
> - lock_page_cgroup(pc);
> if (!PageCgroupUsed(pc) && pc->mem_cgroup != root_mem_cgroup)
> pc->mem_cgroup = root_mem_cgroup;
> - if (page_count(page))
> - unlock_page_cgroup(pc);
> }
>
> /**
> @@ -2422,9 +2427,11 @@ static void __mem_cgroup_commit_charge(s
> struct page *page,
> unsigned int nr_pages,
> struct page_cgroup *pc,
> - enum charge_type ctype)
> + enum charge_type ctype,
> + bool lrucare)
> {
> - bool anon;
> + struct lruvec *lruvec;
> + bool was_on_lru = false;
>
> lock_page_cgroup(pc);
> if (unlikely(PageCgroupUsed(pc))) {
> @@ -2433,28 +2440,41 @@ static void __mem_cgroup_commit_charge(s
> return;
> }
> /*
> - * we don't need page_cgroup_lock about tail pages, becase they are not
> - * accessed by any other context at this point.
> + * We don't need lock_page_cgroup on tail pages, because they are not
> + * accessible to any other context at this point.
> */
> - pc->mem_cgroup = memcg;
> +
> /*
> - * We access a page_cgroup asynchronously without lock_page_cgroup().
> - * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> - * is accessed after testing USED bit. To make pc->mem_cgroup visible
> - * before USED bit, we need memory barrier here.
> - * See mem_cgroup_add_lru_list(), etc.
> - */
> - smp_wmb();
> + * In some cases, SwapCache and FUSE(splice_buf->radixtree), the page
> + * may already be on some other page_cgroup's LRU. Take care of it.
> + */
> + if (lrucare) {
> + lruvec = page_lock_lruvec(page);
> + if (PageLRU(page)) {
> + ClearPageLRU(page);
> + del_page_from_lru_list(page, lruvec, page_lru(page));
> + was_on_lru = true;
> + }
> + }
>
> + pc->mem_cgroup = memcg;
> SetPageCgroupUsed(pc);
> - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> - anon = true;
> - else
> - anon = false;
>
> - mem_cgroup_charge_statistics(memcg, anon, nr_pages);
> + if (lrucare) {
> + if (was_on_lru) {
> + page_relock_lruvec(page, &lruvec);
> + if (!PageLRU(page)) {
> + SetPageLRU(page);
> + add_page_to_lru_list(page, lruvec, page_lru(page));
> + }
> + }
> + unlock_lruvec(lruvec);
> + }
> +
> + mem_cgroup_charge_statistics(memcg,
> + ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED, nr_pages);
> unlock_page_cgroup(pc);
> - WARN_ON_ONCE(PageLRU(page));
> +
> /*
> * "charge_statistics" updated event counter. Then, check it.
> * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> @@ -2652,7 +2672,7 @@ static int mem_cgroup_charge_common(stru
> ret = __mem_cgroup_try_charge(mm, gfp_mask, nr_pages, &memcg, oom);
> if (ret == -ENOMEM)
> return ret;
> - __mem_cgroup_commit_charge(memcg, page, nr_pages, pc, ctype);
> + __mem_cgroup_commit_charge(memcg, page, nr_pages, pc, ctype, false);
> return 0;
> }
>
> @@ -2672,34 +2692,6 @@ static void
> __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> enum charge_type ctype);
>
> -static void
> -__mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
> - enum charge_type ctype)
> -{
> - struct page_cgroup *pc = lookup_page_cgroup(page);
> - struct lruvec *lruvec;
> - bool removed = false;
> -
> - /*
> - * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
> - * is already on LRU. It means the page may on some other page_cgroup's
> - * LRU. Take care of it.
> - */
> - lruvec = page_lock_lruvec(page);
> - if (PageLRU(page)) {
> - del_page_from_lru_list(page, lruvec, page_lru(page));
> - ClearPageLRU(page);
> - removed = true;
> - }
> - __mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
> - if (removed) {
> - page_relock_lruvec(page, &lruvec);
> - add_page_to_lru_list(page, lruvec, page_lru(page));
> - SetPageLRU(page);
> - }
> - unlock_lruvec(lruvec);
> -}
> -
> int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> gfp_t gfp_mask)
> {
> @@ -2777,13 +2769,16 @@ static void
> __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg,
> enum charge_type ctype)
> {
> + struct page_cgroup *pc;
> +
> if (mem_cgroup_disabled())
> return;
> if (!memcg)
> return;
> cgroup_exclude_rmdir(&memcg->css);
>
> - __mem_cgroup_commit_charge_lrucare(page, memcg, ctype);
> + pc = lookup_page_cgroup(page);
> + __mem_cgroup_commit_charge(memcg, page, 1, pc, ctype, true);
> /*
> * Now swap is on-memory. This means this page may be
> * counted both as mem and swap....double count.
> @@ -2898,7 +2893,6 @@ __mem_cgroup_uncharge_common(struct page
> struct mem_cgroup *memcg = NULL;
> unsigned int nr_pages = 1;
> struct page_cgroup *pc;
> - struct lruvec *lruvec;
> bool anon;
>
> if (mem_cgroup_disabled())
> @@ -2918,7 +2912,6 @@ __mem_cgroup_uncharge_common(struct page
> if (unlikely(!PageCgroupUsed(pc)))
> return NULL;
>
> - lruvec = page_lock_lruvec(page);
> lock_page_cgroup(pc);
>
> memcg = pc->mem_cgroup;
> @@ -2950,16 +2943,31 @@ __mem_cgroup_uncharge_common(struct page
> mem_cgroup_charge_statistics(memcg, anon, -nr_pages);
>
> ClearPageCgroupUsed(pc);
> + /*
> + * Make sure that mem_cgroup_reset_uncharged_to_root()
> + * can see that before we test PageLRU(page).
> + */
> + smp_mb__after_clear_bit();
>
> /*
> * Once an uncharged page is isolated from the mem_cgroup's lru,
> * it no longer protects that mem_cgroup from rmdir: reset to root.
> - */
> - if (!PageLRU(page) && pc->mem_cgroup != root_mem_cgroup)
> - pc->mem_cgroup = root_mem_cgroup;
> -
> + *
> + * The page_count() test avoids the lock in the common case when
> + * shrink_page_list()'s __remove_mapping() has frozen references
> + * to 0 and the page is on its way to freedom.
> + */
> + if (!PageLRU(page) && pc->mem_cgroup != root_mem_cgroup) {
> + struct lruvec *lruvec = NULL;
> +
> + if (page_count(page))
> + lruvec = page_lock_lruvec(page);
> + if (!PageLRU(page))
> + pc->mem_cgroup = root_mem_cgroup;
> + if (lruvec)
> + unlock_lruvec(lruvec);
> + }
Hmm, ok. isolate_lru_page() et al. take care of all the problems when PageLRU()==true,
right?

I wonder which is better: delaying the freeing of the lruvec, or this locking scheme...
Thanks,
-Kame