linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
@ 2011-11-17  1:33 KAMEZAWA Hiroyuki
  2011-11-18 16:35 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-17  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, linux-kernel, akpm, hannes, mhocko, Andrea Arcangeli,
	Balbir Singh


I'll send this again when mm is shipped.
I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
it's very slow. This fixes it. Any comments are welcome.

==
Subject: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.

at split_huge_page(), mem_cgroup_split_huge_fixup() is called to
handle page_cgroup modifcations. It takes move_lock_page_cgroup()
and modify page_cgroup and LRU accounting jobs and called
HPAGE_PMD_SIZE - 1 times.

But thinking again,
  - compound_lock() is held at move_accout...then, it's not necessary
    to take move_lock_page_cgroup().
  - LRU is locked and all tail pages will go into the same LRU as
    head is now on.
  - page_cgroup is contiguous in huge page range.

This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
hugepage and reduce costs for spliting.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |    5 ++---
 mm/huge_memory.c           |    3 ++-
 mm/memcontrol.c            |   32 ++++++++++++++++----------------
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b87068a..0a22a19 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -154,7 +154,7 @@ u64 mem_cgroup_get_limit(struct mem_cgroup *memcg);
 
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail);
+void mem_cgroup_split_huge_fixup(struct page *head);
 #endif
 
 #ifdef CONFIG_DEBUG_VM
@@ -357,8 +357,7 @@ u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
 	return 0;
 }
 
-static inline void mem_cgroup_split_huge_fixup(struct page *head,
-						struct page *tail)
+static inline void mem_cgroup_split_huge_fixup(struct page *head)
 {
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4298aba..aa6cdae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1207,6 +1207,8 @@ static void __split_huge_page_refcount(struct page *page)
 	/* prevent PageLRU to go away from under us, and freeze lru stats */
 	spin_lock_irq(&zone->lru_lock);
 	compound_lock(page);
+	/* complete memcg works before add pages to LRU */
+	mem_cgroup_split_huge_fixup(page);
 
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		struct page *page_tail = page + i;
@@ -1278,7 +1280,6 @@ static void __split_huge_page_refcount(struct page *page)
 		BUG_ON(!PageDirty(page_tail));
 		BUG_ON(!PageSwapBacked(page_tail));
 
-		mem_cgroup_split_huge_fixup(page, page_tail);
 
 		lru_add_page_tail(zone, page, page_tail);
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6aff93c..99101f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2523,38 +2523,38 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compund_lock.
+ * charge/uncharge will be never happen and move_account() is done under
+ * compound_lock(), so we don't have to take care of races.
  */
-void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
+void mem_cgroup_split_huge_fixup(struct page *head)
 {
 	struct page_cgroup *head_pc = lookup_page_cgroup(head);
-	struct page_cgroup *tail_pc = lookup_page_cgroup(tail);
-	unsigned long flags;
+	struct page_cgroup *pc;
+	int i;
 
 	if (mem_cgroup_disabled())
 		return;
-	/*
-	 * We have no races with charge/uncharge but will have races with
-	 * page state accounting.
-	 */
-	move_lock_page_cgroup(head_pc, &flags);
+	for (i = 1; i < HPAGE_PMD_NR; i++) {
+		pc = head_pc + i;
+		pc->mem_cgroup = head_pc->mem_cgroup;
+		smp_wmb();/* see __commit_charge() */
+		/*
+		 * LRU flags cannot be copied because we need to add tail
+		 * page to LRU by generic call and our hooks will be called.
+		 */
+		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+	}
 
-	tail_pc->mem_cgroup = head_pc->mem_cgroup;
-	smp_wmb(); /* see __commit_charge() */
 	if (PageCgroupAcctLRU(head_pc)) {
 		enum lru_list lru;
 		struct mem_cgroup_per_zone *mz;
-
 		/*
-		 * LRU flags cannot be copied because we need to add tail
-		 *.page to LRU by generic call and our hook will be called.
 		 * We hold lru_lock, then, reduce counter directly.
 		 */
 		lru = page_lru(head);
 		mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
-		MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+		MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
 	}
-	tail_pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
-	move_unlock_page_cgroup(head_pc, &flags);
 }
 #endif
 
-- 
1.7.4.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
  2011-11-17  1:33 [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently KAMEZAWA Hiroyuki
@ 2011-11-18 16:35 ` Michal Hocko
  2011-11-21 10:42 ` Johannes Weiner
  2011-11-21 11:00 ` Balbir Singh
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2011-11-18 16:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, linux-kernel, akpm, hannes, Andrea Arcangeli,
	Balbir Singh

On Thu 17-11-11 10:33:08, KAMEZAWA Hiroyuki wrote:
> 
> I'll send this again when mm is shipped.
> I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
> it's very slow. This fixes it. Any comments are welcome.
> 
> ==
> Subject: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
> 
> at split_huge_page(), mem_cgroup_split_huge_fixup() is called to
> handle page_cgroup modifcations. It takes move_lock_page_cgroup()
> and modify page_cgroup and LRU accounting jobs and called
> HPAGE_PMD_SIZE - 1 times.
> 
> But thinking again,
>   - compound_lock() is held at move_accout...then, it's not necessary
>     to take move_lock_page_cgroup().
>   - LRU is locked and all tail pages will go into the same LRU as
>     head is now on.
>   - page_cgroup is contiguous in huge page range.
> 
> This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
> hugepage and reduce costs for spliting.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Yes, looks good. Andrew already took the patch, but just in case
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Just one really minor comment bellow
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6aff93c..99101f1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2523,38 +2523,38 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  /*
>   * Because tail pages are not marked as "used", set it. We're under
>   * zone->lru_lock, 'splitting on pmd' and compund_lock.

typo that could be fixed to make grep happier

> + * charge/uncharge will be never happen and move_account() is done under
> + * compound_lock(), so we don't have to take care of races.
>   */
> -void mem_cgroup_split_huge_fixup(struct page *head, struct page *tail)
> +void mem_cgroup_split_huge_fixup(struct page *head)

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
  2011-11-17  1:33 [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently KAMEZAWA Hiroyuki
  2011-11-18 16:35 ` Michal Hocko
@ 2011-11-21 10:42 ` Johannes Weiner
  2011-11-21 10:44   ` KAMEZAWA Hiroyuki
  2011-11-21 11:00 ` Balbir Singh
  2 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2011-11-21 10:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, linux-kernel, akpm, mhocko, Andrea Arcangeli,
	Balbir Singh

On Thu, Nov 17, 2011 at 10:33:08AM +0900, KAMEZAWA Hiroyuki wrote:
> 
> I'll send this again when mm is shipped.
> I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
> it's very slow. This fixes it. Any comments are welcome.
> 
> ==
> Subject: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
> 
> at split_huge_page(), mem_cgroup_split_huge_fixup() is called to
> handle page_cgroup modifcations. It takes move_lock_page_cgroup()
> and modify page_cgroup and LRU accounting jobs and called
> HPAGE_PMD_SIZE - 1 times.
> 
> But thinking again,
>   - compound_lock() is held at move_accout...then, it's not necessary
>     to take move_lock_page_cgroup().
>   - LRU is locked and all tail pages will go into the same LRU as
>     head is now on.
>   - page_cgroup is contiguous in huge page range.
> 
> This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
> hugepage and reduce costs for spliting.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I agree with the changes, but since you are resending it anyway: I
think removing the move_lock and switching the hook to take care of
all tail pages in one go are two logical steps.  Would you mind
breaking it up into separate patches?

In any case,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
  2011-11-21 10:42 ` Johannes Weiner
@ 2011-11-21 10:44   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-21 10:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, cgroups, linux-kernel, akpm, mhocko, Andrea Arcangeli,
	Balbir Singh

On Mon, 21 Nov 2011 11:42:50 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Thu, Nov 17, 2011 at 10:33:08AM +0900, KAMEZAWA Hiroyuki wrote:
> > 
> > I'll send this again when mm is shipped.
> > I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
> > it's very slow. This fixes it. Any comments are welcome.
> > 
> > ==
> > Subject: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
> > 
> > at split_huge_page(), mem_cgroup_split_huge_fixup() is called to
> > handle page_cgroup modifcations. It takes move_lock_page_cgroup()
> > and modify page_cgroup and LRU accounting jobs and called
> > HPAGE_PMD_SIZE - 1 times.
> > 
> > But thinking again,
> >   - compound_lock() is held at move_accout...then, it's not necessary
> >     to take move_lock_page_cgroup().
> >   - LRU is locked and all tail pages will go into the same LRU as
> >     head is now on.
> >   - page_cgroup is contiguous in huge page range.
> > 
> > This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
> > hugepage and reduce costs for spliting.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> I agree with the changes, but since you are resending it anyway: I
> think removing the move_lock and switching the hook to take care of
> all tail pages in one go are two logical steps.  Would you mind
> breaking it up into separate patches?
> 
> In any case,
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Ok, I'll break this into 2 patches at resending.

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
  2011-11-17  1:33 [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently KAMEZAWA Hiroyuki
  2011-11-18 16:35 ` Michal Hocko
  2011-11-21 10:42 ` Johannes Weiner
@ 2011-11-21 11:00 ` Balbir Singh
  2011-11-22  0:36   ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 6+ messages in thread
From: Balbir Singh @ 2011-11-21 11:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, linux-kernel, akpm, hannes, mhocko, Andrea Arcangeli

On Thu, Nov 17, 2011 at 7:03 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> I'll send this again when mm is shipped.
> I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
> it's very slow. This fixes it. Any comments are welcome.
>

How do you see this - what tests?

> ==
> Subject: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
>
> at split_huge_page(), mem_cgroup_split_huge_fixup() is called to
> handle page_cgroup modifcations. It takes move_lock_page_cgroup()
> and modify page_cgroup and LRU accounting jobs and called
> HPAGE_PMD_SIZE - 1 times.
>
> But thinking again,
>  - compound_lock() is held at move_accout...then, it's not necessary
>    to take move_lock_page_cgroup().
>  - LRU is locked and all tail pages will go into the same LRU as
>    head is now on.
>  - page_cgroup is contiguous in huge page range.
>
> This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
> hugepage and reduce costs for spliting.

The change seems reasonable, I am working on a test setup and hope to
test it soon

Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently.
  2011-11-21 11:00 ` Balbir Singh
@ 2011-11-22  0:36   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-22  0:36 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, cgroups, linux-kernel, akpm, hannes, mhocko, Andrea Arcangeli

On Mon, 21 Nov 2011 16:30:11 +0530
Balbir Singh <bsingharora@gmail.com> wrote:

> On Thu, Nov 17, 2011 at 7:03 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > I'll send this again when mm is shipped.
> > I sometimes see mem_cgroup_split_huge_fixup() in perf report and noticed
> > it's very slow. This fixes it. Any comments are welcome.
> >
> 
> How do you see this - what tests?
> 
Sometimes. By applications calling fork() or mremap(), mprotect(), which
will cause split_huge_page(). 
But not 100% reporoducable, yet.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-11-22  0:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-17  1:33 [PATCH] fix mem_cgroup_split_huge_fixup to work efficiently KAMEZAWA Hiroyuki
2011-11-18 16:35 ` Michal Hocko
2011-11-21 10:42 ` Johannes Weiner
2011-11-21 10:44   ` KAMEZAWA Hiroyuki
2011-11-21 11:00 ` Balbir Singh
2011-11-22  0:36   ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox