Subject: Re: [PATCH 4/7] memcg fix scan ratio with small memcg.
From: Ying Han
Date: Mon, 25 Apr 2011 10:35:39 -0700
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, kosaki.motohiro@jp.fujitsu.com, balbir@linux.vnet.ibm.com, nishimura@mxp.nes.nec.co.jp, akpm@linux-foundation.org, Johannes Weiner, minchan.kim@gmail.com, Michal Hocko

On Mon, Apr 25, 2011 at 2:34 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> During memcg memory reclaim, get_scan_count() may return [0, 0, 0, 0],
> so no scan is issued at the current reclaim priority.
>
> The reason is that the memory cgroup may not be big enough to hold
> a number of pages greater than 1 << priority.
>
> Because priority affects many routines in vmscan.c, it's better
> to scan memory even if usage >> priority == 0.
> From another point of view, if a memcg's zone doesn't have enough memory
> to meet the priority, it should be skipped. So, this patch creates a
> temporary priority in get_scan_count() and scans some pages even when
> usage is small. This lets memcg reclaim proceed more smoothly without
> priority climbing too high, which would cause unnecessary
> congestion_wait(), etc.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/memcontrol.h |    6 ++++++
>  mm/memcontrol.c            |    5 +++++
>  mm/vmscan.c                |   11 +++++++++++
>  3 files changed, 22 insertions(+)
>
> Index: memcg/include/linux/memcontrol.h
> ===================================================================
> --- memcg.orig/include/linux/memcontrol.h
> +++ memcg/include/linux/memcontrol.h
> @@ -152,6 +152,7 @@ unsigned long mem_cgroup_soft_limit_recl
>  						gfp_t gfp_mask,
>  						unsigned long *total_scanned);
>  u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
> +u64 mem_cgroup_get_usage(struct mem_cgroup *mem);
>
>  void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx);
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> @@ -357,6 +358,11 @@ u64 mem_cgroup_get_limit(struct mem_cgro
>  	return 0;
>  }
>
> +static inline u64 mem_cgroup_get_limit(struct mem_cgroup *mem)
> +{
> +	return 0;
> +}
> +

This stub should be mem_cgroup_get_usage().

>  static inline void mem_cgroup_split_huge_fixup(struct page *head,
>  						struct page *tail)
>  {
> Index: memcg/mm/memcontrol.c
> ===================================================================
> --- memcg.orig/mm/memcontrol.c
> +++ memcg/mm/memcontrol.c
> @@ -1483,6 +1483,11 @@ u64 mem_cgroup_get_limit(struct mem_cgro
>  	return min(limit, memsw);
>  }
>
> +u64 mem_cgroup_get_usage(struct mem_cgroup *memcg)
> +{
> +	return res_counter_read_u64(&memcg->res, RES_USAGE);
> +}
> +
>  /*
>   * Visit the first child (need not be the first child as per the ordering
>   * of the cgroup list, since we track last_scanned_child) of @mem and use
> Index: memcg/mm/vmscan.c
> ===================================================================
> --- memcg.orig/mm/vmscan.c
> +++ memcg/mm/vmscan.c
> @@ -1762,6 +1762,17 @@ static void get_scan_count(struct zone *
>  			denominator = 1;
>  			goto out;
>  		}
> +	} else {
> +		u64 usage;
> +		/*
> +		 * When the memcg is small enough, anon+file >> priority
> +		 * can be 0 and we'll do no scan. Adjust priority to a
> +		 * proper value based on usage. If this zone's usage is
> +		 * small enough, scanning will skip this zone until
> +		 * priority goes down.
> +		 */
> +		for (usage = mem_cgroup_get_usage(sc->mem_cgroup) >> PAGE_SHIFT;
> +		     priority && ((usage >> priority) < SWAP_CLUSTER_MAX);
> +		     priority--);
>  	}

--Ying

>  	/*