On Wed, Apr 20, 2011 at 11:38 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 20 Apr 2011 23:11:42 -0700
> Ying Han wrote:
>
> > On Wed, Apr 20, 2011 at 8:48 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > memcg-kswapd visits each memcg in round-robin. But the required
> > > amount of work depends on the memcg's usage and hi/low watermarks,
> > > and taking that into account will be good.
> > >
> > > Signed-off-by: KAMEZAWA Hiroyuki
> > > ---
> > >  include/linux/memcontrol.h |    1 +
> > >  mm/memcontrol.c            |   17 +++++++++++++++++
> > >  mm/vmscan.c                |    2 ++
> > >  3 files changed, 20 insertions(+)
> > >
> > > Index: mmotm-Apr14/include/linux/memcontrol.h
> > > ===================================================================
> > > --- mmotm-Apr14.orig/include/linux/memcontrol.h
> > > +++ mmotm-Apr14/include/linux/memcontrol.h
> > > @@ -98,6 +98,7 @@ extern bool mem_cgroup_kswapd_can_sleep(
> > >  extern struct mem_cgroup *mem_cgroup_get_shrink_target(void);
> > >  extern void mem_cgroup_put_shrink_target(struct mem_cgroup *mem);
> > >  extern wait_queue_head_t *mem_cgroup_kswapd_waitq(void);
> > > +extern int mem_cgroup_kswapd_bonus(struct mem_cgroup *mem);
> > >
> > >  static inline
> > >  int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup)
> > > Index: mmotm-Apr14/mm/memcontrol.c
> > > ===================================================================
> > > --- mmotm-Apr14.orig/mm/memcontrol.c
> > > +++ mmotm-Apr14/mm/memcontrol.c
> > > @@ -4673,6 +4673,23 @@ struct memcg_kswapd_work
> > >
> > >  struct memcg_kswapd_work memcg_kswapd_control;
> > >
> > > +int mem_cgroup_kswapd_bonus(struct mem_cgroup *mem)
> > > +{
> > > +	unsigned long long usage, lowat, hiwat;
> > > +	int rate;
> > > +
> > > +	usage = res_counter_read_u64(&mem->res, RES_USAGE);
> > > +	lowat = res_counter_read_u64(&mem->res, RES_LOW_WMARK_LIMIT);
> > > +	hiwat = res_counter_read_u64(&mem->res, RES_HIGH_WMARK_LIMIT);
> > > +	if (lowat == hiwat)
> > > +		return 0;
> > > +
> > > +	rate = (usage - hiwat) * 10 / (lowat - hiwat);
> > > +	/* If usage is big, we reclaim more */
> > > +	return rate * SWAP_CLUSTER_MAX;
>
> This may be buggy and we should have an upper limit on this 'rate'.
>
> > > +}
> > > +
> >
> > I understand the logic in general, which is that we would like to reclaim
> > more each time if more work needs to be done. But I am not quite sure about
> > the calculation here: (usage - hiwat) determines the amount of work for
> > kswapd. And why divide by (lowat - hiwat)? My guess is because the larger
> > the value, the later we will trigger kswapd?
>
> Because memcg-kswapd will require more work on this memcg if (usage - high)
> is large.

Agree on this, and that is the idea of "rate" being proportional to (usage - high).

> At first, I'm not sure this logic is good, but I wanted to show there is a
> chance to do some scheduling.
>
> We have 2 ways to implement this kind of weight:
>
>  1. modify the logic that selects a memcg
>     I think we'll see starvation easily. So, I didn't do this this time.
>
>  2. modify the amount passed as nr_to_reclaim
>     We'll be able to determine the amount by some calculation using some
>     statistics.
>
> I selected "2" for this time.
>
> With HIGH/LOW watermarks, the admin sets the LOW watermark as a kind of limit.
> Then, if usage is more than the LOW watermark, its priority will be higher
> than other memcgs which have lower (relative) usage.

Ok, now I know a bit more of the logic behind it. Here, we would like to reclaim
more from the memcg which has higher (usage - low).

> In general, memcg-kswapd can reduce memory down to the high watermark only
> when the system is not busy. So, this logic tries to remove more memory from
> a busy cgroup to reduce 'hit limit'.

So, the "busy cgroup" here means the memcg that has higher (usage - low)?

--Ying

> And I wonder, a memcg contains pages which are related to each other. So,
> reducing some amount of pages larger than 32 pages at once may make sense.
>
> Thanks,
> -Kame
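
For readers following the arithmetic in mem_cgroup_kswapd_bonus(), here is a minimal userspace sketch of the same rate calculation with the upper limit that Kame says is missing. It is not the patch and not what was merged. The function name, the cap BONUS_RATE_MAX and its value of 20, and the early return for usage at or below the high watermark are all assumptions added for illustration. It also assumes hiwat < lowat, as in this series, where kswapd reclaims down to the high watermark.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32  /* same value as the kernel's; redefined for userspace */
#define BONUS_RATE_MAX   20  /* hypothetical cap, not part of the posted patch */

/*
 * Sketch of the bonus calculation. rate is 0 at usage == hiwat and
 * 10 at usage == lowat, then keeps growing until it hits the cap.
 */
static int kswapd_bonus_sketch(unsigned long long usage,
			       unsigned long long lowat,
			       unsigned long long hiwat)
{
	unsigned long long rate;

	/* Watermarks unset or inverted: no bonus (the patch only checks ==). */
	if (lowat <= hiwat)
		return 0;

	/* Assumption: avoid unsigned underflow of (usage - hiwat). */
	if (usage <= hiwat)
		return 0;

	rate = (usage - hiwat) * 10 / (lowat - hiwat);

	/* Assumption: clamp the rate so the bonus stays bounded. */
	if (rate > BONUS_RATE_MAX)
		rate = BONUS_RATE_MAX;

	return (int)rate * SWAP_CLUSTER_MAX;
}

/* Worked examples with hiwat = 100 and lowat = 200 (arbitrary units). */
static void kswapd_bonus_examples(void)
{
	assert(kswapd_bonus_sketch(150, 200, 100) == 5 * SWAP_CLUSTER_MAX);
	assert(kswapd_bonus_sketch(200, 200, 100) == 10 * SWAP_CLUSTER_MAX);
	assert(kswapd_bonus_sketch(90, 200, 100) == 0);
	assert(kswapd_bonus_sketch(1000, 200, 100) ==
	       BONUS_RATE_MAX * SWAP_CLUSTER_MAX);
}
```

Without the second check, a usage below the high watermark makes (usage - hiwat) wrap around as an unsigned value, which is one way the uncapped version could return a very large bonus.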