Re: [RFC 0/3] soft reclaim rework

From: Ying Han <yinghan@google.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Glauber Costa <glommer@parallels.com>
Subject: Re: [RFC 0/3] soft reclaim rework
Date: Wed, 17 Apr 2013 15:52:16 -0700
Message-ID: <CALWz4iwgfz4BCNdv6jukXiqgvKXOgkNr2+c2kBPAk0avJL-N=Q@mail.gmail.com>
In-Reply-To: <1365509595-665-1-git-send-email-mhocko@suse.cz>


On Tue, Apr 9, 2013 at 5:13 AM, Michal Hocko <mhocko@suse.cz> wrote:

> Hi all,
> It's been a long time since I promised my take on the $subject but I
> kept getting preempted by other tasks. I finally got to it, fortunately.
>

Hi Michal,

This has been on my list for a while and I never got a chance to get to it.
Per-memcg soft limit reclaim is one of the key features Google uses today,
so thank you for putting in the effort to move this forward.

I haven't read the patches in detail, but since we have chatted about this
for a few iterations, they should look familiar.


> This is just a first attempt. There are still some todos but I wanted to
> post it soon to get feedback.
>
> The basic idea is quite simple. Pull soft reclaim into shrink_zone in
> the first step and get rid of the previous soft reclaim infrastructure.
> shrink_zone is done in two passes now. First it tries to do the soft
> limit reclaim and it falls back to reclaim-all-mode if no group is over
> the limit or no pages have been scanned. The second pass happens at the
> same priority so the only time we waste is the memcg tree walk which
> shouldn't be a big deal. There is certainly room for improvements in
> that direction. But let's keep it simple for now.
> As a bonus we will get rid of a _lot_ of code by this and soft reclaim
> will not stand out like before.
>

Yes, that is the part that should have given us enough motivation to merge
this effort a long time ago. However, we had difficulty agreeing on 5% of
the code (mainly the soft limit policy), which prevented us from cleaning
up the other 95%. I take the blame.
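
The two-pass shrink_zone scheme quoted above can be sketched as a toy
userspace model. This is only an illustration under stated assumptions:
struct toy_memcg and every toy_* function are hypothetical names, not the
kernel's struct mem_cgroup or the code in the patches, and the model
pretends that every scanned page is reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup (not struct mem_cgroup). */
struct toy_memcg {
    unsigned long usage;      /* pages currently charged */
    unsigned long soft_limit; /* soft limit in pages */
    unsigned long scanned;    /* pages this model has scanned */
};

/* A group is a soft reclaim target when it exceeds its soft limit. */
static bool toy_over_soft_limit(const struct toy_memcg *cg)
{
    return cg->usage > cg->soft_limit;
}

/* One walk over all groups; returns the number of pages scanned. */
static unsigned long toy_shrink_pass(struct toy_memcg *groups, size_t n,
                                     unsigned long nr_to_scan, bool soft_only)
{
    unsigned long scanned = 0;

    assert(groups != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long nr;

        /* In the soft pass, skip groups that are under their limit. */
        if (soft_only && !toy_over_soft_limit(&groups[i]))
            continue;
        nr = groups[i].usage < nr_to_scan ? groups[i].usage : nr_to_scan;
        groups[i].usage -= nr;   /* assumption: scanned == reclaimed */
        groups[i].scanned += nr;
        scanned += nr;
    }
    return scanned;
}

/*
 * Pass 1 targets only groups over their soft limit. If it scanned
 * nothing (no group over the limit, or nothing to scan), pass 2 walks
 * every group with the same scan target.
 */
static unsigned long toy_shrink_zone(struct toy_memcg *groups, size_t n,
                                     unsigned long nr_to_scan)
{
    unsigned long scanned = toy_shrink_pass(groups, n, nr_to_scan, true);

    if (scanned == 0)
        scanned = toy_shrink_pass(groups, n, nr_to_scan, false);
    return scanned;
}
```

In this model the only extra cost of the fallback is the second walk over
the groups, which mirrors the "only time we waste is the memcg tree walk"
remark in the cover letter.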

> The second step is somewhat more controversial. I am redefining the meaning
> of the default soft limit value. I've not chosen 0 as we discussed
> previously because I want to preserve hierarchical property of the soft
> limit (if a parent up the hierarchy is over its limit then children are
> over as well)


This is the 5% we keep disagreeing on. The internal patch I am carrying
has a different interpretation of "hierarchical softlimit reclaim".

However, I am more inclined to accept that difference this time. At least
that will get us moving forward on cleaning up the code first. Then we can
revisit the exact policy of that 5% if it doesn't fit other use cases
(besides Google's). I am happy to backport this part into our kernel later
and then carry only that 5% of the change internally.

To give more background on what I mean by a different interpretation of
"hierarchical", I did a write-up some time back, which is attached to this
thread. This is purely a note for later; as I mentioned, I will go ahead
and review the patches and set that difference aside at this step.

> so I have kept the default untouched - unlimited - but I
> have slightly changed the meaning of this value. I interpret it as "user
> doesn't care about soft limit". More precisely the value is ignored
> unless it has been specified by user so such groups are eligible for
> soft reclaim even though they do not reach the limit. Such groups
> do not force their children to be reclaimed of course.
>
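
One possible reading of the eligibility rule quoted above, written as a
toy model. The names struct toy_group, soft_limit_set and
toy_soft_reclaim_eligible are hypothetical and do not come from the
patches; the sketch only encodes three statements from the cover letter:
a never-configured limit makes the group itself eligible, such a group
does not force its children, and a group is eligible whenever it or any
ancestor exceeds an explicitly set limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical group node; parent is NULL for the root. */
struct toy_group {
    const struct toy_group *parent;
    unsigned long usage;      /* pages currently charged */
    unsigned long soft_limit; /* only meaningful if soft_limit_set */
    bool soft_limit_set;      /* true once the user wrote a value */
};

/* Only an explicitly configured limit can be "exceeded". */
static bool toy_exceeds_set_limit(const struct toy_group *g)
{
    return g->soft_limit_set && g->usage > g->soft_limit;
}

static bool toy_soft_reclaim_eligible(const struct toy_group *g)
{
    assert(g != NULL);

    /* Default (never set): "user doesn't care", so always eligible. */
    if (!g->soft_limit_set)
        return true;

    /*
     * Hierarchical property: the group or any ancestor being over an
     * explicit limit makes the group eligible. Ancestors with an unset
     * limit are skipped, so they do not force their children.
     */
    for (const struct toy_group *p = g; p != NULL; p = p->parent) {
        if (toy_exceeds_set_limit(p))
            return true;
    }
    return false;
}
```

Under this reading, a group with an explicit limit that it stays below,
sitting under parents that are either unconfigured or under their own
limits, is the only kind of group the soft pass leaves alone.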



> I guess the only possible use case where this wouldn't work as
> expected is when somebody creates a group and set its soft limit to
> a small value (e.g. 0) just to protect all other groups from being
> reclaimed. With a new scheme all groups would be reclaimed while the
> previous implementation could end up reclaiming only the "special"
> group. This configuration can be achieved by the new scheme trivially
> so I think we should be safe. Or does this sound like a big problem?
> Finally the third step is soft limit reclaim integration into targeted
> reclaim. The patch is a trivial one-liner.
>

I will go through the patches in detail in the next day or so.

Thanks

--Ying

>
> I haven't gotten to test it properly yet. I've tested only 2 workloads:
> 1) 1GB RAM + 128MB swap in a kvm (host 4 GB RAM)
>    - 2 memcgs (directly under root)
>         - A has soft limit 500MB and hard unlimited
>         - B both hard and soft unlimited (default values)
>    - One dd if=/dev/zero of=storage/$file bs=1024 count=1228800 per group
> 2) same setup
>    - tar -xf linux source tree + make -j2 vmlinux
>
> Results
> 1) I've checked memory.usage_in_bytes
> Base (-mm tree)
>         Group A         Group B
> median  446498816       448659456
>
> Patches applied
> median  524314624       377921536
>
> So as expected, A got more room on behalf of B and it is nicely over its
> soft limit. I wanted to compare the reclaim performance as well but we
> do not account scanned and reclaimed pages during the old soft reclaim
> (global_reclaim prevents that). But I am planning to look at it.
> Anyway it doesn't look like we are scanning/reclaiming more with the
> patched kernel:
> Base:    pgscan_kswapd_dma32 394382     pgsteal_kswapd_dma32 394372
> Patched: pgscan_kswapd_dma32 394501     pgsteal_kswapd_dma32 394491
>
> So I would assume that the soft limit reclaim scanned more in the end.
>
> Total runtime was slightly smaller for the patch version:
> Base
>                 Group A         Group B
> total time      480.087 s       480.067 s
>
> Patches applied
> total time      474.853 s       474.736 s
>
> But this could be an artifact of the guest scheduling or related to the
> host activity so I wouldn't draw any conclusions from here.
>
> 2) kbuild test showed more or less the same results
> usage_in_bytes
> Base
>                 Group A         Group B
> Median          394817536       395634688
>
> Patches applied
> median          483481600       302131200
>
> A is kept closer to the soft limit again. There is some fluctuation
> around the limit because kbuild creates a lot of short lived processes.
> Base:    pgscan_kswapd_dma32 1648718    pgsteal_kswapd_dma32 1510749
> Patched: pgscan_kswapd_dma32 2042065    pgsteal_kswapd_dma32 1667745
>
> The differences are much bigger now so it would be interesting how much
> has been scanned/reclaimed during soft reclaim in the base kernel.
>
> I haven't included total runtime statistics here because they seemed
> even more random due to guest/host interaction.
>
> Any comments are welcome, of course.
>
> Michal Hocko (3):
>       memcg: integrate soft reclaim tighter with zone shrinking code
>       memcg: Ignore soft limit until it is explicitly specified
>       vmscan, memcg: Do softlimit reclaim also for targeted reclaim
>
> Incomplete diffstat (without node-zone soft limit tree removal etc...)
> so more deletions to come.
>  include/linux/memcontrol.h |   10 +--
>  mm/memcontrol.c            |  175 +++++++++-----------------------------------
>  mm/vmscan.c                |   67 ++++++++++-------
>  3 files changed, 78 insertions(+), 174 deletions(-)
>
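
The kswapd counters quoted above can be turned into a steal/scan ratio to
make the kbuild difference easier to read. The helper name below is
hypothetical and the ratio is simply pgsteal divided by pgscan; as the
cover letter notes, the base kernel does not account pages handled by the
old soft reclaim, so the two kernels are not strictly comparable.

```c
#include <assert.h>

/* Fraction of scanned pages that were actually reclaimed. */
static double toy_reclaim_efficiency(unsigned long pgscan,
                                     unsigned long pgsteal)
{
    assert(pgscan > 0); /* avoid dividing by zero */
    return (double)pgsteal / (double)pgscan;
}
```

With the quoted kbuild numbers this gives roughly 91.6% for the base
kernel (1510749 / 1648718) and roughly 81.7% for the patched kernel
(1667745 / 2042065). In the dd workload both kernels stay above 99.9%.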

[-- Attachment #2: SoftlimitReclaimInMemcg.pdf --]
[-- Type: application/pdf, Size: 416555 bytes --]
