From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
Balbir Singh <bsingharora@gmail.com>,
Ying Han <yinghan@google.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim
Date: Thu, 12 Jan 2012 10:54:27 +0900 [thread overview]
Message-ID: <20120112105427.4b80437b.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <1326207772-16762-3-git-send-email-hannes@cmpxchg.org>
On Tue, 10 Jan 2012 16:02:52 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:
> Right now, memcg soft limits are implemented by having a sorted tree
> of memcgs that are in excess of their limits. Under global memory
> pressure, kswapd first reclaims from the biggest excessor and then
> proceeds to do regular global reclaim. The result of this is that
> pages are reclaimed from all memcgs, but more scanning happens against
> those above their soft limit.
>
> With global reclaim doing memcg-aware hierarchical reclaim by default,
> this is a lot easier to implement: everytime a memcg is reclaimed
> from, scan more aggressively (per tradition with a priority of 0) if
> it's above its soft limit. With the same end result of scanning
> everybody, but soft limit excessors a bit more.
>
> Advantages:
>
> o smoother reclaim: soft limit reclaim is a separate stage before
> global reclaim, whose result is not communicated down the line and
> so overreclaim of the groups in excess is very likely. After this
> patch, soft limit reclaim is fully integrated into regular reclaim
> and each memcg is considered exactly once per cycle.
>
> o true hierarchy support: soft limits are only considered when
> kswapd does global reclaim, but after this patch, targetted
> reclaim of a memcg will mind the soft limit settings of its child
> groups.
>
> o code size: soft limit reclaim requires a lot of code to maintain
> the per-node per-zone rb-trees to quickly find the biggest
> offender, dedicated paths for soft limit reclaim etc. while this
> new implementation gets away without all that.
>
> Test:
>
> The test consists of two concurrent kernel build jobs in separate
> source trees, the master and the slave. The two jobs get along nicely
> on 600MB of available memory, so this is the zero overcommit control
> case. When available memory is decreased, the overcommit is
> compensated by decreasing the soft limit of the slave by the same
> amount, in the hope that the slave takes the hit and the master stays
> unaffected.
>
> 600M-0M-vanilla 600M-0M-patched
> Master walltime (s) 552.65 ( +0.00%) 552.38 ( -0.05%)
> Master walltime (stddev) 1.25 ( +0.00%) 0.92 ( -14.66%)
> Master major faults 204.38 ( +0.00%) 205.38 ( +0.49%)
> Master major faults (stddev) 27.16 ( +0.00%) 13.80 ( -47.43%)
> Master reclaim 31.88 ( +0.00%) 37.75 ( +17.87%)
> Master reclaim (stddev) 34.01 ( +0.00%) 75.88 (+119.59%)
> Master scan 31.88 ( +0.00%) 37.75 ( +17.87%)
> Master scan (stddev) 34.01 ( +0.00%) 75.88 (+119.59%)
> Master kswapd reclaim 33922.12 ( +0.00%) 33887.12 ( -0.10%)
> Master kswapd reclaim (stddev) 969.08 ( +0.00%) 492.22 ( -49.16%)
> Master kswapd scan 34085.75 ( +0.00%) 33985.75 ( -0.29%)
> Master kswapd scan (stddev) 1101.07 ( +0.00%) 563.33 ( -48.79%)
> Slave walltime (s) 552.68 ( +0.00%) 552.12 ( -0.10%)
> Slave walltime (stddev) 0.79 ( +0.00%) 1.05 ( +14.76%)
> Slave major faults 212.50 ( +0.00%) 204.50 ( -3.75%)
> Slave major faults (stddev) 26.90 ( +0.00%) 13.17 ( -49.20%)
> Slave reclaim 26.12 ( +0.00%) 35.00 ( +32.72%)
> Slave reclaim (stddev) 29.42 ( +0.00%) 74.91 (+149.55%)
> Slave scan 31.38 ( +0.00%) 35.00 ( +11.20%)
> Slave scan (stddev) 33.31 ( +0.00%) 74.91 (+121.24%)
> Slave kswapd reclaim 34259.00 ( +0.00%) 33469.88 ( -2.30%)
> Slave kswapd reclaim (stddev) 925.15 ( +0.00%) 565.07 ( -38.88%)
> Slave kswapd scan 34354.62 ( +0.00%) 33555.75 ( -2.33%)
> Slave kswapd scan (stddev) 969.62 ( +0.00%) 581.70 ( -39.97%)
>
> In the control case, the differences in elapsed time, number of major
> faults taken, and reclaim statistics are within the noise for both the
> master and the slave job.
>
> 600M-280M-vanilla 600M-280M-patched
> Master walltime (s) 595.13 ( +0.00%) 553.19 ( -7.04%)
> Master walltime (stddev) 8.31 ( +0.00%) 2.57 ( -61.64%)
> Master major faults 3729.75 ( +0.00%) 783.25 ( -78.98%)
> Master major faults (stddev) 258.79 ( +0.00%) 226.68 ( -12.36%)
> Master reclaim 705.00 ( +0.00%) 29.50 ( -95.68%)
> Master reclaim (stddev) 232.87 ( +0.00%) 44.72 ( -80.45%)
> Master scan 714.88 ( +0.00%) 30.00 ( -95.67%)
> Master scan (stddev) 237.44 ( +0.00%) 45.39 ( -80.54%)
> Master kswapd reclaim 114.75 ( +0.00%) 50.00 ( -55.94%)
> Master kswapd reclaim (stddev) 128.51 ( +0.00%) 9.45 ( -91.93%)
> Master kswapd scan 115.75 ( +0.00%) 50.00 ( -56.32%)
> Master kswapd scan (stddev) 130.31 ( +0.00%) 9.45 ( -92.04%)
> Slave walltime (s) 631.18 ( +0.00%) 577.68 ( -8.46%)
> Slave walltime (stddev) 9.89 ( +0.00%) 3.63 ( -57.47%)
> Slave major faults 28401.75 ( +0.00%) 14656.75 ( -48.39%)
> Slave major faults (stddev) 2629.97 ( +0.00%) 1911.81 ( -27.30%)
> Slave reclaim 65400.62 ( +0.00%) 1479.62 ( -97.74%)
> Slave reclaim (stddev) 11623.02 ( +0.00%) 1482.13 ( -87.24%)
> Slave scan 9050047.88 ( +0.00%) 95968.25 ( -98.94%)
> Slave scan (stddev) 1912786.94 ( +0.00%) 93390.71 ( -95.12%)
> Slave kswapd reclaim 327894.50 ( +0.00%) 227099.88 ( -30.74%)
> Slave kswapd reclaim (stddev) 22289.43 ( +0.00%) 16113.14 ( -27.71%)
> Slave kswapd scan 34987335.75 ( +0.00%) 1362367.12 ( -96.11%)
> Slave kswapd scan (stddev) 2523642.98 ( +0.00%) 156754.74 ( -93.79%)
>
> Here, the available memory is limited to 320 MB, the machine is
> overcommitted by 280 MB. The soft limit of the master is 300 MB, that
> of the slave merely 20 MB.
>
> Looking at the slave job first, it is much better off with the patched
> kernel: direct reclaim is almost gone, kswapd reclaim is decreased by
> a third. The result is much fewer major faults taken, which in turn
> lets the job finish quicker.
>
> It would be a zero-sum game if the improvement happened at the cost of
> the master but looking at the numbers, even the master performs better
> with the patched kernel. In fact, the master job is almost unaffected
> on the patched kernel compared to the control case.
>
> This is an odd phenomenon, as the patch does not directly change how
> the master is reclaimed. An explanation for this is that the severe
> overreclaim of the slave in the unpatched kernel results in the master
> growing bigger than in the patched case. Combining the fact that
> memcgs are scanned according to their size with the increased refault
> rate of the overreclaimed slave triggering global reclaim more often
> means that overall pressure on the master job is higher in the
> unpatched kernel.
>
> At any rate, the patched kernel seems to do a much better job at both
> overall resource allocation under soft limit overcommit as well as the
> requested prioritization of the master job.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Thank you for your work and the result seems atractive and code is much
simpler. My small concerns are..
1. This approach may increase latency of direct-reclaim because of priority=0.
2. In a case numa-spread/interleave application run in its own container,
pages on a node may paged-out again and again becasue of priority=0
if some other application runs in the node.
It seems difficult to use soft-limit with numa-aware applications.
Do you have suggestions ?
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-01-12 1:55 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-10 15:02 [patch 0/2] mm: memcg reclaim integration followups Johannes Weiner
2012-01-10 15:02 ` [patch 1/2] mm: memcg: per-memcg reclaim statistics Johannes Weiner
2012-01-10 23:54 ` Ying Han
2012-01-11 0:30 ` Johannes Weiner
2012-01-11 22:33 ` Ying Han
2012-01-12 9:17 ` Johannes Weiner
2012-01-10 15:02 ` [patch 2/2] mm: memcg: hierarchical soft limit reclaim Johannes Weiner
2012-01-11 21:42 ` Ying Han
2012-01-12 8:59 ` Johannes Weiner
2012-01-13 21:31 ` Ying Han
2012-01-13 22:44 ` Johannes Weiner
2012-01-17 14:22 ` Sha
2012-01-17 14:53 ` Johannes Weiner
2012-01-17 20:25 ` Ying Han
2012-01-17 21:56 ` Johannes Weiner
2012-01-17 23:39 ` Ying Han
2012-01-18 7:17 ` Sha
2012-01-18 9:25 ` Johannes Weiner
2012-01-18 11:25 ` Sha
2012-01-18 15:27 ` Michal Hocko
2012-01-19 6:38 ` Sha
2012-01-12 1:54 ` KAMEZAWA Hiroyuki [this message]
2012-01-13 12:16 ` Johannes Weiner
2012-01-18 5:26 ` KAMEZAWA Hiroyuki
2012-01-13 12:04 ` Michal Hocko
2012-01-13 15:50 ` Johannes Weiner
2012-01-13 16:34 ` Michal Hocko
2012-01-13 21:45 ` Ying Han
2012-01-18 9:45 ` Johannes Weiner
2012-01-18 20:38 ` Ying Han
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120112105427.4b80437b.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=bsingharora@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=yinghan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox