From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Minchan Kim <minchan@kernel.org>
Subject: [PATCH 3/5] vmscan: prevent excessive pageout of kswapd
Date: Wed, 22 Aug 2012 16:15:15 +0900
Message-ID: <1345619717-5322-4-git-send-email-minchan@kernel.org>
In-Reply-To: <1345619717-5322-1-git-send-email-minchan@kernel.org>

If a higher zone is very small, the scan priority can be raised easily
while the lower zones still have enough free pages. When one of the lower
zones then fails to meet its high watermark, that zone reclaims pages with
the high priority that was driven up by the small higher zone, and it ends
up reclaiming far more pages than needed. I saw 8~16M of pageout in my KVM
test although only a few KB were actually needed.

This patch temporarily lowers the priority to the average of the current
and the previous reclaim priority for that zone. If we still cannot reclaim
enough pages at that priority, the next pass falls back to the fully raised
priority.
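
As a rough illustration of the averaging (a minimal userspace sketch, not
part of the patch; the damped_priority() helper and the example numbers are
made up for this sketch, while DEF_PRIORITY is 12 as in the kernel):

    #include <stdio.h>

    #define DEF_PRIORITY	12	/* same value the kernel uses */

    /* Mirrors the per-zone damping done in balance_pgdat(). */
    static int damped_priority(int prev_priority, int sc_priority)
    {
    	/* Only damp when the priority jumped by more than one step. */
    	if ((prev_priority - sc_priority) > 1)
    		return (prev_priority + sc_priority) >> 1;
    	return sc_priority;
    }

    int main(void)
    {
    	/*
    	 * A tiny higher zone raised sc.priority from 12 to 4.  The first
    	 * pass over a lower zone scans at the average, (12 + 4) / 2 = 8.
    	 */
    	printf("first pass:  %d\n", damped_priority(DEF_PRIORITY, 4));

    	/*
    	 * After that pass the patch records prev_priority[i] = 4, so if
    	 * the zone is still not balanced, the next pass uses the fully
    	 * raised priority 4 with no further damping.
    	 */
    	printf("second pass: %d\n", damped_priority(4, 4));
    	return 0;
    }
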
==DRIVER                      mapped-file-stream    mapped-file-stream
Name                          before-patch          after-patch          (diff, %)
Elapsed 663 665 (2.00, 0.30%)
nr_vmscan_write 1341 849 (-492.00, -36.69%)
nr_vmscan_immediate_reclaim 0 8 (8.00, 0.00%)
pgpgin 21668 30280 (8612.00, 39.75%)
pgpgout 8392 6396 (-1996.00,-23.78%)
pswpin 22 8 (-14.00, -63.64%)
pswpout 1341 849 (-492.00, -36.69%)
pgactivate 16217 15959 (-258.00, -1.59%)
pgdeactivate 15431 15303 (-128.00, -0.83%)
pgfault 204524355 204524410 (55.00, 0.00%)
pgmajfault 204472528 204472602 (74.00, 0.00%)
pgsteal_kswapd_dma 466676 475265 (8589.00, 1.84%)
pgsteal_kswapd_normal 49663877 51289479 (1625602.00,3.27%)
pgsteal_kswapd_high 138182330 135817904 (-2364426.00,-1.71%)
pgsteal_kswapd_movable 4236726 4380123 (143397.00,3.38%)
pgsteal_direct_dma 9306 11910 (2604.00, 27.98%)
pgsteal_direct_normal 123835 165012 (41177.00,33.25%)
pgsteal_direct_high 274887 309271 (34384.00,12.51%)
pgsteal_direct_movable 38011 45638 (7627.00, 20.07%)
pgscan_kswapd_dma 947813 972089 (24276.00,2.56%)
pgscan_kswapd_normal 97902722 100850050 (2947328.00,3.01%)
pgscan_kswapd_high 274337809 269039236 (-5298573.00,-1.93%)
pgscan_kswapd_movable 8496474 8774392 (277918.00,3.27%)
pgscan_direct_dma 22855 26410 (3555.00, 15.55%)
pgscan_direct_normal 3604954 4186439 (581485.00,16.13%)
pgscan_direct_high 4504909 5132110 (627201.00,13.92%)
pgscan_direct_movable 105418 122790 (17372.00,16.48%)
pgscan_direct_throttle 0 0 (0.00, 0.00%)
pginodesteal 11111 6836 (-4275.00,-38.48%)
slabs_scanned 56320 56320 (0.00, 0.00%)
kswapd_inodesteal 31121 35904 (4783.00, 15.37%)
kswapd_low_wmark_hit_quickly 4607 5193 (586.00, 12.72%)
kswapd_high_wmark_hit_quickly 432 421 (-11.00, -2.55%)
kswapd_skip_congestion_wait 10254 12375 (2121.00, 20.68%)
pageoutrun 2879697 3071912 (192215.00,6.67%)
allocstall 8222 9727 (1505.00, 18.30%)
pgrotated 1341 850 (-491.00, -36.61%)
kswapd_totalscan 381684818 379635767 (-2049051.00,-0.54%)
kswapd_totalsteal 192549609 191962771 (-586838.00,-0.30%)
Kswapd_efficiency 50.00 50.00 (0.00, 0.00%)
direct_totalscan 8238136 9467749 (1229613.00,14.93%)
direct_totalsteal 446039 531831 (85792.00,19.23%)
direct_efficiency 5.00 5.00 (0.00, 0.00%)
reclaim_velocity 588119.08 585118.06 (-3001.02,-0.51%)
The elapsed time of the test program is slightly increased compared to the
previous patch [2/5], but the number of reclaimed pages is much decreased.

before-patch: 192995648  after-patch: 192494602  diff: 501046 pages (about 2GB with 4KB pages)

Since kswapd reclaims fewer pages per run than with the old behavior,
kswapd's pageoutrun goes up and allocstall also goes up by about 18%.
Yes, that is not good for this workload, but the old behavior only worked
well by *luck*: it reclaimed far more pages than necessary, so we could
avoid entering the reclaim path as often. The downside of that is that it
could evict part of the working set, and this patch prevents that problem
without a big downside, I believe.
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d1ebe69..0e2550c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2492,6 +2492,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long total_scanned;
+	int prev_priority[MAX_NR_ZONES];
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
@@ -2513,6 +2514,8 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 loop_again:
 	total_scanned = 0;
 	sc.priority = DEF_PRIORITY;
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		prev_priority[i] = DEF_PRIORITY;
 	sc.nr_reclaimed = 0;
 	sc.may_writepage = !laptop_mode;
 	count_vm_event(PAGEOUTRUN);
@@ -2635,6 +2638,21 @@ loop_again:
 		    !zone_watermark_ok_safe(zone, testorder,
 					high_wmark_pages(zone) + balance_gap,
 					end_zone, 0)) {
+			/*
+			 * If a higher zone is very small, the priority
+			 * can be raised easily while the lower zones
+			 * still have enough free pages. When one of the
+			 * lower zones doesn't meet its high watermark,
+			 * that zone reclaims pages with the high priority
+			 * driven up by the small higher zone, and ends up
+			 * reclaiming excessive pages.
+			 * Let's decrease the priority temporarily.
+			 */
+			int tmp_priority = sc.priority;
+			if ((prev_priority[i] - sc.priority) > 1)
+				sc.priority = (prev_priority[i] +
+						sc.priority) >> 1;
+
 			shrink_zone(zone, &sc);

 			reclaim_state->reclaimed_slab = 0;
@@ -2644,7 +2662,11 @@ loop_again:

 			if (nr_slab == 0 && !zone_reclaimable(zone))
 				zone->all_unreclaimable = 1;
-		}
+
+			prev_priority[i] = tmp_priority;
+			sc.priority = tmp_priority;
+		} else
+			prev_priority[i] = DEF_PRIORITY;

 		/*
 		 * If we've done a decent amount of scanning and
--
1.7.9.5