From: Andrew Morton <akpm@linux-foundation.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/3] mm: improve page aging fairness between zones/nodes
Date: Fri, 26 Jul 2013 15:45:33 -0700
Message-ID: <20130726154533.aebd39c603ffe8de3b2c76fb@linux-foundation.org>
In-Reply-To: <1374267325-22865-1-git-send-email-hannes@cmpxchg.org>
On Fri, 19 Jul 2013 16:55:22 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> The way the page allocator interacts with kswapd creates aging
> imbalances, where the amount of time a userspace page gets in memory
> under reclaim pressure is dependent on which zone, which node the
> allocator took the page frame from.
>
> #1 fixes missed kswapd wakeups on NUMA systems, which lead to some
> nodes falling behind for a full reclaim cycle relative to the other
> nodes in the system.
>
> #3 fixes an interaction where kswapd and a continuous stream of page
> allocations keep the preferred zone of a task between the high and
> low watermark (allocations succeed and kswapd does not go to sleep)
> indefinitely, completely underutilizing the lower zones and
> thrashing on the preferred zone.
>
> These patches are the aging fairness part of the thrash-detection
> based file LRU balancing. Andrea recommended to submit them
> separately as they are bugfixes in their own right.
>
> The following test ran a foreground workload (memcachetest) with
> background IO of various sizes on a 4 node 8G system (similar results
> were observed with single-node 4G systems):
>
>                       parallelio
>                              BASE            FAIRALLOC
> Ops memcachetest-0M       5170.00 (  0.00%)    5283.00 (  2.19%)
> Ops memcachetest-791M     4740.00 (  0.00%)    5293.00 ( 11.67%)
> Ops memcachetest-2639M    2551.00 (  0.00%)    4950.00 ( 94.04%)
> Ops memcachetest-4487M    2606.00 (  0.00%)    3922.00 ( 50.50%)
> Ops io-duration-0M           0.00 (  0.00%)       0.00 (  0.00%)
> Ops io-duration-791M        55.00 (  0.00%)      18.00 ( 67.27%)
> Ops io-duration-2639M      235.00 (  0.00%)     103.00 ( 56.17%)
> Ops io-duration-4487M      278.00 (  0.00%)     173.00 ( 37.77%)
> Ops swaptotal-0M             0.00 (  0.00%)       0.00 (  0.00%)
> Ops swaptotal-791M      245184.00 (  0.00%)       0.00 (  0.00%)
> Ops swaptotal-2639M     468069.00 (  0.00%)  108778.00 ( 76.76%)
> Ops swaptotal-4487M     452529.00 (  0.00%)   76623.00 ( 83.07%)
> Ops swapin-0M                0.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-791M         108297.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-2639M        169537.00 (  0.00%)   50031.00 ( 70.49%)
> Ops swapin-4487M        167435.00 (  0.00%)   34178.00 ( 79.59%)
> Ops minorfaults-0M     1518666.00 (  0.00%) 1503993.00 (  0.97%)
> Ops minorfaults-791M   1676963.00 (  0.00%) 1520115.00 (  9.35%)
> Ops minorfaults-2639M  1606035.00 (  0.00%) 1799717.00 (-12.06%)
> Ops minorfaults-4487M  1612118.00 (  0.00%) 1583825.00 (  1.76%)
> Ops majorfaults-0M           6.00 (  0.00%)       0.00 (  0.00%)
> Ops majorfaults-791M     13836.00 (  0.00%)      10.00 ( 99.93%)
> Ops majorfaults-2639M    22307.00 (  0.00%)    6490.00 ( 70.91%)
> Ops majorfaults-4487M    21631.00 (  0.00%)    4380.00 ( 79.75%)
A reminder of whether positive numbers are good or bad would be useful ;)
>                   BASE   FAIRALLOC
> User            287.78      460.97
> System         2151.67     3142.51
> Elapsed        9737.00     8879.34
Confused. Why would the amount of user time increase so much?
And that's a tremendous increase in system time. Am I interpreting
this correctly?
>                                       BASE     FAIRALLOC
> Minor Faults                      53721925      57188551
> Major Faults                        392195         15157
> Swap Ins                           2994854        112770
> Swap Outs                          4907092        134982
> Direct pages scanned                     0         41824
> Kswapd pages scanned              32975063       8128269
> Kswapd pages reclaimed             6323069       7093495
> Direct pages reclaimed                   0         41824
> Kswapd efficiency                      19%           87%
> Kswapd velocity                   3386.573       915.414
> Direct efficiency                     100%          100%
> Direct velocity                      0.000         4.710
> Percentage direct scans                 0%            0%
> Zone normal velocity              2011.338       550.661
> Zone dma32 velocity               1365.623       369.221
> Zone dma velocity                    9.612         0.242
> Page writes by reclaim        18732404.000    614807.000
> Page writes file                  13825312        479825
> Page writes anon                   4907092        134982
> Page reclaim immediate               85490          5647
> Sector Reads                      12080532        483244
> Sector Writes                     88740508      65438876
> Page rescued immediate                   0             0
> Slabs scanned                        82560         12160
> Direct inode steals                      0             0
> Kswapd inode steals                  24401         40013
> Kswapd skipped wait                      0             0
> THP fault alloc                          6             8
> THP collapse alloc                    5481          5812
> THP splits                              75            22
> THP fault fallback                       0             0
> THP collapse fail                        0             0
> Compaction stalls                        0            54
> Compaction success                       0            45
> Compaction failures                      0             9
> Page migrate success                881492         82278
> Page migrate failure                     0             0
> Compaction pages isolated                0         60334
> Compaction migrate scanned               0         53505
> Compaction free scanned                  0       1537605
> Compaction cost                        914            86
> NUMA PTE updates                  46738231      41988419
> NUMA hint faults                  31175564      24213387
> NUMA hint local faults            10427393       6411593
> NUMA pages migrated                 881492         55344
> AutoNUMA cost                       156221        121361
Some nice numbers there.
> The overall runtime was reduced, throughput for both the foreground
> workload as well as the background IO improved, major faults, swapping
> and reclaim activity shrank significantly, and reclaim efficiency more
> than quadrupled.
Thread overview: 34+ messages
2013-07-19 20:55 Johannes Weiner
2013-07-19 20:55 ` [patch 1/3] mm: vmscan: fix numa reclaim balance problem in kswapd Johannes Weiner
2013-07-22 19:47 ` Rik van Riel
2013-07-22 20:14 ` Johannes Weiner
2013-07-26 22:53 ` Andrew Morton
2013-07-30 17:45 ` Johannes Weiner
2013-07-31 12:43 ` Johannes Weiner
2013-07-19 20:55 ` [patch 2/3] mm: page_alloc: rearrange watermark checking in get_page_from_freelist Johannes Weiner
2013-07-22 19:51 ` Rik van Riel
2013-07-19 20:55 ` [patch 3/3] mm: page_alloc: fair zone allocator policy Johannes Weiner
2013-07-22 20:21 ` Rik van Riel
2013-07-22 21:04 ` Johannes Weiner
2013-07-22 22:48 ` Rik van Riel
2013-07-25 6:50 ` Paul Bolle
2013-07-25 15:10 ` Johannes Weiner
2013-07-25 15:20 ` Paul Bolle
2013-07-29 17:48 ` Andrea Arcangeli
2013-07-29 22:24 ` Johannes Weiner
2013-08-01 2:56 ` Minchan Kim
2013-08-01 4:31 ` Rik van Riel
2013-08-01 15:51 ` Andrea Arcangeli
2013-08-01 19:58 ` Johannes Weiner
2013-08-01 22:16 ` Andrea Arcangeli
2013-08-02 6:22 ` Johannes Weiner
2013-08-02 7:32 ` Minchan Kim
2013-07-22 16:48 ` [patch 0/3] mm: improve page aging fairness between zones/nodes Zlatko Calusic
2013-07-22 17:01 ` Johannes Weiner
2013-07-22 17:14 ` Zlatko Calusic
2013-07-24 11:18 ` Zlatko Calusic
2013-07-24 12:46 ` Hush Bensen
2013-07-24 13:59 ` Zlatko Calusic
2013-07-31 9:33 ` Zlatko Calusic
2013-07-26 22:45 ` Andrew Morton [this message]
2013-07-26 23:14 ` Johannes Weiner