From: Minchan Kim <minchan.kim@gmail.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Simon Kirby <sim@hostway.ca>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Shaohua Li <shaohua.li@intel.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/5] mm: kswapd: Stop high-order balancing when any suitable zone is balanced
Date: Tue, 7 Dec 2010 10:32:45 +0900
Message-ID: <AANLkTimvmbvZ-9RcLsefTqbq1ktm6=-XD1N6z4JHBh=v@mail.gmail.com>
In-Reply-To: <20101206105558.GA21406@csn.ul.ie>
On Mon, Dec 6, 2010 at 7:55 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Mon, Dec 06, 2010 at 08:35:18AM +0900, Minchan Kim wrote:
>> Hi Mel,
>>
>> On Fri, Dec 3, 2010 at 8:45 PM, Mel Gorman <mel@csn.ul.ie> wrote:
>> > When the allocator enters its slow path, kswapd is woken up to balance the
>> > node. It continues working until all zones within the node are balanced. For
>> > order-0 allocations, this makes perfect sense but for higher orders it can
>> > have unintended side-effects. If the zone sizes are imbalanced, kswapd may
>> > reclaim heavily within a smaller zone discarding an excessive number of
>> > pages. The user-visible behaviour is that kswapd is awake and reclaiming
>> > even though plenty of pages are free from a suitable zone.
>> >
>> > This patch alters the "balance" logic for high-order reclaim allowing kswapd
>> > to stop if any suitable zone becomes balanced to reduce the number of pages
>> > it reclaims from other zones. kswapd still tries to ensure that order-0
>> > watermarks for all zones are met before sleeping.
>> >
>> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>>
>> <snip>
>>
>> > -	if (!all_zones_ok) {
>> > +	if (!(all_zones_ok || (order && any_zone_ok))) {
>> >  		cond_resched();
>> >
>> >  		try_to_freeze();
>> > @@ -2361,6 +2366,31 @@ out:
>> >  		goto loop_again;
>> >  	}
>> >
>> > +	/*
>> > +	 * If kswapd was reclaiming at a higher order, it has the option of
>> > +	 * sleeping without all zones being balanced. Before it does, it must
>> > +	 * ensure that the watermarks for order-0 on *all* zones are met and
>> > +	 * that the congestion flags are cleared
>> > +	 */
>> > +	if (order) {
>> > +		for (i = 0; i <= end_zone; i++) {
>> > +			struct zone *zone = pgdat->node_zones + i;
>> > +
>> > +			if (!populated_zone(zone))
>> > +				continue;
>> > +
>> > +			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>> > +				continue;
>> > +
>> > +			zone_clear_flag(zone, ZONE_CONGESTED);
>>
>> Why clear ZONE_CONGESTED?
>> If you have a cause, please, write down the comment.
>>
>
> It's because kswapd is the only mechanism that clears the congestion
> flag. If it's not cleared and kswapd goes to sleep, the flag could be
> left set causing hard-to-diagnose stalls. I'll add a comment.
Seems good.
>
>> <snip>
>>
>> First impression on this patch is that it changes scanning behavior as
>> well as reclaiming on high order reclaim.
>
> It does affect scanning behaviour for high-order reclaim. Specifically,
> it may stop scanning once a zone is balanced within the node. Previously
> it would continue scanning until all zones were balanced. Is this what
> you are thinking of or something else?
Yes. I mean page aging in the higher zones.
>
>> I can't say old behavior is right but we can't say this behavior is
>> right, too although this patch solves the problem. At least, we might
>> need some data that shows this patch doesn't have a regression.
>
> How do you suggest it be tested and this data be gathered? I tested a number of
> workloads that keep kswapd awake but found no differences of major significance
> even though it was using high-order allocations. The problem with identifying
> small regressions for high-order allocations is that the state of the system
> when lumpy reclaim starts is very important as it determines how much work
> has to be done. I did not find major regressions in performance.
>
> For the tests I did run;
>
> fsmark showed nothing useful. iozone showed nothing useful either as it didn't
> even wake kswapd. sysbench showed minor performance gains and losses but it
> is not useful as it typically does not wake kswapd unless the database is
> badly configured.
>
> I ran postmark because it was the closest benchmark to a mail simulator I
> had access to. This sucks because it's no longer representative of a mail
> server and is more like a crappy filesystem benchmark. To get it closer to a
> real server, there was also a program running in the background that mapped
> a large anonymous segment and scanned it in blocks.
>
> POSTMARK
> postmark-traceonly-v3r1-postmark postmark-kanyzone-v2r6-postmark
> traceonly-v3r1 kanyzone-v2r6
> Transactions per second: 2.00 ( 0.00%) 2.00 ( 0.00%)
> Data megabytes read per second: 8.14 ( 0.00%) 8.59 ( 5.24%)
> Data megabytes written per second: 18.94 ( 0.00%) 19.98 ( 5.21%)
> Files created alone per second: 4.00 ( 0.00%) 4.00 ( 0.00%)
> Files create/transact per second: 1.00 ( 0.00%) 1.00 ( 0.00%)
> Files deleted alone per second: 34.00 ( 0.00%) 30.00 (-13.33%)
Do you know why only file deletion shows a big regression?
> Files delete/transact per second: 1.00 ( 0.00%) 1.00 ( 0.00%)
>
> MMTests Statistics: duration
> User/Sys Time Running Test (seconds) 152.4 152.92
> Total Elapsed Time (seconds) 5110.96 4847.22
>
> FTrace Reclaim Statistics: vmscan
> postmark-traceonly-v3r1-postmark postmark-kanyzone-v2r6-postmark
> traceonly-v3r1 kanyzone-v2r6
> Direct reclaims 0 0
> Direct reclaim pages scanned 0 0
> Direct reclaim pages reclaimed 0 0
> Direct reclaim write file async I/O 0 0
> Direct reclaim write anon async I/O 0 0
> Direct reclaim write file sync I/O 0 0
> Direct reclaim write anon sync I/O 0 0
> Wake kswapd requests 0 0
> Kswapd wakeups 2177 2174
> Kswapd pages scanned 34690766 34691473
Perhaps in your workload the balanced zone (any_zone) is the highest zone.
If it were a low zone instead, "Kswapd pages scanned" would differ a lot,
because the old behavior tries to balance all zones.
Could we evaluate that situation? I have no idea how to set it up,
though. :(
> Kswapd pages reclaimed 34511965 34513478
> Kswapd reclaim write file async I/O 32 0
> Kswapd reclaim write anon async I/O 2357 2561
> Kswapd reclaim write file sync I/O 0 0
> Kswapd reclaim write anon sync I/O 0 0
> Time stalled direct reclaim (seconds) 0.00 0.00
> Time kswapd awake (seconds) 632.10 683.34
>
> Total pages scanned 34690766 34691473
> Total pages reclaimed 34511965 34513478
> %age total pages scanned/reclaimed 99.48% 99.49%
> %age total pages scanned/written 0.01% 0.01%
> %age file pages scanned/written 0.00% 0.00%
> Percentage Time Spent Direct Reclaim 0.00% 0.00%
> Percentage Time kswapd Awake 12.37% 14.10%
Is "kswapd Awake" correct?
AFAIR, your implementation seems to account time to kswapd even while
kswapd is scheduled out.
I mean, for example,
kswapd
-> time stamp start
-> balance_pgdat
-> cond_resched (kswapd scheduled out)
-> app 1 start
-> app 2 start
-> kswapd scheduled in
-> time stamp end.
If that's right, "kswapd awake" doesn't mean much.
>
> proc vmstat: Faults
> postmark-traceonly-v3r1-postmark postmark-kanyzone-v2r6-postmark
> traceonly-v3r1 kanyzone-v2r6
> Major Faults 1979 1741
> Minor Faults 13660834 13587939
> Page ins 89060 74704
> Page outs 69800 58884
> Swap ins 1193 1499
> Swap outs 2403 2562
>
> Still, IO performance was improved (higher rates of read/write) and the test
> completed significantly faster with this patch series applied. kswapd was
> awake for longer and reclaimed marginally more pages with more swap-ins and
The longer awake time may be due to the time accounting issue I mentioned above.
> swap-outs which is unfortunate but it's somewhat balanced by fewer faults
> and fewer page-ins. Basically, in terms of reclaim the figures are so close
> that it is within the performance variations lumpy reclaim has depending on
> the exact state of the system when reclaim starts.
What I wanted to see is how system performance is affected when the zones
above any_zone are not aged.
This patch changes the balancing mechanism of kswapd, so I think the
experiment is valuable.
I don't want to be a bad reviewer who wears contributors out.
What do you think?
>
>> It's
>> not easy but I believe you can do very well as like having done until
>> now. I didn't see whole series so I might miss something.
>>
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
>
--
Kind regards,
Minchan Kim