From: Andrew Lutomirski <luto@mit.edu>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Pádraig Brady <P@draigbrady.com>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
Colin King <colin.king@canonical.com>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
Date: Fri, 22 Jul 2011 09:21:47 -0400
Message-ID: <CAObL_7ES+6xcLCewtOaZby5uYnT3F91TuKPVZc_aOWSpRjNg3A@mail.gmail.com>
In-Reply-To: <CAEwNFnD2ZTARC1Yw2uEYVSctBo7wsmA7rmQOaFH2rwOKoo3YjA@mail.gmail.com>
On Thu, Jul 21, 2011 at 8:30 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Jul 22, 2011 at 1:58 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Thu, Jul 21, 2011 at 12:42 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>> On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
>>>> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>>> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>>>> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>>>> >> > On Fri, Jun 24, 2011 at 03:44:53PM +0100, Mel Gorman wrote:
>>>> >> > > (Built this time and passed a basic sniff-test.)
>>>> >> > >
>>>> >> > > During allocator-intensive workloads, kswapd will be woken frequently
>>>> >> > > causing free memory to oscillate between the high and min watermark.
>>>> >> > > This is expected behaviour. Unfortunately, if the highest zone is
>>>> >> > > small, a problem occurs.
>>>> >> > >
>>>> >> > > This seems to happen most with recent sandybridge laptops but it's
>>>> >> > > probably a co-incidence as some of these laptops just happen to have
>>>> >> > > a small Normal zone. The reproduction case is almost always that,
>>>> >> > > while copying large files, kswapd pegs at 100% CPU until the file
>>>> >> > > is deleted or cache is dropped.
>>>> >> > >
>>>> >> > > The problem is mostly down to sleeping_prematurely() keeping kswapd
>>>> >> > > awake when the highest zone is small and unreclaimable and compounded
>>>> >> > > by the fact we shrink slabs even when not shrinking zones causing a lot
>>>> >> > > of time to be spent in shrinkers and a lot of memory to be reclaimed.
>>>> >> > >
>>>> >> > > Patch 1 corrects sleeping_prematurely to check the zones matching
>>>> >> > > the classzone_idx instead of all zones.
>>>> >> > >
>>>> >> > > Patch 2 avoids shrinking slab when we are not shrinking a zone.
>>>> >> > >
>>>> >> > > Patch 3 notes that sleeping_prematurely is checking lower zones against
>>>> >> > > a high classzone which is not what allocators or balance_pgdat()
>>>> >> > > is doing leading to an artificial belief that kswapd should be
>>>> >> > > still awake.
>>>> >> > >
>>>> >> > > Patch 4 notes that when balance_pgdat() gives up on a high zone that the
>>>> >> > > decision is not communicated to sleeping_prematurely()
>>>> >> > >
>>>> >> > > This problem affects 2.6.38.8 for certain and is expected to affect
>>>> >> > > 2.6.39 and 3.0-rc4 as well. If accepted, they need to go to -stable
>>>> >> > > to be picked up by distros and this series is against 3.0-rc4. I've
>>>> >> > > cc'd people that reported similar problems recently to see if they
>>>> >> > > still suffer from the problem and if this fixes it.
>>>> >> > >
>>>> >> >
>>>> >> > Good!
>>>> >> > This patch solved the problem.
>>>> >> > But there is still a mystery.
>>>> >> >
>>>> >> > In log, we could see excessive shrink_slab calls.
>>>> >>
>>>> >> Yes, because shrink_slab() was called on each loop through
>>>> >> balance_pgdat() even if the zone was balanced.
>>>> >>
>>>> >>
>>>> >> > And as you know, we merged a patch which adds cond_resched at the end
>>>> >> > of shrink_slab. So other tasks should get the CPU and we should not see
>>>> >> > 100% CPU of kswapd, I think.
>>>> >> >
>>>> >>
>>>> >> cond_resched() is not a substitute for going to sleep.
>>>> >
>>>> > Of course, it's not the same as sleeping, but other tasks should get
>>>> > the CPU and consume their time slices.
>>>> > So we should never see 100% CPU consumption by kswapd.
>>>> > No?
>>>>
>>>> If the rest of the system is idle, then kswapd will happily use 100%
>>>> CPU. (Or on a multi-core system, kswapd will use close to 100% of one
>>>
>>> Of course. But at least, we have a test program and I think it's not idle.
>>
>> The test program I used was 'top', which is pretty close to idle.
>>
>>>
>>>> CPU even if another task is using the other one. This is bad enough
>>>> on a desktop, but on a laptop you start to notice when your battery
>>>> dies.)
>>>
>>> Of course it's bad. :)
>>> What I want to know is just the exact cause of the 100% CPU usage.
>>> It might not be 100%; we might be using the word sloppily.
>>>
>>
>> Well, if you want to be pedantic, my laptop can, in theory, demonstrate
>> true 100% CPU usage. Trigger the bug, suspend every other thread, and
>> listen to the laptop fan spin and feel the laptop get hot. (The fan
>> is controlled by the EC and takes no CPU.)
>>
>> In practice, the usage was close enough to 100% that it got rounded.
>>
>> The cond_resched was enough to at least make the system responsive
>> instead of the hard freeze I used to get.
>
> I don't want to be pedantic. :)
> What I thought 100% CPU usage meant was that kswapd doesn't yield the
> CPU and spins on it, but from your example (i.e., cond_resched
> makes the system responsive), that's not the case. It was just spending
> most of the time in kswapd, not 100%. It seems I was paranoid about the
> word, sorry for that.
Ah, sorry. I must have been unclear in my original email.

In 2.6.39, the bug made my system unresponsive. With your cond_resched and
pgdat_balanced fixes, it just made kswapd eat all available CPU, but
the system still worked.

--Andy
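
For readers following the cond_resched() discussion above, here is a rough userspace sketch of the distinction, not kernel code. It assumes that os.sched_yield() is a loose stand-in for cond_resched() (an analogy only) and that time.sleep() stands in for kswapd actually going to sleep. The helper names cpu_used, yield_only and really_sleep are made up for illustration. On a mostly idle machine, the yielding loop should still burn nearly a full CPU's worth of time, while the sleeping loop burns almost none.

```python
import os
import time


def cpu_used(step, wall_seconds=0.2):
    """Call step() repeatedly for wall_seconds; return CPU seconds consumed."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + wall_seconds
    while time.monotonic() < deadline:
        step()
    return time.process_time() - cpu_start


def yield_only():
    # Offer the CPU to any other runnable task, then carry on immediately.
    # If nothing else wants the CPU, we keep running (assumed analogue of
    # a loop that only calls cond_resched()).
    os.sched_yield()


def really_sleep():
    # Block until a timer fires; no CPU is consumed while we wait.
    time.sleep(0.01)


if __name__ == "__main__":
    print("yield loop CPU seconds:", round(cpu_used(yield_only), 3))
    print("sleep loop CPU seconds:", round(cpu_used(really_sleep), 3))
```

The exact numbers depend on the machine and its load, so none are quoted here. The point is only that yielding keeps a system responsive without reducing CPU use when nothing else is runnable.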