From: David Rientjes <rientjes@google.com>
To: marc@offline.be, Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org
Subject: Re: [Bug 49361] New: configuring TRANSPARENT_HUGEPAGE_ALWAYS can make system unresponsive and reboot
Date: Mon, 29 Oct 2012 13:33:06 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1210291216330.15340@chino.kir.corp.google.com>
In-Reply-To: <alpine.DEB.2.00.1210232242590.22652@chino.kir.corp.google.com>
On Tue, 23 Oct 2012, David Rientjes wrote:
> We'll need to collect some information before we can figure out what the
> problem is with 3.5.2.
>
> First, let's take a look at khugepaged. By default, it's supposed to wake
> up rarely (10s at minimum) and only scan 4K pages before going back to
> sleep. Having a consistent and very high cpu usage suggests the settings
> aren't the default. Can you do
>
> cat /sys/kernel/mm/transparent_hugepage/khugepaged/{alloc,scan}_sleep_millisecs
>
> The defaults should be 60000 and 10000, respectively. Then can you do
>
> cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
>
> which should be 4096. If those are your settings, then it seems like
> khugepaged in 3.5.2 is going crazy and we'll need to look into that. Try
> collecting
>
> grep -E "thp|compact" /proc/vmstat
>
> and
>
> cat /proc/$(pidof khugepaged)/stack
>
> appended to a logfile at regular intervals after you start the build with
> transparent hugepages enabled always. After the machine becomes
> unresponsive and reboots, post that log.
>
This looks like overly aggressive memory compaction. Consider these
samples from your "49361.1" attachment:
Sat Oct 27 02:39:05 CEST 2012
compact_blocks_moved 488381
compact_pages_moved 581856
compact_pagemigrate_failed 52533
compact_stall 59
compact_fail 36
compact_success 23
Sat Oct 27 02:39:15 CEST 2012
compact_blocks_moved 7797480
compact_pages_moved 589996
compact_pagemigrate_failed 53507
compact_stall 90
compact_fail 56
compact_success 24
Sat Oct 27 02:43:07 CEST 2012
compact_blocks_moved 276422153
compact_pages_moved 597836
compact_pagemigrate_failed 53886
compact_stall 109
compact_fail 76
compact_success 26
In four minutes, transparent hugepage allocation has scanned 275933772 2MB
pageblocks and has succeeded only three times in defragmenting enough
memory for the allocation to succeed. With compact_stall going from 59 to
109, that is an average of 5518675 pageblocks scanned per invocation.
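
As a sanity check on those figures, the arithmetic can be reproduced with a
few lines of shell. This is only a sketch: the counter values are copied by
hand from the first and last samples above, the variable names are made up
for illustration, and it assumes the average is taken per compact_stall
increment, which is the divisor that reproduces the number.

```sh
#!/bin/sh
# Counter values copied by hand from the 02:39:05 and 02:43:07 samples.
# Variable names are illustrative only.
blocks_first=488381;  blocks_last=276422153
stall_first=59;       stall_last=109
success_first=23;     success_last=26

# Differences between the last and the first sample.
blocks_delta=$((blocks_last - blocks_first))
stall_delta=$((stall_last - stall_first))
success_delta=$((success_last - success_first))

# Average pageblocks scanned per stall (integer division, truncated).
per_stall=$((blocks_delta / stall_delta))

echo "pageblocks scanned:    $blocks_delta"
echo "compaction stalls:     $stall_delta"
echo "compaction successes:  $success_delta"
echo "pageblocks per stall:  $per_stall"
```

The same subtraction applied to any two samples from the log gives the rate
over that interval.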
And then, from your "49361.2" attachment:
Sat Oct 27 02:48:30 CEST 2012
compact_blocks_moved 504039382
compact_pages_moved 776820
compact_pagemigrate_failed 58437
compact_stall 209
compact_fail 163
compact_success 36
...
Sat Oct 27 02:51:50 CEST 2012
compact_blocks_moved 722746600
compact_pages_moved 776820
compact_pagemigrate_failed 58437
compact_stall 209
compact_fail 173
compact_success 36
For more than three minutes, compact_stall does not increase but
compact_fail does (and compact_blocks_moved increases 43%), which suggests
deferred compaction is kicking in but for some reason we are still
scanning like crazy.
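
The deltas between these two samples can be checked the same way. Again a
rough sketch with hand-copied values and illustrative variable names. The
percentage is computed so that it stays within 32-bit shell arithmetic, so
it is truncated to a whole number.

```sh
#!/bin/sh
# Counter values copied by hand from the 02:48:30 and 02:51:50 samples.
# Variable names are illustrative only.
blocks_first=504039382;  blocks_last=722746600
moved_first=776820;      moved_last=776820
failed_first=58437;      failed_last=58437
stall_first=209;         stall_last=209
fail_first=163;          fail_last=173

# Seconds between the two timestamps (same day, same hour).
elapsed=$(( (51 * 60 + 50) - (48 * 60 + 30) ))

blocks_delta=$((blocks_last - blocks_first))
moved_delta=$((moved_last - moved_first))
failed_delta=$((failed_last - failed_first))
stall_delta=$((stall_last - stall_first))
fail_delta=$((fail_last - fail_first))

# Divide by (first / 100) instead of multiplying the delta by 100, so the
# intermediate value cannot overflow a 32-bit signed integer.
blocks_pct=$((blocks_delta / (blocks_first / 100)))

echo "elapsed seconds:             $elapsed"
echo "compact_blocks_moved:        +$blocks_delta (+${blocks_pct}%)"
echo "compact_pages_moved:         +$moved_delta"
echo "compact_pagemigrate_failed:  +$failed_delta"
echo "compact_stall:               +$stall_delta"
echo "compact_fail:                +$fail_delta"
```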
Reading the code, the only way this can happen is if nr_remaining is
always 0 (compact_pagemigrate_failed never increases), but also nr_migrate
is always 0 (compact_pages_moved never increases). So I think we're stuck
in the while loop in compact_zone() and are constantly calling
migrate_pages(). compact_finished() must be returning COMPACT_CONTINUE
even though cc->nr_migratepages == 0?
Adding Mel Gorman to the cc.