From: Michal Hocko <mhocko@kernel.org>
To: Hugh Dickins <hughd@google.com>, Vlastimil Babka <vbabka@suse.cz>,
Joonsoo Kim <js1304@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>,
David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] OOM detection rework v4
Date: Tue, 1 Mar 2016 14:38:46 +0100
Message-ID: <20160301133846.GF9461@dhcp22.suse.cz>
In-Reply-To: <alpine.LSU.2.11.1602292251170.7563@eggly.anvils>
[Adding Vlastimil and Joonsoo for compaction related things - this was a
large thread but the more interesting part starts with
http://lkml.kernel.org/r/alpine.LSU.2.11.1602241832160.15564@eggly.anvils]
On Mon 29-02-16 23:29:06, Hugh Dickins wrote:
> On Mon, 29 Feb 2016, Michal Hocko wrote:
> > On Wed 24-02-16 19:47:06, Hugh Dickins wrote:
> > [...]
> > > Boot with mem=1G (or boot your usual way, and do something to occupy
> > > most of the memory: I think /proc/sys/vm/nr_hugepages provides a great
> > > way to gobble up most of the memory, though it's not how I've done it).
> > >
> > > Make sure you have swap: 2G is more than enough. Copy the v4.5-rc5
> > > kernel source tree into a tmpfs: size=2G is more than enough.
> > > make defconfig there, then make -j20.
> > >
> > > On a v4.5-rc5 kernel that builds fine, on mmotm it is soon OOM-killed.
> > >
> > > Except that you'll probably need to fiddle around with that j20,
> > > it's true for my laptop but not for my workstation. j20 just happens
> > > to be what I've had there for years, that I now see breaking down
> > > (I can lower to j6 to proceed, perhaps could go a bit higher,
> > > but it still doesn't exercise swap very much).
> >
> > I have tried to reproduce and failed in a virtual on my laptop. I
> > will try with another host with more CPUs (because my laptop has only
> > two). Just for the record I did: boot 1G machine in kvm, I have 2G swap
> > and reserve 800M for hugetlb pages (I got 445 of them). Then I extract
> > the kernel source to tmpfs (-o size=2G), make defconfig and make -j20
> > (16, 10 no difference really). I was also collecting vmstat in the
> > background. The compilation takes ages but the behavior seems consistent
> > and stable.
>
> Thanks a lot for giving it a go.
>
> I'm puzzled. 445 hugetlb pages in 800M surprises me: some of them
> are less than 2M big?? But probably that's just a misunderstanding
> or typo somewhere.
A typo. 445 was from the 900M test which I was doing while writing the
email. Sorry about the confusion.
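A quick sanity check of those numbers, as a sketch that assumes 2M huge pages (the size Hugh's question implies); the constant and helper names are made up for illustration:

```python
# Assumption: 2 MiB huge pages, as implied by Hugh's "less than 2M big" remark.
HUGE_PAGE_MIB = 2


def max_huge_pages(reserve_mib: int) -> int:
    """Upper bound on the number of huge pages that fit in reserve_mib MiB."""
    return reserve_mib // HUGE_PAGE_MIB


def huge_pages_size_mib(count: int) -> int:
    """Memory taken by `count` huge pages, in MiB."""
    return count * HUGE_PAGE_MIB


if __name__ == "__main__":
    # An 800M reservation cannot hold 445 pages of 2M each ...
    print(max_huge_pages(800))
    # ... while 445 pages fit under the 900M reservation.
    print(huge_pages_size_mib(445))
```

So 445 pages are consistent with the 900M run, not with the 800M one.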
> Ignoring that, you're successfully doing a make -20 defconfig build
> in tmpfs, with only 224M of RAM available, plus 2G of swap? I'm not
> at all surprised that it takes ages, but I am very surprised that it
> does not OOM. I suppose by rights it ought not to OOM, the built
> tree occupies only a little more than 1G, so you do have enough swap;
> but I wouldn't get anywhere near that myself without OOMing - I give
> myself 1G of RAM (well, minus whatever the booted system takes up)
> to do that build in, four times your RAM, yet in my case it OOMs.
>
> That source tree alone occupies more than 700M, so just copying it
> into your tmpfs would take a long time.
OK, I just found out that I was cheating a bit. I was building
linux-3.7-rc5.tar.bz2 which is smaller:
$ du -sh /mnt/tmpfs/linux-3.7-rc5/
537M /mnt/tmpfs/linux-3.7-rc5/
and after the defconfig build:
$ free
total used free shared buffers cached
Mem: 1008460 941904 66556 0 5092 806760
-/+ buffers/cache: 130052 878408
Swap: 2097148 42648 2054500
$ du -sh linux-3.7-rc5/
799M linux-3.7-rc5/
Sorry about that but this is what my other tests were using and I forgot
to check. Now let's try the same with the current linus tree:
host $ git archive v4.5-rc6 --prefix=linux-4.5-rc6/ | bzip2 > linux-4.5-rc6.tar.bz2
$ du -sh /mnt/tmpfs/linux-4.5-rc6/
707M /mnt/tmpfs/linux-4.5-rc6/
$ free
total used free shared buffers cached
Mem: 1008460 962976 45484 0 7236 820064
-/+ buffers/cache: 135676 872784
Swap: 2097148 16 2097132
$ time make -j20 > /dev/null
drivers/acpi/property.c: In function ‘acpi_data_prop_read’:
drivers/acpi/property.c:745:8: warning: ‘obj’ may be used uninitialized in this function [-Wmaybe-uninitialized]
real 8m36.621s
user 14m1.642s
sys 2m45.238s
so I wasn't cheating all that much...
> I'd expect a build in 224M
> RAM plus 2G of swap to take so long, that I'd be very grateful to be
> OOM killed, even if there is technically enough space. Unless
> perhaps it's some superfast swap that you have?
the swap partition is a standard qcow image stored on my SSD disk. So
I guess the IO should be quite fast. This smells like a potential
contributor because my reclaim seems to be much faster and that should
lead to a more efficient reclaim (in the scanned/reclaimed sense).
I realize I might be boring already when blaming compaction but let me
try again ;)
$ grep compact /proc/vmstat
compact_migrate_scanned 113983
compact_free_scanned 1433503
compact_isolated 134307
compact_stall 128
compact_fail 26
compact_success 102
compact_kcompatd_wake 0
So the whole load has done direct compaction only 128 times during
that test. This doesn't sound like much to me.
$ grep allocstall /proc/vmstat
allocstall 1061
We entered direct reclaim much more often, but most of the load will be
order-0, so this might still be OK. So I've tried the following:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1993894b4219..107d444afdb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2910,6 +2910,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 						mode, contended_compaction);
 	current->flags &= ~PF_MEMALLOC;
 
+	if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER)
+		trace_printk("order:%d gfp_mask:%pGg compact_result:%lu\n", order, &gfp_mask, compact_result);
+
 	switch (compact_result) {
 	case COMPACT_DEFERRED:
 		*deferred_compaction = true;
And the result was:
$ cat /debug/tracing/trace_pipe | tee ~/trace.log
gcc-8707 [001] .... 137.946370: __alloc_pages_direct_compact: order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1
gcc-8726 [000] .... 138.528571: __alloc_pages_direct_compact: order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1
This shows that order-2 memory pressure is not overly high in my
setup. Both attempts ended up as COMPACT_SKIPPED, which is interesting.
So I went back to 800M of hugetlb pages and tried again. It took ages,
so I interrupted it after one hour (there was still no OOM). The
trace log is quite interesting regardless:
$ wc -l ~/trace.log
371 /root/trace.log
$ grep compact_stall /proc/vmstat
compact_stall 190
So compaction was still skipped more often than it was actually invoked
for !costly allocations:
sed 's@.*order:\([[:digit:]]\).* compact_result:\([[:digit:]]\)@\1 \2@' ~/trace.log | sort | uniq -c
190 2 1
122 2 3
59 2 4
#define COMPACT_SKIPPED 1
#define COMPACT_PARTIAL 3
#define COMPACT_COMPLETE 4
That means that compaction is not even tried in half of the cases! This
doesn't sound right to me, especially when we are talking about
<= PAGE_ALLOC_COSTLY_ORDER requests which are implicitly nofail, because
then we simply rely on the order-0 reclaim to automagically form higher
order blocks. This might indeed work when we retry many times but I guess
this is not a good approach. It leads to excessive reclaim and the
allocation stall can be really large.
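For anyone who wants to repeat the tally without the sed pipeline, here is a rough Python equivalent. It is only a sketch: the result names come from the three #defines quoted above, the helper names are made up, and the sample lines are synthetic (they rebuild the reported 190/122/59 distribution rather than reading ~/trace.log).

```python
import re
from collections import Counter

# Values copied from the #defines quoted above; other codes stay numeric.
COMPACT_NAMES = {1: "COMPACT_SKIPPED", 3: "COMPACT_PARTIAL", 4: "COMPACT_COMPLETE"}

# Fields emitted by the trace_printk() added in the debugging patch.
LINE_RE = re.compile(r"order:(\d+) .*compact_result:(\d+)")


def tally(lines):
    """Count trace lines per (order, compact_result) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


def skipped_fraction(counts):
    """Fraction of logged attempts that ended in COMPACT_SKIPPED (result 1)."""
    total = sum(counts.values())
    skipped = sum(n for (_, result), n in counts.items() if result == 1)
    return skipped / total if total else 0.0


def fake_line(result):
    """Synthetic line in the trace_printk() format; task and timestamp are made up."""
    return ("gcc-0 [000] .... 0.000000: __alloc_pages_direct_compact: "
            f"order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:{result}")


if __name__ == "__main__":
    lines = [fake_line(1)] * 190 + [fake_line(3)] * 122 + [fake_line(4)] * 59
    counts = tally(lines)
    for (order, result), n in sorted(counts.items()):
        print(n, order, COMPACT_NAMES.get(result, result))
    print(f"skipped: {skipped_fraction(counts):.1%}")
```

With the counts above, 190 of 371 logged attempts were skipped, which is where the "half of the cases" comes from.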
One of the suspicious places is __compaction_suitable, which does an
order-0 watermark check (increased by 2<<order). I have put another
trace_printk there and it clearly pointed out that this was the case.
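To make the threshold concrete, here is a simplified userspace model of that check. It is a sketch under assumptions: the real zone_watermark_ok() also accounts for lowmem reserves and alloc_flags, which are ignored here, and the page counts in the example are hypothetical rather than taken from my test machine.

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # kernel constant: orders above this are "costly"


def compaction_watermark(low_wmark_pages: int, order: int) -> int:
    """Order-0 free page threshold: the low watermark plus 2 << order pages."""
    return low_wmark_pages + (2 << order)


def would_skip_compaction(free_pages: int, low_wmark_pages: int, order: int) -> bool:
    """True when this simplified check would end in COMPACT_SKIPPED.

    Simplification: lowmem reserves and alloc_flags adjustments are ignored.
    """
    return free_pages <= compaction_watermark(low_wmark_pages, order)


if __name__ == "__main__":
    low = 1000  # hypothetical low watermark, in pages
    for free in (1005, 1009):
        print(free, would_skip_compaction(free, low, order=2))
```

For an order-2 request the extra headroom is only 8 pages, so the check fails mostly when the zone is already sitting around its low watermark, which is exactly the state a heavily swapping build keeps it in.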
So I have tried the following:
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d99e1f5055c..7364e48cf69a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1276,6 +1276,9 @@ static unsigned long __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_PARTIAL;
 
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
+		return COMPACT_CONTINUE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
 	 * This is because during migration, copies of pages need to be
and retried the same test (without huge pages):
$ time make -j20 > /dev/null
real 8m46.626s
user 14m15.823s
sys 2m45.471s
the time increased but I haven't checked how stable the result is.
$ grep compact /proc/vmstat
compact_migrate_scanned 139822
compact_free_scanned 1661642
compact_isolated 139407
compact_stall 129
compact_fail 58
compact_success 71
compact_kcompatd_wake 1
$ grep allocstall /proc/vmstat
allocstall 1665
This is worse because we have scanned more pages for migration, but the
overall success rate was much lower and direct reclaim was invoked
more often. I do not have a good theory for that and will play with this
some more. Maybe other changes are needed deeper in the compaction code.
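To put numbers on the comparison, here is a small sketch that takes the counters from the two runs without huge pages (copied from the vmstat output above) and computes the compaction success rate and the relative change. The helper names are made up for illustration.

```python
# Counters copied from the two /proc/vmstat snapshots quoted above.
BEFORE = {"compact_migrate_scanned": 113983, "compact_stall": 128,
          "compact_fail": 26, "compact_success": 102, "allocstall": 1061}
AFTER = {"compact_migrate_scanned": 139822, "compact_stall": 129,
         "compact_fail": 58, "compact_success": 71, "allocstall": 1665}


def success_rate(stats):
    """Share of direct compaction attempts (compact_stall) that succeeded."""
    return stats["compact_success"] / stats["compact_stall"]


def relative_change(before, after, key):
    """Relative change of one counter between the two runs."""
    return (after[key] - before[key]) / before[key]


if __name__ == "__main__":
    print(f"success rate: {success_rate(BEFORE):.1%} -> {success_rate(AFTER):.1%}")
    for key in ("compact_migrate_scanned", "allocstall"):
        print(f"{key}: {relative_change(BEFORE, AFTER, key):+.1%}")
```

The number of direct compaction attempts is nearly the same in both runs (128 vs. 129), so the drop is in how many of them succeed, not in how often compaction is entered.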
I would be really interested to hear whether this helped Hugh with his
setup. Vlastimil, Joonsoo, does this even make sense to you?
> I was only suggesting to allocate hugetlb pages, if you preferred
> not to reboot with artificially reduced RAM. Not an issue if you're
> booting VMs.
Ohh, I see.
--
Michal Hocko
SUSE Labs