From: Michal Hocko <mhocko@kernel.org>
To: Hugh Dickins <hughd@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	Joonsoo Kim <js1304@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] OOM detection rework v4
Date: Tue, 1 Mar 2016 14:38:46 +0100
Message-ID: <20160301133846.GF9461@dhcp22.suse.cz>
In-Reply-To: <alpine.LSU.2.11.1602292251170.7563@eggly.anvils>

[Adding Vlastimil and Joonsoo for compaction related things - this was a
large thread but the more interesting part starts with
http://lkml.kernel.org/r/alpine.LSU.2.11.1602241832160.15564@eggly.anvils]

On Mon 29-02-16 23:29:06, Hugh Dickins wrote:
> On Mon, 29 Feb 2016, Michal Hocko wrote:
> > On Wed 24-02-16 19:47:06, Hugh Dickins wrote:
> > [...]
> > > Boot with mem=1G (or boot your usual way, and do something to occupy
> > > most of the memory: I think /proc/sys/vm/nr_hugepages provides a great
> > > way to gobble up most of the memory, though it's not how I've done it).
> > > 
> > > Make sure you have swap: 2G is more than enough.  Copy the v4.5-rc5
> > > kernel source tree into a tmpfs: size=2G is more than enough.
> > > make defconfig there, then make -j20.
> > > 
> > > On a v4.5-rc5 kernel that builds fine, on mmotm it is soon OOM-killed.
> > > 
> > > Except that you'll probably need to fiddle around with that j20,
> > > it's true for my laptop but not for my workstation.  j20 just happens
> > > to be what I've had there for years, that I now see breaking down
> > > (I can lower to j6 to proceed, perhaps could go a bit higher,
> > > but it still doesn't exercise swap very much).
> > 
> > I have tried to reproduce and failed in a virtual on my laptop. I
> > will try with another host with more CPUs (because my laptop has only
> > two). Just for the record I did: boot 1G machine in kvm, I have 2G swap
> > and reserve 800M for hugetlb pages (I got 445 of them). Then I extract
> > the kernel source to tmpfs (-o size=2G), make defconfig and make -j20
> > (16, 10 no difference really). I was also collecting vmstat in the
> > background. The compilation takes ages but the behavior seems consistent
> > and stable.
> 
> Thanks a lot for giving it a go.
> 
> I'm puzzled.  445 hugetlb pages in 800M surprises me: some of them
> are less than 2M big??  But probably that's just a misunderstanding
> or typo somewhere.

A typo. 445 was from 900M test which I was doing while writing the
email. Sorry about the confusion.
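
As a quick sanity check of those numbers, here is a minimal sketch of the arithmetic. It assumes 2M huge pages (the common x86-64 default; Hugepagesize in /proc/meminfo shows the actual size), and `max_hugepages` is a name made up for this illustration. An 800M reservation can hold at most 400 such pages, while 900M can hold up to 450, so 445 is only plausible for the 900M run.

```shell
# Upper bound on the number of huge pages that fit in a reservation.
# Assumption: huge page size defaults to 2 MiB; pass a second argument to override.
max_hugepages() {
    reserve_mb=$1
    hugepage_mb=${2:-2}
    echo $(( reserve_mb / hugepage_mb ))
}

max_hugepages 800   # prints 400
max_hugepages 900   # prints 450
```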

> Ignoring that, you're successfully doing a make -j20 defconfig build
> in tmpfs, with only 224M of RAM available, plus 2G of swap?  I'm not
> at all surprised that it takes ages, but I am very surprised that it
> does not OOM.  I suppose by rights it ought not to OOM, the built
> tree occupies only a little more than 1G, so you do have enough swap;
> but I wouldn't get anywhere near that myself without OOMing - I give
> myself 1G of RAM (well, minus whatever the booted system takes up)
> to do that build in, four times your RAM, yet in my case it OOMs.
>
> That source tree alone occupies more than 700M, so just copying it
> into your tmpfs would take a long time. 

OK, I just found out that I was cheating a bit. I was building
linux-3.7-rc5.tar.bz2 which is smaller:
$ du -sh /mnt/tmpfs/linux-3.7-rc5/
537M    /mnt/tmpfs/linux-3.7-rc5/

and after the defconfig build:
$ free
             total       used       free     shared    buffers     cached
Mem:       1008460     941904      66556          0       5092     806760
-/+ buffers/cache:     130052     878408
Swap:      2097148      42648    2054500
$ du -sh linux-3.7-rc5/
799M    linux-3.7-rc5/

Sorry about that, but this is what my other tests were using and I forgot
to check. Now let's try the same with the current Linus tree:
host $ git archive v4.5-rc6 --prefix=linux-4.5-rc6/ | bzip2 > linux-4.5-rc6.tar.bz2
$ du -sh /mnt/tmpfs/linux-4.5-rc6/
707M    /mnt/tmpfs/linux-4.5-rc6/
$ free
             total       used       free     shared    buffers     cached
Mem:       1008460     962976      45484          0       7236     820064
-/+ buffers/cache:     135676     872784
Swap:      2097148         16    2097132
$ time make -j20 > /dev/null
drivers/acpi/property.c: In function ‘acpi_data_prop_read’:
drivers/acpi/property.c:745:8: warning: ‘obj’ may be used uninitialized in this function [-Wmaybe-uninitialized]

real    8m36.621s
user    14m1.642s
sys     2m45.238s

so I wasn't cheating all that much...

> I'd expect a build in 224M
> RAM plus 2G of swap to take so long, that I'd be very grateful to be
> OOM killed, even if there is technically enough space.  Unless
> perhaps it's some superfast swap that you have?

the swap partition is a standard qcow image stored on my SSD disk. So
I guess the IO should be quite fast. This smells like a potential
contributor because my reclaim seems to be much faster and that should
lead to a more efficient reclaim (in the scanned/reclaimed sense).
I realize I might be boring already when blaming compaction but let me
try again ;)
$ grep compact /proc/vmstat 
compact_migrate_scanned 113983
compact_free_scanned 1433503
compact_isolated 134307
compact_stall 128
compact_fail 26
compact_success 102
compact_kcompatd_wake 0

So the whole load has done direct compaction only 128 times during
that test. This doesn't sound like much to me.
$ grep allocstall /proc/vmstat
allocstall 1061

we entered direct reclaim much more often, but most of the load will be
order-0 so this might still be OK. So I've tried the following:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1993894b4219..107d444afdb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2910,6 +2910,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 						mode, contended_compaction);
 	current->flags &= ~PF_MEMALLOC;
 
+	if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER)
+		trace_printk("order:%d gfp_mask:%pGg compact_result:%lu\n", order, &gfp_mask, compact_result);
+
 	switch (compact_result) {
 	case COMPACT_DEFERRED:
 		*deferred_compaction = true;

And the result was:
$ cat /debug/tracing/trace_pipe | tee ~/trace.log
             gcc-8707  [001] ....   137.946370: __alloc_pages_direct_compact: order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1
             gcc-8726  [000] ....   138.528571: __alloc_pages_direct_compact: order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1

this shows that order-2 memory pressure is not overly high in my
setup. Both attempts ended up as COMPACT_SKIPPED, which is interesting.

So I went back to 800M of hugetlb pages and tried again. It took ages
so I have interrupted that after one hour (there was still no OOM). The
trace log is quite interesting regardless:
$ wc -l ~/trace.log
371 /root/trace.log

$ grep compact_stall /proc/vmstat 
compact_stall 190

so the compaction was still ignored more than actually invoked for
!costly allocations:
sed 's@.*order:\([[:digit:]]\).* compact_result:\([[:digit:]]\)@\1 \2@' ~/trace.log | sort | uniq -c 
    190 2 1
    122 2 3
     59 2 4

#define COMPACT_SKIPPED         1               
#define COMPACT_PARTIAL         3
#define COMPACT_COMPLETE        4

that means that compaction is not even tried in half of the cases! This
doesn't sound right to me, especially when we are talking about
<= PAGE_ALLOC_COSTLY_ORDER requests which are implicitly nofail, because
then we simply rely on the order-0 reclaim to automagically form
higher-order blocks. This might indeed work when we retry many times but
I guess this is not a good approach. It leads to excessive reclaim and
the stall for the allocation can be really large.
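
The tally above can also be produced with result names and percentages attached. This is only a sketch: `tally_compact_results` is a made-up helper name, it assumes the trace lines carry a `compact_result:N` field as emitted by the trace_printk shown earlier, and it knows only the three values from the #defines quoted above.

```shell
# Tally compact_result values from trace_printk lines of the form
#   ... order:2 gfp_mask:... compact_result:1
# and print "NAME COUNT SHARE%" per result, sorted by name.
# Values other than the three quoted #defines are reported as "other".
tally_compact_results() {
    LC_ALL=C awk '
        BEGIN {
            name[1] = "COMPACT_SKIPPED"
            name[3] = "COMPACT_PARTIAL"
            name[4] = "COMPACT_COMPLETE"
        }
        match($0, /compact_result:[0-9]+/) {
            # "compact_result:" is 15 characters long; the digits follow it
            r = substr($0, RSTART + 15, RLENGTH - 15) + 0
            count[r]++
            total++
        }
        END {
            for (r in count)
                printf "%s %d %.1f%%\n", ((r in name) ? name[r] : "other"), count[r], 100 * count[r] / total
        }' "$@" | LC_ALL=C sort
}

# Example with the two trace lines quoted earlier in this mail (shortened):
printf '%s\n' \
    'gcc-8707 ... order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1' \
    'gcc-8726 ... order:2 gfp_mask:GFP_KERNEL_ACCOUNT|__GFP_NOTRACK compact_result:1' |
    tally_compact_results
```

Run as `tally_compact_results ~/trace.log` on the 371-line log. With the counts reported above (190, 122 and 59), the skipped share works out to 190/371, or about 51%.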

One of the suspicious places is __compaction_suitable, which does an
order-0 watermark check (increased by 2<<order). I have put another
trace_printk there and it clearly pointed out this was the case.
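
To make the effect of that check concrete, here is a deliberately simplified model of it. It is an illustration under stated assumptions, not the kernel code: it ignores lowmem reserves, allocation flags and the fragmentation index, the function name `compaction_gap_check` is made up, and the page counts in the examples are hypothetical. The idea is that compaction is skipped unless the free order-0 pages exceed the low watermark plus 2<<order.

```shell
# Simplified model of the order-0 watermark gate in front of compaction.
# Arguments: free pages in the zone, low watermark (in pages), allocation order.
# Assumption: everything except the watermark + (2 << order) comparison is ignored.
compaction_gap_check() {
    free_pages=$1
    low_wmark=$2
    order=$3
    threshold=$(( low_wmark + (2 << order) ))
    if [ "$free_pages" -gt "$threshold" ]; then
        echo COMPACT_CONTINUE
    else
        echo COMPACT_SKIPPED
    fi
}

# Hypothetical zone: low watermark of 1000 pages, order-2 request (2 << 2 = 8).
compaction_gap_check 1005 1000 2   # prints COMPACT_SKIPPED
compaction_gap_check 1009 1000 2   # prints COMPACT_CONTINUE
```

In this model an order-2 request under memory pressure is skipped whenever the zone sits at or just above its low watermark, which would match the many COMPACT_SKIPPED results in the trace.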

So I have tried the following:
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d99e1f5055c..7364e48cf69a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1276,6 +1276,9 @@ static unsigned long __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_PARTIAL;
 
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
+		return COMPACT_CONTINUE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
 	 * This is because during migration, copies of pages need to be

and retried the same test (without huge pages):
$ time make -j20 > /dev/null

real    8m46.626s
user    14m15.823s
sys     2m45.471s

the time increased but I haven't checked how stable the result is. 

$ grep compact /proc/vmstat
compact_migrate_scanned 139822
compact_free_scanned 1661642
compact_isolated 139407
compact_stall 129
compact_fail 58
compact_success 71
compact_kcompatd_wake 1

$ grep allocstall /proc/vmstat
allocstall 1665

this is worse because we have scanned more pages for migration but the
overall success rate was much lower and direct reclaim was invoked
more often. I do not have a good theory for that and will play with this
some more. Maybe other changes are needed deeper in the compaction code.
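
For comparing the two runs, the success rate can be computed directly from the counters. A small sketch follows; `compact_success_rate` is a made-up helper, and it defines the rate as compact_success divided by compact_stall, which is one reasonable reading of "success rate" here.

```shell
# Print compact_success / compact_stall as a percentage from vmstat-style input.
# Prints nothing if compact_stall is missing or zero.
compact_success_rate() {
    LC_ALL=C awk '
        $1 == "compact_stall"   { stall = $2 }
        $1 == "compact_success" { ok = $2 }
        END { if (stall > 0) printf "%.1f\n", 100 * ok / stall }'
}

# Counters quoted in this mail, before and after the __compaction_suitable change:
printf 'compact_stall 128\ncompact_success 102\n' | compact_success_rate   # prints 79.7
printf 'compact_stall 129\ncompact_success 71\n'  | compact_success_rate   # prints 55.0
```

On a live system the same helper can be fed with `compact_success_rate < /proc/vmstat`.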

I would be really interested to hear whether this helped Hugh with his
setup. Vlastimil, Joonsoo, does this even make sense to you?

> I was only suggesting to allocate hugetlb pages, if you preferred
> not to reboot with artificially reduced RAM.  Not an issue if you're
> booting VMs.

Ohh, I see.
 
-- 
Michal Hocko
SUSE Labs


