On Tue 01-03-16 19:14:08, Vlastimil Babka wrote:
> On 03/01/2016 02:38 PM, Michal Hocko wrote:
[...]
> >that means that compaction is not even tried in half of the cases! This
> >doesn't sound right to me, especially when we are talking about
> ><= PAGE_ALLOC_COSTLY_ORDER requests which are implicitly nofail, because
> >then we simply rely on the order-0 reclaim to automagically form higher
> >blocks. This might indeed work when we retry many times but I guess this
> >is not a good approach. It leads to excessive reclaim and the stall
> >for allocation can be really large.
> >
> >One of the suspicious places is __compaction_suitable which does order-0
> >watermark check (increased by 2<<order). I have put another trace_printk
> >there and it clearly pointed out this was the case.
> 
> Yes, compaction is historically quite careful to avoid making low memory
> conditions worse, and to prevent work if it doesn't look like it can
> ultimately succeed the allocation (so having not enough base pages means
> that compacting them is considered pointless).

The compaction is running in PF_MEMALLOC context so it shouldn't fail
the allocation. Moreover, the additional memory is only temporary until
the migration finishes. Or am I missing something?
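
To make the check under discussion concrete, here is a minimal sketch of
the arithmetic: an order-0 watermark raised by 2<<order pages. It is a
simplified model, not kernel code. The function names and the base_wmark
argument are hypothetical, and the strict "greater than" comparison is an
assumption about how the watermark is applied.

```shell
# Simplified model of the check discussed above; not kernel code.
# All values are in pages. base_wmark is a hypothetical stand-in for
# the zone watermark that the real check starts from.
compaction_watermark() {
    base_wmark=$1
    order=$2
    # The order-0 watermark is raised by 2<<order pages.
    echo $(( base_wmark + (2 << order) ))
}

# Returns 0 (success) when this model would let compaction proceed.
compaction_looks_suitable() {
    free_pages=$1
    base_wmark=$2
    order=$3
    [ "$free_pages" -gt "$(compaction_watermark "$base_wmark" "$order")" ]
}
```

In this model an order-3 request needs 16 free pages above the base
watermark before compaction is considered worthwhile, and an order-9
request needs 1024.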

> This aspect of preventing non-zero-order OOMs is somewhat unexpected
> :)

I hope we can do something about it then...
 
[...]
> >this is worse because we have scanned more pages for migration but the
> >overall success rate was much smaller and the direct reclaim was invoked
> >more. I do not have a good theory for that and will play with this some
> >more. Maybe other changes are needed deeper in the compaction code.
> 
> I was under the impression that similar checks to compaction_suitable() were
> done also in compact_finished(), to stop compacting if memory got low due to
> parallel activity. But I guess it was a patch from Joonsoo that didn't get
> merged.
> 
> My only other theory so far is that watermark checks fail in
> __isolate_free_page() when we want to grab page(s) as migration targets.

Yes, this certainly contributes to the problem, and it triggered a lot in
my case:
$ grep __isolate_free_page trace.log | wc -l
181
$ grep __alloc_pages_direct_compact: trace.log | wc -l
7
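
The two counts above can be turned into a rough per-attempt figure. The
sketch below assumes the same trace.log layout, one trace_printk line per
event; the helper names are hypothetical.

```shell
# Count the lines in a trace log that contain a fixed string.
count_trace_lines() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -cF -- "$1" "$2" || true
}

# Print the integer number of __isolate_free_page lines per
# __alloc_pages_direct_compact: line in the given log.
isolate_hits_per_attempt() {
    hits=$(count_trace_lines "__isolate_free_page" "$1")
    attempts=$(count_trace_lines "__alloc_pages_direct_compact:" "$1")
    if [ "${attempts:-0}" -eq 0 ]; then
        echo "n/a"
    else
        echo $(( ${hits:-0} / attempts ))
    fi
}
```

With the numbers above (181 and 7) this prints 25, that is, roughly 26
hits per direct compaction attempt before integer truncation.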

> I would suggest enabling all compaction tracepoints and the migration
> tracepoint. Looking at the trace could hopefully help faster than
> going one trace_printk() per attempt.

OK, here we go with both watermark checks removed and hopefully all the
compaction related tracepoints enabled:
echo 1 > /debug/tracing/events/compaction/enable
echo 1 > /debug/tracing/events/migrate/mm_migrate_pages/enable
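
A slightly more defensive variant of the two commands above, as a sketch
only: it checks that each enable file exists and is writable before
writing to it. The function name is hypothetical, and the default
directory is simply the mount point used above; pass a different one if
tracing is mounted elsewhere.

```shell
# Enable the compaction events and the mm_migrate_pages event.
# $1: tracing directory (defaults to /debug/tracing as used above).
enable_compaction_tracing() {
    tracing_dir=${1:-/debug/tracing}
    for ev in events/compaction/enable \
              events/migrate/mm_migrate_pages/enable; do
        if [ -w "$tracing_dir/$ev" ]; then
            echo 1 > "$tracing_dir/$ev"
        else
            echo "missing or not writable: $tracing_dir/$ev" >&2
            return 1
        fi
    done
}
```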

This was without the hugetlb handicap. See the attached trace log and
vmstat output from after the run.

Thanks
-- 
Michal Hocko
SUSE Labs