On Tue 01-03-16 19:14:08, Vlastimil Babka wrote:
> On 03/01/2016 02:38 PM, Michal Hocko wrote:
[...]
> >that means that compaction is even not tried in half cases! This
> >doesn't sounds right to me, especially when we are talking about
> ><= PAGE_ALLOC_COSTLY_ORDER requests which are implicitly nofail, because
> >then we simply rely on the order-0 reclaim to automagically form higher
> >blocks. This might indeed work when we retry many times but I guess this
> >is not a good approach. It leads to a excessive reclaim and the stall
> >for allocation can be really large.
> >
> >One of the suspicious places is __compaction_suitable which does order-0
> >watermark check (increased by 2< >there and it clearly pointed out this was the case.
>
> Yes, compaction is historically quite careful to avoid making low memory
> conditions worse, and to prevent work if it doesn't look like it can
> ultimately succeed the allocation (so having not enough base pages means
> that compacting them is considered pointless).

The compaction is running in PF_MEMALLOC context so it shouldn't fail
the allocation. Moreover, the additional memory is only temporary, until
the migration finishes. Or am I missing something?

> This aspect of preventing non-zero-order OOMs is somewhat unexpected
> :)

I hope we can do something about it then...

[...]
> >this is worse because we have scanned more pages for migration but the
> >overall success rate was much smaller and the direct reclaim was invoked
> >more. I do not have a good theory for that and will play with this some
> >more. Maybe other changes are needed deeper in the compaction code.
>
> I was under impression that similar checks to compaction_suitable() were
> done also in compact_finished(), to stop compacting if memory got low due to
> parallel activity. But I guess it was a patch from Joonsoo that didn't get
> merged.
>
> My only other theory so far is that watermark checks fail in
> __isolate_free_page() when we want to grab page(s) as migration targets.

Yes, this certainly contributes to the problem, and it triggered a lot
in my case:
$ grep __isolate_free_page trace.log | wc -l
181
$ grep __alloc_pages_direct_compact: trace.log | wc -l
7

> I would suggest enabling all compaction tracepoint and the migration
> tracepoint. Looking at the trace could hopefully help faster than
> going one trace_printk() per attempt.

OK, here we go with both watermark checks removed and hopefully all the
compaction-related tracepoints enabled:
echo 1 > /debug/tracing/events/compaction/enable
echo 1 > /debug/tracing/events/migrate/mm_migrate_pages/enable

This was without the hugetlb handicap. See the trace log and vmstat
after the run attached.

Thanks
--
Michal Hocko
SUSE Labs