From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199])
	by kanga.kvack.org (Postfix) with ESMTP id 5101B6B0033
	for <linux-mm@kvack.org>; Thu,  2 Nov 2017 11:06:04 -0400 (EDT)
Received: by mail-wr0-f199.google.com with SMTP id v105so3214377wrc.11
        for <linux-mm@kvack.org>; Thu, 02 Nov 2017 08:06:04 -0700 (PDT)
Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15])
        by mx.google.com with ESMTPS id i92si3353078edc.472.2017.11.02.08.06.02
        for <linux-mm@kvack.org>
        (version=TLS1 cipher=AES128-SHA bits=128/128);
        Thu, 02 Nov 2017 08:06:02 -0700 (PDT)
Date: Thu, 2 Nov 2017 16:06:00 +0100
From: Michal Hocko <mhocko@kernel.org>
Subject: Re: swapper/0: page allocation failure: order:0,
 mode:0x1204010(GFP_NOWAIT|__GFP_COMP|__GFP_RECLAIMABLE|__GFP_NOTRACK),
 nodemask=(null)
Message-ID: <20171102150600.yyy2amjaeu2mu2u4@dhcp22.suse.cz>
References: <20171019035641.GB23773@intel.com>
 <CABXGCsPL0pUHo_M-KxB3mabfdGMSHPC0uchLBBt0JCzF2BYBww@mail.gmail.com>
 <20171020064305.GA13688@intel.com>
 <20171020091239.cfwapdkx5g7afyp7@dhcp22.suse.cz>
 <CABXGCsMZ0hFFJyPU2cu+JDLHZ+5eO5i=8FOv71biwpY5neyofA@mail.gmail.com>
 <20171024200639.2pyxkw2cucwxrtlb@dhcp22.suse.cz>
 <CABXGCsPukABMx40dGz7NSjKsWVsz_USFFeHdEY-ZMdgRLCfuwQ@mail.gmail.com>
 <CABXGCsMVsn44xHH6SZxb6jrKv4S_GQFSqHNddAyDKOqNEpP6Ow@mail.gmail.com>
 <a6eab5f2-7ce5-d4fc-5524-0f6b3449742d@I-love.SAKURA.ne.jp>
 <20171102150120.fb5qgrvmebbup64g@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171102150120.fb5qgrvmebbup64g@dhcp22.suse.cz>
Sender: owner-linux-mm@kvack.org
List-ID: <linux-mm.kvack.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: =?utf-8?B?0JzQuNGF0LDQuNC7INCT0LDQstGA0LjQu9C+0LI=?= <mikhail.v.gavrilov@gmail.com>, "Du, Changbin" <changbin.du@intel.com>, linux-mm@kvack.org

On Thu 02-11-17 16:01:20, Michal Hocko wrote:
> On Thu 02-11-17 22:15:06, Tetsuo Handa wrote:
> > I was waiting for Michal's comment, but it seems that he is too busy now.
> > Thus, I post non-authoritative comment here. (I'm not a tracepoints user.)
> 
> yes, that is the case. Thanks for looking into this Tetsuo.
> 
> > Two stalls were found in dmesg but only PID = 2798 part was recorded in the trace logs.
> > 
> >   [ 6109.502115] chrome: page allocation stalls for 10321ms, order:0, mode:0x14000d2(GFP_TEMPORARY|__GFP_HIGHMEM), nodemask=(null)
> >   [ 6109.502179] chrome cpuset=/ mems_allowed=0
> >   [ 6109.502570] CPU: 0 PID: 2798 Comm: chrome Not tainted 4.13.9-300.fc27.x86_64+debug #1
> > 
> 
> I have only glanced through the trace data.
> 
> > So, trying to analyze this one. 
> > 
> > Since 10 seconds of blank was found between mm_shrink_slab_start and
> > mm_shrink_slab_end, this alone can cause stall warning messages.
> > 
> >   # tracer: nop
> >   #
> >   #                              _-----=> irqs-off
> >   #                             / _----=> need-resched
> >   #                            | / _---=> hardirq/softirq
> >   #                            || / _--=> preempt-depth
> >   #                            ||| /     delay
> >   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> >   #              | |       |   ||||       |         |
> >             chrome-2798  [000] .N.1  6099.188540: mm_shrink_slab_start: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 objects to shrink 5895 gfp_flags GFP_TEMPORARY|__GFP_HIGHMEM pgs_scanned 90 lru_pgs 6992959 cache items 5049 delta 0 total_scan 2524
> >             chrome-2798  [000] ...1  6109.494205: mm_shrink_slab_end: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 unused scan count 5895 new scan count 941785 total_scan 476 last shrinker return val 1959
> 
> Yeah, the direct reclaim has started
> chrome-2798  [000] ...1  6099.187991: mm_vmscan_direct_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_TEMPORARY|__GFP_HIGHMEM classzone_idx=2
> and finished
> chrome-2798  [000] ...1  6109.509445: mm_vmscan_direct_reclaim_end: nr_reclaimed=51
> 
> the only notable hole in logging was the one pointed by Tetsuo. There is
> a lot of activity on that CPU during that time wrt. reclaim
> $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print}' | wc -l
> 744
> 
> And there were more processes involved
> $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print $1}' | sort | uniq -c
>      74 <...>-10654
>      43 <...>-13862
>      82 <...>-17624
>       2 <...>-27318
>       1 <...>-3518
>      37 <...>-5331
>     180 <...>-6602
>      38 chrome-3482
>      40 Chrome_IOThread-2773
>       3 DedicatedWorker-19604
>       1 gmain-10668
>     139 qemu-system-x86-13763
>     104 TaskSchedulerBa-6011
> 
> So I agree that it looks like your system seems to be overloaded.

That being said, though, it makes sense to double check whether we
cond_resched sufficiently because being stuck for 10s while a handful
number of processes is reclaiming sounds still too much. Maybe there is
more than just the reclaim going on. This would require a deeper
inspection of the trace data and maybe even gather other tracepoints.
I am sorry but I cannot promis I will find time to look at this anytime
soon.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>