linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
Subject: Re: [PATCH] mm: don't warn about allocations which stall for too long
Date: Thu, 26 Oct 2017 13:41:00 +0200	[thread overview]
Message-ID: <20171026114100.tfb3xemvumg2a7su@dhcp22.suse.cz> (raw)
In-Reply-To: <1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

On Thu 26-10-17 20:28:59, Tetsuo Handa wrote:
> Commit 63f53dea0c9866e9 ("mm: warn about allocations which stall for too
> long") was a great step for reducing possibility of silent hang up problem
> caused by memory allocation stalls. But this commit reverts it, for it is
> possible to trigger OOM lockup and/or soft lockups when many threads
> concurrently called warn_alloc() (in order to warn about memory allocation
> stalls) due to current implementation of printk(), and it is difficult to
> obtain useful information due to limitation of synchronous warning
> approach.
> 
> Current printk() implementation flushes all pending logs using the context
> of a thread which called console_unlock(). printk() should be able to flush
> all pending logs eventually unless somebody continues appending to printk()
> buffer.
> 
> Since warn_alloc() started appending to printk() buffer while waiting for
> oom_kill_process() to make forward progress when oom_kill_process() is
> processing pending logs, it became possible for warn_alloc() to force
> oom_kill_process() loop inside printk(). As a result, warn_alloc()
> significantly increased possibility of preventing oom_kill_process() from
> making forward progress.
> 
> ---------- Pseudo code start ----------
> Before warn_alloc() was introduced:
> 
>   retry:
>     if (mutex_trylock(&oom_lock)) {
>       while (atomic_read(&printk_pending_logs) > 0) {
>         atomic_dec(&printk_pending_logs);
>         print_one_log();
>       }
>       // Send SIGKILL here.
>       mutex_unlock(&oom_lock)
>     }
>     goto retry;
> 
> After warn_alloc() was introduced:
> 
>   retry:
>     if (mutex_trylock(&oom_lock)) {
>       while (atomic_read(&printk_pending_logs) > 0) {
>         atomic_dec(&printk_pending_logs);
>         print_one_log();
>       }
>       // Send SIGKILL here.
>       mutex_unlock(&oom_lock)
>     } else if (waited_for_10seconds()) {
>       atomic_inc(&printk_pending_logs);
>     }
>     goto retry;
> ---------- Pseudo code end ----------
> 
> Although waited_for_10seconds() becomes true once per 10 seconds, unbounded
> number of threads can call waited_for_10seconds() at the same time. Also,
> since threads doing waited_for_10seconds() keep doing almost busy loop, the
> thread doing print_one_log() can use little CPU resource. Therefore, this
> situation can be simplified like
> 
> ---------- Pseudo code start ----------
>   retry:
>     if (mutex_trylock(&oom_lock)) {
>       while (atomic_read(&printk_pending_logs) > 0) {
>         atomic_dec(&printk_pending_logs);
>         print_one_log();
>       }
>       // Send SIGKILL here.
>       mutex_unlock(&oom_lock)
>     } else {
>       atomic_inc(&printk_pending_logs);
>     }
>     goto retry;
> ---------- Pseudo code end ----------
> 
> when printk() is called faster than print_one_log() can process a log.
> 
> One of possible mitigation would be to introduce a new lock in order to
> make sure that no other series of printk() (either oom_kill_process() or
> warn_alloc()) can append to printk() buffer when one series of printk()
> (either oom_kill_process() or warn_alloc()) is already in progress. Such
> serialization will also help obtaining kernel messages in readable form.
> 
> ---------- Pseudo code start ----------
>   retry:
>     if (mutex_trylock(&oom_lock)) {
>       mutex_lock(&oom_printk_lock);
>       while (atomic_read(&printk_pending_logs) > 0) {
>         atomic_dec(&printk_pending_logs);
>         print_one_log();
>       }
>       // Send SIGKILL here.
>       mutex_unlock(&oom_printk_lock);
>       mutex_unlock(&oom_lock)
>     } else {
>       if (mutex_trylock(&oom_printk_lock)) {
>         atomic_inc(&printk_pending_logs);
>         mutex_unlock(&oom_printk_lock);
>       }
>     }
>     goto retry;
> ---------- Pseudo code end ----------
> 
> But this commit does not go that direction, for we don't want to introduce
> a new lock dependency, and we unlikely be able to obtain useful information
> even if we serialized oom_kill_process() and warn_alloc().
> 
> Synchronous approach is prone to unexpected results (e.g. too late [1], too
> frequent [2], overlooked [3]). As far as I know, warn_alloc() never helped
> with providing information other than "something is going wrong".
> I want to consider asynchronous approach which can obtain information
> during stalls with possibly relevant threads (e.g. the owner of oom_lock
> and kswapd-like threads) and serve as a trigger for actions (e.g. turn
> on/off tracepoints, ask libvirt daemon to take a memory dump of stalling
> KVM guest for diagnostic purpose).
> 
> This commit temporarily looses ability to report e.g. OOM lockup due to
> unable to invoke the OOM killer due to !__GFP_FS allocation request.
> But asynchronous approach will be able to detect such situation and emit
> warning. Thus, let's remove warn_alloc().
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
> [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
> [3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever"))
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
> Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Petr Mladek <pmladek@suse.com>

The changelog is a bit excessive but it points out the real problem is
that printk is simply not ready for the sync warning which is good to
document and I hope that this will get addressed in a foreseeable future.
For the mean time it is simply better to remove the warning rather than
risk more issues.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/page_alloc.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 97687b3..a4edfba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3856,8 +3856,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  	enum compact_result compact_result;
>  	int compaction_retries;
>  	int no_progress_loops;
> -	unsigned long alloc_start = jiffies;
> -	unsigned int stall_timeout = 10 * HZ;
>  	unsigned int cpuset_mems_cookie;
>  	int reserve_flags;
>  
> @@ -3989,14 +3987,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  	if (!can_direct_reclaim)
>  		goto nopage;
>  
> -	/* Make sure we know about allocations which stall for too long */
> -	if (time_after(jiffies, alloc_start + stall_timeout)) {
> -		warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
> -			"page allocation stalls for %ums, order:%u",
> -			jiffies_to_msecs(jiffies-alloc_start), order);
> -		stall_timeout += 10 * HZ;
> -	}
> -
>  	/* Avoid recursion of direct reclaim */
>  	if (current->flags & PF_MEMALLOC)
>  		goto nopage;
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-10-26 11:41 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-26 11:28 Tetsuo Handa
2017-10-26 11:41 ` Michal Hocko [this message]
2017-11-08 10:30   ` peter enderborg
2017-11-09  8:52     ` Michal Hocko
2017-11-09  9:34       ` peter enderborg
2017-11-09 10:09         ` Michal Hocko
2017-11-09 10:19           ` Tetsuo Handa
2017-10-26 14:37 ` Johannes Weiner
2017-10-31 19:32 ` Steven Rostedt
2017-11-01  8:30   ` Vlastimil Babka
2017-11-01 13:38     ` Petr Mladek
2017-11-01 15:36       ` Steven Rostedt
2017-11-02 11:46         ` Petr Mladek
2017-11-02 14:49           ` Steven Rostedt
2017-11-01 15:33     ` Steven Rostedt
2017-11-01 17:42       ` Vlastimil Babka
2017-11-01 17:54         ` Steven Rostedt
2017-11-02  8:53   ` Sergey Senozhatsky
2017-11-02  9:14     ` Sergey Senozhatsky
2017-11-02 14:55     ` Steven Rostedt
2017-11-02 12:55   ` Michal Hocko
2017-11-02 15:56 ` Steven Rostedt
2017-11-02 17:06   ` [PATCH v2] printk: Add console owner and waiter logic to load balance console writes Steven Rostedt
2017-11-02 17:10     ` Steven Rostedt
2017-11-02 17:38     ` Steven Rostedt
2017-11-03 10:19     ` Jan Kara
2017-11-03 11:18       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171026114100.tfb3xemvumg2a7su@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yuwang.yuwang@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox