From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Cong Wang <xiyou.wangcong@gmail.com>,
Dave Hansen <dave.hansen@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Petr Mladek <pmladek@suse.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>,
"yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
Subject: Re: [PATCH] mm: don't warn about allocations which stall for too long
Date: Thu, 26 Oct 2017 13:41:00 +0200 [thread overview]
Message-ID: <20171026114100.tfb3xemvumg2a7su@dhcp22.suse.cz> (raw)
In-Reply-To: <1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
On Thu 26-10-17 20:28:59, Tetsuo Handa wrote:
> Commit 63f53dea0c9866e9 ("mm: warn about allocations which stall for too
> long") was a great step for reducing possibility of silent hang up problem
> caused by memory allocation stalls. But this commit reverts it, for it is
> possible to trigger OOM lockup and/or soft lockups when many threads
> concurrently called warn_alloc() (in order to warn about memory allocation
> stalls) due to current implementation of printk(), and it is difficult to
> obtain useful information due to limitation of synchronous warning
> approach.
>
> Current printk() implementation flushes all pending logs using the context
> of a thread which called console_unlock(). printk() should be able to flush
> all pending logs eventually unless somebody continues appending to printk()
> buffer.
>
> Since warn_alloc() started appending to printk() buffer while waiting for
> oom_kill_process() to make forward progress when oom_kill_process() is
> processing pending logs, it became possible for warn_alloc() to force
> oom_kill_process() loop inside printk(). As a result, warn_alloc()
> significantly increased possibility of preventing oom_kill_process() from
> making forward progress.
>
> ---------- Pseudo code start ----------
> Before warn_alloc() was introduced:
>
> retry:
> if (mutex_trylock(&oom_lock)) {
> while (atomic_read(&printk_pending_logs) > 0) {
> atomic_dec(&printk_pending_logs);
> print_one_log();
> }
> // Send SIGKILL here.
> mutex_unlock(&oom_lock)
> }
> goto retry;
>
> After warn_alloc() was introduced:
>
> retry:
> if (mutex_trylock(&oom_lock)) {
> while (atomic_read(&printk_pending_logs) > 0) {
> atomic_dec(&printk_pending_logs);
> print_one_log();
> }
> // Send SIGKILL here.
> mutex_unlock(&oom_lock)
> } else if (waited_for_10seconds()) {
> atomic_inc(&printk_pending_logs);
> }
> goto retry;
> ---------- Pseudo code end ----------
>
> Although waited_for_10seconds() becomes true once per 10 seconds, unbounded
> number of threads can call waited_for_10seconds() at the same time. Also,
> since threads doing waited_for_10seconds() keep doing almost busy loop, the
> thread doing print_one_log() can use little CPU resource. Therefore, this
> situation can be simplified like
>
> ---------- Pseudo code start ----------
> retry:
> if (mutex_trylock(&oom_lock)) {
> while (atomic_read(&printk_pending_logs) > 0) {
> atomic_dec(&printk_pending_logs);
> print_one_log();
> }
> // Send SIGKILL here.
> mutex_unlock(&oom_lock)
> } else {
> atomic_inc(&printk_pending_logs);
> }
> goto retry;
> ---------- Pseudo code end ----------
>
> when printk() is called faster than print_one_log() can process a log.
>
> One of possible mitigation would be to introduce a new lock in order to
> make sure that no other series of printk() (either oom_kill_process() or
> warn_alloc()) can append to printk() buffer when one series of printk()
> (either oom_kill_process() or warn_alloc()) is already in progress. Such
> serialization will also help obtaining kernel messages in readable form.
>
> ---------- Pseudo code start ----------
> retry:
> if (mutex_trylock(&oom_lock)) {
> mutex_lock(&oom_printk_lock);
> while (atomic_read(&printk_pending_logs) > 0) {
> atomic_dec(&printk_pending_logs);
> print_one_log();
> }
> // Send SIGKILL here.
> mutex_unlock(&oom_printk_lock);
> mutex_unlock(&oom_lock)
> } else {
> if (mutex_trylock(&oom_printk_lock)) {
> atomic_inc(&printk_pending_logs);
> mutex_unlock(&oom_printk_lock);
> }
> }
> goto retry;
> ---------- Pseudo code end ----------
>
> But this commit does not go that direction, for we don't want to introduce
> a new lock dependency, and we unlikely be able to obtain useful information
> even if we serialized oom_kill_process() and warn_alloc().
>
> Synchronous approach is prone to unexpected results (e.g. too late [1], too
> frequent [2], overlooked [3]). As far as I know, warn_alloc() never helped
> with providing information other than "something is going wrong".
> I want to consider asynchronous approach which can obtain information
> during stalls with possibly relevant threads (e.g. the owner of oom_lock
> and kswapd-like threads) and serve as a trigger for actions (e.g. turn
> on/off tracepoints, ask libvirt daemon to take a memory dump of stalling
> KVM guest for diagnostic purpose).
>
> This commit temporarily looses ability to report e.g. OOM lockup due to
> unable to invoke the OOM killer due to !__GFP_FS allocation request.
> But asynchronous approach will be able to detect such situation and emit
> warning. Thus, let's remove warn_alloc().
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
> [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
> [3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever"))
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
> Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Petr Mladek <pmladek@suse.com>
The changelog is a bit excessive but it points out the real problem is
that printk is simply not ready for the sync warning which is good to
document and I hope that this will get addressed in a foreseeable future.
For the mean time it is simply better to remove the warning rather than
risk more issues.
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/page_alloc.c | 10 ----------
> 1 file changed, 10 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 97687b3..a4edfba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3856,8 +3856,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> enum compact_result compact_result;
> int compaction_retries;
> int no_progress_loops;
> - unsigned long alloc_start = jiffies;
> - unsigned int stall_timeout = 10 * HZ;
> unsigned int cpuset_mems_cookie;
> int reserve_flags;
>
> @@ -3989,14 +3987,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> if (!can_direct_reclaim)
> goto nopage;
>
> - /* Make sure we know about allocations which stall for too long */
> - if (time_after(jiffies, alloc_start + stall_timeout)) {
> - warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
> - "page allocation stalls for %ums, order:%u",
> - jiffies_to_msecs(jiffies-alloc_start), order);
> - stall_timeout += 10 * HZ;
> - }
> -
> /* Avoid recursion of direct reclaim */
> if (current->flags & PF_MEMALLOC)
> goto nopage;
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-10-26 11:41 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-26 11:28 Tetsuo Handa
2017-10-26 11:41 ` Michal Hocko [this message]
2017-11-08 10:30 ` peter enderborg
2017-11-09 8:52 ` Michal Hocko
2017-11-09 9:34 ` peter enderborg
2017-11-09 10:09 ` Michal Hocko
2017-11-09 10:19 ` Tetsuo Handa
2017-10-26 14:37 ` Johannes Weiner
2017-10-31 19:32 ` Steven Rostedt
2017-11-01 8:30 ` Vlastimil Babka
2017-11-01 13:38 ` Petr Mladek
2017-11-01 15:36 ` Steven Rostedt
2017-11-02 11:46 ` Petr Mladek
2017-11-02 14:49 ` Steven Rostedt
2017-11-01 15:33 ` Steven Rostedt
2017-11-01 17:42 ` Vlastimil Babka
2017-11-01 17:54 ` Steven Rostedt
2017-11-02 8:53 ` Sergey Senozhatsky
2017-11-02 9:14 ` Sergey Senozhatsky
2017-11-02 14:55 ` Steven Rostedt
2017-11-02 12:55 ` Michal Hocko
2017-11-02 15:56 ` Steven Rostedt
2017-11-02 17:06 ` [PATCH v2] printk: Add console owner and waiter logic to load balance console writes Steven Rostedt
2017-11-02 17:10 ` Steven Rostedt
2017-11-02 17:38 ` Steven Rostedt
2017-11-03 10:19 ` Jan Kara
2017-11-03 11:18 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171026114100.tfb3xemvumg2a7su@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@intel.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=pmladek@suse.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=vbabka@suse.cz \
--cc=xiyou.wangcong@gmail.com \
--cc=yuwang.yuwang@alibaba-inc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox