* [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages.
From: Tetsuo Handa @ 2017-10-19 10:51 UTC
To: akpm
Cc: linux-mm, linux-kernel, Tetsuo Handa, Cong Wang, Johannes Weiner,
Michal Hocko, Petr Mladek, Sergey Senozhatsky, yuwang.yuwang
I already pointed out the printk() flooding problem caused by concurrent
warn_alloc() calls, and there are reports of soft lockups caused by
warn_alloc(). But this problem is left unhandled because Michal does not
like serialization in the allocation path; he worries about unexpected
side effects and asks that the root cause of the soft lockups be
identified and fixed. But the CPU time wasted by not serializing
concurrent printk() calls plays at least some role in the soft lockups,
because printk() can currently consume CPU time indefinitely as long as
somebody keeps appending to the printk() buffer, and writing to consoles
also needs CPU time. That is, needlessly consuming CPU time when calling
printk() is itself an unexpected side effect.

Although a proposal for offloading writing to consoles to a dedicated
kernel thread is in progress, it is not yet accepted. And even after the
proposal is accepted, writing to the printk() buffer faster than the
kernel thread can write to consoles will still result in lost messages.
We should refrain from "appending to the printk() buffer" and "consuming
CPU time" at the same time where possible. We can, and should, avoid
appending to the printk() buffer when printk() is called concurrently for
reporting the OOM killer and allocation stalls, in order to reduce the
possibility of hitting soft lockups and of getting unreadably jumbled
messages.

Although avoiding any mixing of memory allocation stall/failure messages
with OOM killer messages would be nice, the oom_lock mutex should not be
used for that purpose: waiting for oom_lock in warn_alloc() could prevent
anybody from calling out_of_memory() from __alloc_pages_may_oom(), because
__alloc_pages_may_oom() currently does not wait for oom_lock (i.e. it
would cause OOM lockups after all). Therefore, this patch adds a mutex
named "oom_printk_lock". Although using mutex_lock(), which would let
printk() spend CPU time writing to consoles, would be better from the
point of view of flushing printk() output, this patch uses mutex_trylock()
for the allocation stall messages because Michal does not like
serialization.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Michal Hocko <mhocko@kernel.org>
---
 include/linux/oom.h | 1 +
 mm/oom_kill.c       | 5 +++++
 mm/page_alloc.c     | 4 +++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 76aac4c..1425767 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -44,6 +44,7 @@ struct oom_control {
 };
 
 extern struct mutex oom_lock;
+extern struct mutex oom_printk_lock;
 
 static inline void set_current_oom_origin(void)
 {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26add8a..5aef9a6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -54,6 +54,7 @@
 int sysctl_oom_dump_tasks = 1;
 
 DEFINE_MUTEX(oom_lock);
+DEFINE_MUTEX(oom_printk_lock);
 
 #ifdef CONFIG_NUMA
 /**
@@ -1074,7 +1075,9 @@ bool out_of_memory(struct oom_control *oc)
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
 		get_task_struct(current);
 		oc->chosen = current;
+		mutex_lock(&oom_printk_lock);
 		oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
+		mutex_unlock(&oom_printk_lock);
 		return true;
 	}
 
@@ -1085,8 +1088,10 @@ bool out_of_memory(struct oom_control *oc)
 		panic("Out of memory and no killable processes...\n");
 	}
 	if (oc->chosen && oc->chosen != (void *)-1UL) {
+		mutex_lock(&oom_printk_lock);
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
+		mutex_unlock(&oom_printk_lock);
 		/*
 		 * Give the killed process a good chance to exit before trying
 		 * to allocate memory again.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 97687b3..3766687 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3990,11 +3990,13 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 		goto nopage;
 
 	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
+	if (time_after(jiffies, alloc_start + stall_timeout) &&
+	    mutex_trylock(&oom_printk_lock)) {
 		warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
 			"page allocation stalls for %ums, order:%u",
 			jiffies_to_msecs(jiffies-alloc_start), order);
 		stall_timeout += 10 * HZ;
+		mutex_unlock(&oom_printk_lock);
 	}
 
 	/* Avoid recursion of direct reclaim */
--
1.8.3.1
* Re: [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages.
From: Michal Hocko @ 2017-10-19 11:44 UTC
To: Tetsuo Handa
Cc: akpm, linux-mm, linux-kernel, Cong Wang, Johannes Weiner,
Petr Mladek, Sergey Senozhatsky, yuwang.yuwang
On Thu 19-10-17 19:51:02, Tetsuo Handa wrote:
> I already pointed out the printk() flooding problem caused by concurrent
> warn_alloc() calls, and there are reports of soft lockups caused by
> warn_alloc(). But this problem is left unhandled because Michal does not
> like serialization in the allocation path; he worries about unexpected
> side effects and asks that the root cause of the soft lockups be
> identified and fixed. But the CPU time wasted by not serializing
> concurrent printk() calls plays at least some role in the soft lockups,
> because printk() can currently consume CPU time indefinitely as long as
> somebody keeps appending to the printk() buffer, and writing to consoles
> also needs CPU time. That is, needlessly consuming CPU time when calling
> printk() is itself an unexpected side effect.
>
> Although a proposal for offloading writing to consoles to a dedicated
> kernel thread is in progress, it is not yet accepted. And even after the
> proposal is accepted, writing to the printk() buffer faster than the
> kernel thread can write to consoles will still result in lost messages.
> We should refrain from "appending to the printk() buffer" and "consuming
> CPU time" at the same time where possible. We can, and should, avoid
> appending to the printk() buffer when printk() is called concurrently for
> reporting the OOM killer and allocation stalls, in order to reduce the
> possibility of hitting soft lockups and of getting unreadably jumbled
> messages.
>
> Although avoiding any mixing of memory allocation stall/failure messages
> with OOM killer messages would be nice, the oom_lock mutex should not be
> used for that purpose: waiting for oom_lock in warn_alloc() could prevent
> anybody from calling out_of_memory() from __alloc_pages_may_oom(), because
> __alloc_pages_may_oom() currently does not wait for oom_lock (i.e. it
> would cause OOM lockups after all). Therefore, this patch adds a mutex
> named "oom_printk_lock". Although using mutex_lock(), which would let
> printk() spend CPU time writing to consoles, would be better from the
> point of view of flushing printk() output, this patch uses mutex_trylock()
> for the allocation stall messages because Michal does not like
> serialization.
Hell no! I've tried to be patient with you, but it seems that it is just a
pointless waste of time. Such an approach is absolutely not acceptable.
You are adding an additional lock dependency into the picture. Say that
there is somebody stuck in the warn_alloc path who cannot make further
progress because printk got stuck. Now you are blocking oom_kill_process
as well. So the cure might be even worse than the problem.

If the warn_alloc is really causing issues and you do not want to spend
energy on identifying _what_ the actual problem is, then I would rather
remove the stall warning than add fishy code.
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
> Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  include/linux/oom.h | 1 +
>  mm/oom_kill.c       | 5 +++++
>  mm/page_alloc.c     | 4 +++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 76aac4c..1425767 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -44,6 +44,7 @@ struct oom_control {
>  };
>  
>  extern struct mutex oom_lock;
> +extern struct mutex oom_printk_lock;
>  
>  static inline void set_current_oom_origin(void)
>  {
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 26add8a..5aef9a6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -54,6 +54,7 @@
>  int sysctl_oom_dump_tasks = 1;
>  
>  DEFINE_MUTEX(oom_lock);
> +DEFINE_MUTEX(oom_printk_lock);
>  
>  #ifdef CONFIG_NUMA
>  /**
> @@ -1074,7 +1075,9 @@ bool out_of_memory(struct oom_control *oc)
>  	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
>  		get_task_struct(current);
>  		oc->chosen = current;
> +		mutex_lock(&oom_printk_lock);
>  		oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
> +		mutex_unlock(&oom_printk_lock);
>  		return true;
>  	}
>  
> @@ -1085,8 +1088,10 @@ bool out_of_memory(struct oom_control *oc)
>  		panic("Out of memory and no killable processes...\n");
>  	}
>  	if (oc->chosen && oc->chosen != (void *)-1UL) {
> +		mutex_lock(&oom_printk_lock);
>  		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
>  				 "Memory cgroup out of memory");
> +		mutex_unlock(&oom_printk_lock);
>  		/*
>  		 * Give the killed process a good chance to exit before trying
>  		 * to allocate memory again.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 97687b3..3766687 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3990,11 +3990,13 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  		goto nopage;
>  
>  	/* Make sure we know about allocations which stall for too long */
> -	if (time_after(jiffies, alloc_start + stall_timeout)) {
> +	if (time_after(jiffies, alloc_start + stall_timeout) &&
> +	    mutex_trylock(&oom_printk_lock)) {
>  		warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
>  			"page allocation stalls for %ums, order:%u",
>  			jiffies_to_msecs(jiffies-alloc_start), order);
>  		stall_timeout += 10 * HZ;
> +		mutex_unlock(&oom_printk_lock);
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages.
From: Tetsuo Handa @ 2017-10-20 10:20 UTC
To: mhocko
Cc: akpm, linux-mm, linux-kernel, xiyou.wangcong, hannes, pmladek,
sergey.senozhatsky, yuwang.yuwang
Michal Hocko wrote:
> On Thu 19-10-17 19:51:02, Tetsuo Handa wrote:
> > I already pointed out the printk() flooding problem caused by concurrent
> > warn_alloc() calls, and there are reports of soft lockups caused by
> > warn_alloc(). But this problem is left unhandled because Michal does not
> > like serialization in the allocation path; he worries about unexpected
> > side effects and asks that the root cause of the soft lockups be
> > identified and fixed. But the CPU time wasted by not serializing
> > concurrent printk() calls plays at least some role in the soft lockups,
> > because printk() can currently consume CPU time indefinitely as long as
> > somebody keeps appending to the printk() buffer, and writing to consoles
> > also needs CPU time. That is, needlessly consuming CPU time when calling
> > printk() is itself an unexpected side effect.
> >
> > Although a proposal for offloading writing to consoles to a dedicated
> > kernel thread is in progress, it is not yet accepted. And even after the
> > proposal is accepted, writing to the printk() buffer faster than the
> > kernel thread can write to consoles will still result in lost messages.
> > We should refrain from "appending to the printk() buffer" and "consuming
> > CPU time" at the same time where possible. We can, and should, avoid
> > appending to the printk() buffer when printk() is called concurrently for
> > reporting the OOM killer and allocation stalls, in order to reduce the
> > possibility of hitting soft lockups and of getting unreadably jumbled
> > messages.
> >
> > Although avoiding any mixing of memory allocation stall/failure messages
> > with OOM killer messages would be nice, the oom_lock mutex should not be
> > used for that purpose: waiting for oom_lock in warn_alloc() could prevent
> > anybody from calling out_of_memory() from __alloc_pages_may_oom(), because
> > __alloc_pages_may_oom() currently does not wait for oom_lock (i.e. it
> > would cause OOM lockups after all). Therefore, this patch adds a mutex
> > named "oom_printk_lock". Although using mutex_lock(), which would let
> > printk() spend CPU time writing to consoles, would be better from the
> > point of view of flushing printk() output, this patch uses mutex_trylock()
> > for the allocation stall messages because Michal does not like
> > serialization.
>
> Hell no! I've tried to be patient with you, but it seems that it is just a
> pointless waste of time. Such an approach is absolutely not acceptable.
> You are adding an additional lock dependency into the picture. Say that
> there is somebody stuck in the warn_alloc path who cannot make further
> progress because printk got stuck. Now you are blocking oom_kill_process
> as well. So the cure might be even worse than the problem.
Sigh... printk() can't get stuck unless somebody keeps appending to the
printk() buffer. Otherwise, printk() could not be used from arbitrary
contexts.

You had better stop calling printk() with oom_lock held if you consider
that printk() can get stuck.

I will say "Say that there is somebody stuck in the oom_kill_process()
path who cannot make further progress because printk() got stuck. Now you
are keeping the mutex_trylock(&oom_lock) thread which invoked the OOM
killer dysfunctional by forcing the !mutex_trylock(&oom_lock) threads to
keep calling warn_alloc(). So calling warn_alloc() might be even worse
than not calling warn_alloc()." This is the printk() vs. oom_lock deadlock
which I can observe with my stress tests.

If somebody keeps appending to the printk() buffer, that user has to be
fixed. And it is warn_alloc() that keeps appending to the printk() buffer.
This patch breaks the printk() continuation dependency by isolating each
thread's transaction. Although this patch introduces a lock dependency, it
mitigates the printk() vs. oom_lock deadlock described above. (I say
"mitigate" rather than "remove" because other printk() sources, if any,
could still reproduce the printk() vs. oom_lock deadlock.)
---------- Pseudo code start ----------
Before warn_alloc() was introduced:

retry:
	if (mutex_trylock(&oom_lock)) {
		while (atomic_read(&printk_pending_logs) > 0) {
			atomic_dec(&printk_pending_logs);
			print_one_log();
		}
		// Send SIGKILL here.
		mutex_unlock(&oom_lock);
	}
	goto retry;

After warn_alloc() was introduced:

retry:
	if (mutex_trylock(&oom_lock)) {
		while (atomic_read(&printk_pending_logs) > 0) {
			atomic_dec(&printk_pending_logs);
			print_one_log();
		}
		// Send SIGKILL here.
		mutex_unlock(&oom_lock);
	} else if (waited_for_10seconds()) {
		atomic_inc(&printk_pending_logs);
	}
	goto retry;

Note that although waited_for_10seconds() becomes true only once per 10
seconds, an unbounded number of threads can pass waited_for_10seconds() at
the same time. Also, since the threads in waited_for_10seconds() are almost
busy-looping, the thread doing print_one_log() gets little CPU time. As a
result, it is possible to keep the thread doing print_one_log() looping
forever.

After this patch is applied:

retry:
	if (mutex_trylock(&oom_lock)) {
		mutex_lock(&oom_printk_lock);
		while (atomic_read(&printk_pending_logs) > 0) {
			atomic_dec(&printk_pending_logs);
			print_one_log();
		}
		// Send SIGKILL here.
		mutex_unlock(&oom_printk_lock);
		mutex_unlock(&oom_lock);
	} else if (waited_for_10seconds()) {
		if (mutex_trylock(&oom_printk_lock)) {
			atomic_inc(&printk_pending_logs);
			mutex_unlock(&oom_printk_lock);
		}
	}
	goto retry;
---------- Pseudo code end ----------
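
The contention above can also be modelled outside the kernel. Below is a
minimal userspace sketch (my own illustration, using pthreads and C11 atomics
instead of the kernel primitives; waited_for_10seconds() is collapsed into an
unconditional append to keep the model short). Built with
"cc -O2 -pthread model.c", it hangs in pthread_join() when run without
arguments (current behaviour) and terminates when run with any argument
(behaviour with this patch):
----------
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define STALLERS 8

static atomic_long pending;	/* models printk_pending_logs */
static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t oom_printk_lock = PTHREAD_MUTEX_INITIALIZER;
static int serialized;		/* 0: current behaviour, 1: with this patch */

static void *staller(void *unused)
{
	for (;;) {
		if (pthread_mutex_trylock(&oom_lock) == 0) {
			/* OOM killer path: drain the buffer, then "kill". */
			if (serialized)
				pthread_mutex_lock(&oom_printk_lock);
			while (atomic_load(&pending) > 0) {
				atomic_fetch_sub(&pending, 1);
				usleep(100);	/* models print_one_log() */
			}
			if (serialized)
				pthread_mutex_unlock(&oom_printk_lock);
			pthread_mutex_unlock(&oom_lock);
			puts("OOM killer made progress");
			return NULL;
		}
		/* warn_alloc() path: models appending a stall warning. */
		if (!serialized) {
			atomic_fetch_add(&pending, 1);
		} else if (pthread_mutex_trylock(&oom_printk_lock) == 0) {
			atomic_fetch_add(&pending, 1);
			pthread_mutex_unlock(&oom_printk_lock);
		}
	}
}

int main(int argc, char *argv[])
{
	pthread_t th[STALLERS];
	int i;

	serialized = argc > 1;
	for (i = 0; i < STALLERS; i++)
		pthread_create(&th[i], NULL, staller, NULL);
	for (i = 0; i < STALLERS; i++)
		pthread_join(th[i], NULL);	/* never finishes when !serialized */
	return 0;
}
----------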
>
> If the warn_alloc is really causing issues and you do not want to spend
> energy on identifying _what_ the actual problem is, then I would rather
> remove the stall warning than add fishy code.
That's a limitation of a synchronous watchdog. A synchronous watchdog is
prone to unexpected results (e.g. too-late or too-frequent reports) and to
oversights (e.g. the now-fixed infinite too_many_isolated() loop in
shrink_inactive_list() and the now-triaged deadlock at
virtballoon_oom_notify() inside out_of_memory()). I do welcome removing
warn_alloc() if it is replaced with a better approach. As far as I know,
warn_alloc() has never helped with providing information other than
"something is going wrong".

For example, warn_alloc() was called only after 30+ minutes of stalling
( https://bugzilla.kernel.org/show_bug.cgi?id=192981 ) though a watchdog
should provide information during the stall rather than after it.
For another example, Cong Wang's case was just ignored without
investigation ( https://bugzilla.redhat.com/show_bug.cgi?id=1492664 )
though Johannes Weiner's case was not during stress testing. That report
was guessed to be a case where "the cgroup operations are being abused",
but I believe there is a possibility that a system-wide OOM was in progress
and the cgroup was irrelevant. That is, oom_lock was held at
__alloc_pages_may_oom() by somebody, and printk() was in progress inside
oom_kill_process(). But since concurrent printk() from warn_alloc() kept
appending to the printk() buffer, printk() from oom_kill_process() was not
able to make further progress unless somebody released memory. But it is
impossible to confirm this, because no OOM killer messages were found
within the partially written output.

I'm happy to spend energy on identifying _what_ the actual problem is, but
the actual problem cannot be identified without a lot of trial and error
(e.g. reporting more information, including other threads such as the
owner of oom_lock and kswapd-like threads). But the infrastructure for
doing such trial and error has so far been ignored because it needs
serialization in order to get useful information (basically SysRq-t +
SysRq-m, and possibly more depending on the output), which is where an
asynchronous watchdog can do better.
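
For the record, below is a rough sketch (hypothetical and untested, not a
patch I am proposing here) of what I mean by an asynchronous watchdog: one
dedicated kernel thread emits the SysRq-t + SysRq-m equivalent from outside
the allocation path, so that a single reporter serializes the output instead
of unbounded allocating threads flooding printk(). The unconditional dump
every 10 seconds is a placeholder; a real implementation would need per-task
bookkeeping to decide when and what to report.
----------
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>

static int memalloc_watchdog(void *unused)
{
	while (!kthread_should_stop()) {
		schedule_timeout_idle(10 * HZ);
		/*
		 * Placeholder trigger: a real implementation would first
		 * check (e.g. via per-task marks) whether some allocation
		 * has been stalling before reporting anything.
		 */
		show_state();		/* the SysRq-t equivalent */
		show_mem(0, NULL);	/* the SysRq-m equivalent */
	}
	return 0;
}

static int __init memalloc_watchdog_init(void)
{
	return PTR_ERR_OR_ZERO(kthread_run(memalloc_watchdog, NULL,
					   "memalloc_wd"));
}
module_init(memalloc_watchdog_init);
MODULE_LICENSE("GPL");
----------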
>
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
> > Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> > Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > ---
> >  include/linux/oom.h | 1 +
> >  mm/oom_kill.c       | 5 +++++
> >  mm/page_alloc.c     | 4 +++-
> >  3 files changed, 9 insertions(+), 1 deletion(-)
* Re: [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages.
From: Tetsuo Handa @ 2017-10-24 11:23 UTC
To: mhocko
Cc: akpm, linux-mm, linux-kernel, xiyou.wangcong, hannes, pmladek,
sergey.senozhatsky, yuwang.yuwang
Tetsuo Handa wrote:
> Michal Hocko wrote:
> > Hell no! I've tried to be patient with you, but it seems that it is just a
> > pointless waste of time. Such an approach is absolutely not acceptable.
> > You are adding an additional lock dependency into the picture. Say that
> > there is somebody stuck in the warn_alloc path who cannot make further
> > progress because printk got stuck. Now you are blocking oom_kill_process
> > as well. So the cure might be even worse than the problem.
>
> Sigh... printk() can't get stuck unless somebody keeps appending to the
> printk() buffer. Otherwise, printk() could not be used from arbitrary
> contexts.
>
> You had better stop calling printk() with oom_lock held if you consider
> that printk() can get stuck.
>
To show how stupid the printk() versus oom_lock dependency is, here is a
patch for reproducing a soft lockup caused by uncontrolled warn_alloc().
The patch below is against 6cff0a118f23b98c ("Merge tag
'platform-drivers-x86-v4.14-3' of git://git.infradead.org/linux-platform-drivers-x86")
and intentionally tries to let the thread holding oom_lock get stuck at
printk(). It does not change functionality; it only changes frequency and
timing in order to emulate the worst situation.
----------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c..4c43f83 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3207,7 +3207,7 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
 	static DEFINE_RATELIMIT_STATE(show_mem_rs, HZ, 1);
 
-	if (should_suppress_show_mem() || !__ratelimit(&show_mem_rs))
+	if (should_suppress_show_mem())
 		return;
 
 	/*
@@ -3232,7 +3232,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	static DEFINE_RATELIMIT_STATE(nopage_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN))
 		return;
 
 	pr_warn("%s: ", current->comm);
@@ -4002,7 +4002,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 		goto nopage;
 
 	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
+	if (__mutex_owner(&oom_lock)) {
 		warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
 			"page allocation stalls for %ums, order:%u",
 			jiffies_to_msecs(jiffies-alloc_start), order);
----------
Enable softlockup_panic so that we can know where the thread got stuck.
----------
echo 9 > /proc/sys/kernel/printk
echo 1 > /proc/sys/kernel/sysrq
echo 1 > /proc/sys/kernel/softlockup_panic
----------
The memory stressor is shown below. All processes run on CPU 0.
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	struct sched_param sp = { 0 };
	cpu_set_t cpu = { { 1 } };	/* CPU 0 only */
	static int pipe_fd[2] = { EOF, EOF };
	char *buf = NULL;
	unsigned long size = 0;
	unsigned int i;
	int fd;

	pipe(pipe_fd);
	signal(SIGCLD, SIG_IGN);
	if (fork() == 0) {
		prctl(PR_SET_NAME, (unsigned long) "first-victim", 0, 0, 0);
		while (1)
			pause();
	}
	close(pipe_fd[1]);
	sched_setaffinity(0, sizeof(cpu), &cpu);
	prctl(PR_SET_NAME, (unsigned long) "normal-priority", 0, 0, 0);
	for (i = 0; i < 1024; i++)
		if (fork() == 0) {
			char c;

			/* Wait until the first-victim is OOM-killed. */
			read(pipe_fd[0], &c, 1);
			/* Try to consume as much CPU time as possible. */
			while (1) {
				void *ptr = mmap(NULL, 4096,
						 PROT_READ | PROT_WRITE,
						 MAP_PRIVATE | MAP_ANONYMOUS,
						 EOF, 0);
				munmap(ptr, 4096);
			}
			_exit(0);
		}
	close(pipe_fd[0]);
	fd = open("/dev/zero", O_RDONLY);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);

		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sched_setscheduler(0, SCHED_IDLE, &sp);
	prctl(PR_SET_NAME, (unsigned long) "idle-priority", 0, 0, 0);
	while (size) {
		int ret = read(fd, buf, size); /* Will cause OOM due to overcommit */

		if (ret <= 0)
			break;
		buf += ret;
		size -= ret;
	}
	kill(-1, SIGKILL);
	return 0; /* Not reached. */
}
----------
Below is a crash dump obtained using qemu-kvm hosted on a 3.10.0-693.2.2.el7.x86_64 kernel.
----------
KERNEL: linux/vmlinux
DUMPFILE: /tmp/dump2 [PARTIAL DUMP]
CPUS: 4 [OFFLINE: 3]
DATE: Tue Oct 24 16:01:19 2017
UPTIME: 00:01:08
LOAD AVERAGE: 3.88, 0.92, 0.30
TASKS: 1151
NODENAME: localhost.localdomain
RELEASE: 4.14.0-rc6+
VERSION: #308 SMP Tue Oct 24 14:53:38 JST 2017
MACHINE: x86_64 (3192 Mhz)
MEMORY: 8 GB
PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
PID: 1046
COMMAND: "idle-priority"
TASK: ffff88022c390000 [THREAD_INFO: ffff88022c390000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> ps -l
[68006009038] [IN] PID: 20 TASK: ffff88018ee62e80 CPU: 2 COMMAND: "watchdog/2"
[68005972514] [UN] PID: 66 TASK: ffff88023019c5c0 CPU: 3 COMMAND: "kworker/3:1"
[68005818859] [UN] PID: 293 TASK: ffff88022fdfc5c0 CPU: 1 COMMAND: "kworker/1:3"
[68005793814] [UN] PID: 986 TASK: ffff880230209740 CPU: 2 COMMAND: "master"
[68005528566] [UN] PID: 217 TASK: ffff88023023ae80 CPU: 1 COMMAND: "kworker/1:2"
[68005444464] [UN] PID: 416 TASK: ffff88022fd85d00 CPU: 2 COMMAND: "systemd-journal"
[68005442636] [UN] PID: 891 TASK: ffff880230239740 CPU: 2 COMMAND: "kworker/2:4"
[68005362065] [UN] PID: 577 TASK: ffff88022fde0000 CPU: 3 COMMAND: "in:imjournal"
[68005026323] [UN] PID: 928 TASK: ffff8802311a0000 CPU: 1 COMMAND: "tuned"
[68005017104] [IN] PID: 14 TASK: ffff88018ee39740 CPU: 1 COMMAND: "watchdog/1"
[68005015266] [UN] PID: 589 TASK: ffff88023675ae80 CPU: 2 COMMAND: "crond"
[68005013129] [UN] PID: 605 TASK: ffff88022fdf9740 CPU: 3 COMMAND: "gmain"
[68003013386] [UN] PID: 8 TASK: ffff88018ee0ae80 CPU: 3 COMMAND: "rcu_sched"
[67038667541] [??] PID: 888 TASK: ffff880230241740 CPU: 2 COMMAND: "kworker/2:3"
[64007020859] [IN] PID: 26 TASK: ffff88018ee8c5c0 CPU: 3 COMMAND: "watchdog/3"
[61920020596] [IN] PID: 40 TASK: ffff88023fd7ae80 CPU: 2 COMMAND: "khugepaged"
[59540010881] [IN] PID: 16 TASK: ffff88018ee3c5c0 CPU: 1 COMMAND: "ksoftirqd/1"
[57812010332] [IN] PID: 22 TASK: ffff88018ee65d00 CPU: 2 COMMAND: "ksoftirqd/2"
[44001013858] [UN] PID: 348 TASK: ffff88023023c5c0 CPU: 1 COMMAND: "kworker/1:1H"
[42992018575] [IN] PID: 57 TASK: ffff880230105d00 CPU: 1 COMMAND: "kswapd0"
[42976019430] [UN] PID: 565 TASK: ffff88022fde2e80 CPU: 3 COMMAND: "kworker/3:1H"
[42838016448] [IN] PID: 347 TASK: ffff88023023dd00 CPU: 3 COMMAND: "xfsaild/vda1"
[42786314423] [UN] PID: 75 TASK: ffff8802301eae80 CPU: 1 COMMAND: "kworker/u8:1"
[42785491218] [UN] PID: 315 TASK: ffff88022f20c5c0 CPU: 2 COMMAND: "kworker/2:1H"
[42784618771] [UN] PID: 5 TASK: ffff88018edadd00 CPU: 1 COMMAND: "kworker/u8:0"
[42783021941] [UN] PID: 574 TASK: ffff88023675c5c0 CPU: 3 COMMAND: "irqbalance"
[40738480039] [RU] PID: 1046 TASK: ffff88022c390000 CPU: 0 COMMAND: "idle-priority"
[40738478908] [RU] PID: 205 TASK: ffff88022fde5d00 CPU: 0 COMMAND: "kworker/0:3"
[40005008096] [RU] PID: 11 TASK: ffff88018ee10000 CPU: 0 COMMAND: "watchdog/0"
[37392925636] [IN] PID: 2063 TASK: ffff88022cb48000 CPU: 0 COMMAND: "normal-priority"
[37392903473] [IN] PID: 2066 TASK: ffff88022cb4c5c0 CPU: 0 COMMAND: "normal-priority"
[37392880968] [IN] PID: 2068 TASK: ffff88022cbb8000 CPU: 0 COMMAND: "normal-priority"
(304 '[IN] CPU: 0 COMMAND: "normal-priority"' lines snipped.)
[37381076875] [IN] PID: 1699 TASK: ffff8802339c1740 CPU: 0 COMMAND: "normal-priority"
[37381052728] [IN] PID: 1620 TASK: ffff88023478ae80 CPU: 0 COMMAND: "normal-priority"
[37381026465] [IN] PID: 1514 TASK: ffff880235361740 CPU: 0 COMMAND: "normal-priority"
[37381008004] [RU] PID: 7 TASK: ffff88018ee09740 CPU: 0 COMMAND: "ksoftirqd/0"
[37380971366] [IN] PID: 1738 TASK: ffff880231900000 CPU: 0 COMMAND: "normal-priority"
[37380947911] [IN] PID: 1684 TASK: ffff880233899740 CPU: 0 COMMAND: "normal-priority"
[37380924449] [IN] PID: 1597 TASK: ffff8802345a5d00 CPU: 0 COMMAND: "normal-priority"
(708 '[IN] CPU: 0 COMMAND: "normal-priority"' lines snipped.)
[37361136670] [IN] PID: 1239 TASK: ffff880233601740 CPU: 0 COMMAND: "normal-priority"
[37361111718] [IN] PID: 1238 TASK: ffff880233600000 CPU: 0 COMMAND: "normal-priority"
[37361069614] [IN] PID: 1237 TASK: ffff88023559dd00 CPU: 0 COMMAND: "normal-priority"
[37344152029] [IN] PID: 1047 TASK: ffff880230250000 CPU: 3 COMMAND: "first-victim"
[37344142207] [IN] PID: 27 TASK: ffff88018ee8dd00 CPU: 3 COMMAND: "migration/3"
[37342945775] [IN] PID: 1027 TASK: ffff8802302545c0 CPU: 2 COMMAND: "bash"
[34669929199] [IN] PID: 1018 TASK: ffff88022c391740 CPU: 3 COMMAND: "login"
[34669576733] [IN] PID: 579 TASK: ffff88022fde45c0 CPU: 3 COMMAND: "rs:main Q:Reg"
[34669018457] [IN] PID: 500 TASK: ffff880236ee5d00 CPU: 1 COMMAND: "auditd"
[34668995398] [IN] PID: 56 TASK: ffff8802301045c0 CPU: 1 COMMAND: "kauditd"
[34668846904] [UN] PID: 300 TASK: ffff88018ef645c0 CPU: 3 COMMAND: "kworker/3:2"
[34668281096] [IN] PID: 1 TASK: ffff88018eda8000 CPU: 2 COMMAND: "systemd"
[34668234545] [IN] PID: 572 TASK: ffff880233555d00 CPU: 1 COMMAND: "systemd-logind"
[34668221621] [IN] PID: 573 TASK: ffff88023675dd00 CPU: 1 COMMAND: "polkitd"
[34668157306] [IN] PID: 587 TASK: ffff8802367c8000 CPU: 1 COMMAND: "gdbus"
[34668030779] [IN] PID: 575 TASK: ffff880236759740 CPU: 0 COMMAND: "dbus-daemon"
[34667468724] [IN] PID: 597 TASK: ffff88018eeb45c0 CPU: 1 COMMAND: "NetworkManager"
[34667366672] [IN] PID: 608 TASK: ffff88022c3945c0 CPU: 3 COMMAND: "gdbus"
[34659025190] [IN] PID: 28 TASK: ffff88018eeb0000 CPU: 3 COMMAND: "ksoftirqd/3"
[32316161544] [UN] PID: 934 TASK: ffff88023020c5c0 CPU: 0 COMMAND: "kworker/0:1H"
[31711078811] [UN] PID: 34 TASK: ffff88018eed9740 CPU: 0 COMMAND: "kworker/0:1"
[31124096990] [UN] PID: 23 TASK: ffff88018ee88000 CPU: 2 COMMAND: "kworker/2:0"
[ 7006653998] [IN] PID: 15 TASK: ffff88018ee3ae80 CPU: 1 COMMAND: "migration/1"
[ 6940208222] [IN] PID: 504 TASK: ffff880230242e80 CPU: 2 COMMAND: "auditd"
[ 5704370038] [IN] PID: 439 TASK: ffff8802302445c0 CPU: 3 COMMAND: "systemd-udevd"
[ 3411620247] [IN] PID: 665 TASK: ffff88022fdc5d00 CPU: 0 COMMAND: "dhclient"
[ 2960244550] [IN] PID: 10 TASK: ffff88018ee0dd00 CPU: 0 COMMAND: "migration/0"
[ 2954991612] [IN] PID: 988 TASK: ffff880233552e80 CPU: 0 COMMAND: "qmgr"
[ 2951385676] [IN] PID: 987 TASK: ffff880233550000 CPU: 1 COMMAND: "pickup"
[ 2797722377] [UN] PID: 4 TASK: ffff88018edac5c0 CPU: 0 COMMAND: "kworker/0:0H"
[ 2797696572] [IN] PID: 2 TASK: ffff88018eda9740 CPU: 2 COMMAND: "kthreadd"
[ 2789250032] [IN] PID: 930 TASK: ffff8802367c45c0 CPU: 1 COMMAND: "tuned"
[ 2786019436] [IN] PID: 927 TASK: ffff8802311a45c0 CPU: 2 COMMAND: "tuned"
[ 2785978972] [IN] PID: 846 TASK: ffff88023020ae80 CPU: 1 COMMAND: "tuned"
[ 2781228329] [IN] PID: 925 TASK: ffff8802311a1740 CPU: 0 COMMAND: "gmain"
[ 2693216592] [UN] PID: 158 TASK: ffff880230252e80 CPU: 2 COMMAND: "kworker/2:2"
[ 2693215466] [UN] PID: 59 TASK: ffff880230189740 CPU: 2 COMMAND: "kworker/2:1"
[ 2688297935] [IN] PID: 848 TASK: ffff880230208000 CPU: 0 COMMAND: "sshd"
[ 2681010287] [UN] PID: 53 TASK: ffff880230100000 CPU: 1 COMMAND: "kworker/1:1"
[ 2676065949] [IN] PID: 21 TASK: ffff88018ee645c0 CPU: 2 COMMAND: "migration/2"
[ 2375572292] [UN] PID: 663 TASK: ffff8802367c5d00 CPU: 1 COMMAND: "kworker/1:4"
[ 2375558842] [UN] PID: 17 TASK: ffff88018ee3dd00 CPU: 1 COMMAND: "kworker/1:0"
[ 1897490458] [IN] PID: 593 TASK: ffff88022fd81740 CPU: 3 COMMAND: "polkitd"
[ 1897116851] [IN] PID: 591 TASK: ffff8802367cae80 CPU: 1 COMMAND: "JS Sour~ Thread"
[ 1895094186] [IN] PID: 32 TASK: ffff88018eeb5d00 CPU: 1 COMMAND: "kdevtmpfs"
[ 1884903018] [IN] PID: 588 TASK: ffff8802367cc5c0 CPU: 1 COMMAND: "JS GC Helper"
[ 1881538545] [IN] PID: 585 TASK: ffff8802367cdd00 CPU: 1 COMMAND: "gmain"
[ 1872987774] [IN] PID: 583 TASK: ffff88022fde1740 CPU: 1 COMMAND: "dbus-daemon"
[ 1840792454] [IN] PID: 303 TASK: ffff88023019dd00 CPU: 2 COMMAND: "scsi_eh_0"
[ 1836228666] [IN] PID: 569 TASK: ffff8802335545c0 CPU: 3 COMMAND: "rsyslogd"
[ 1820454811] [UN] PID: 30 TASK: ffff88018eeb2e80 CPU: 3 COMMAND: "kworker/3:0H"
[ 1652310426] [UN] PID: 311 TASK: ffff88022f209740 CPU: 3 COMMAND: "kworker/u8:3"
[ 1638991783] [UN] PID: 111 TASK: ffff8802301b45c0 CPU: 0 COMMAND: "kworker/0:2"
[ 1252067459] [UN] PID: 18 TASK: ffff88018ee60000 CPU: 1 COMMAND: "kworker/1:0H"
[ 1248932602] [UN] PID: 346 TASK: ffff880230238000 CPU: 3 COMMAND: "xfs-eofblocks/v"
[ 1248911038] [UN] PID: 345 TASK: ffff8802301c1740 CPU: 2 COMMAND: "xfs-log/vda1"
[ 1248897201] [UN] PID: 344 TASK: ffff8802301c5d00 CPU: 1 COMMAND: "xfs-reclaim/vda"
[ 1248867714] [UN] PID: 343 TASK: ffff8802301c45c0 CPU: 0 COMMAND: "xfs-cil/vda1"
[ 1248853541] [UN] PID: 342 TASK: ffff8802301c2e80 CPU: 1 COMMAND: "xfs-conv/vda1"
[ 1248824175] [UN] PID: 341 TASK: ffff8802301c0000 CPU: 0 COMMAND: "xfs-data/vda1"
[ 1248798070] [UN] PID: 340 TASK: ffff88018ef61740 CPU: 2 COMMAND: "xfs-buf/vda1"
[ 1248455451] [UN] PID: 339 TASK: ffff88018ef60000 CPU: 3 COMMAND: "xfs_mru_cache"
[ 1248405777] [UN] PID: 338 TASK: ffff88018ef62e80 CPU: 3 COMMAND: "xfsalloc"
[ 974516821] [UN] PID: 323 TASK: ffff88018ef65d00 CPU: 1 COMMAND: "ttm_swap"
[ 967022556] [IN] PID: 305 TASK: ffff88023018ae80 CPU: 2 COMMAND: "scsi_eh_1"
[ 843549456] [UN] PID: 24 TASK: ffff88018ee89740 CPU: 2 COMMAND: "kworker/2:0H"
[ 816011833] [UN] PID: 307 TASK: ffff880230188000 CPU: 2 COMMAND: "kworker/u8:2"
[ 804534162] [UN] PID: 306 TASK: ffff88023018c5c0 CPU: 2 COMMAND: "scsi_tmf_1"
[ 803964224] [UN] PID: 304 TASK: ffff88023018dd00 CPU: 1 COMMAND: "scsi_tmf_0"
[ 800968743] [UN] PID: 29 TASK: ffff88018eeb1740 CPU: 3 COMMAND: "kworker/3:0"
[ 800361745] [UN] PID: 302 TASK: ffff880230198000 CPU: 2 COMMAND: "ata_sff"
[ 728153882] [UN] PID: 3 TASK: ffff88018edaae80 CPU: 0 COMMAND: "kworker/0:0"
[ 630499518] [IN] PID: 25 TASK: ffff88018ee8ae80 CPU: 3 COMMAND: "cpuhp/3"
[ 630455067] [IN] PID: 19 TASK: ffff88018ee61740 CPU: 2 COMMAND: "cpuhp/2"
[ 630429505] [UN] PID: 112 TASK: ffff8802301b2e80 CPU: 3 COMMAND: "ipv6_addrconf"
[ 630416623] [IN] PID: 13 TASK: ffff88018ee38000 CPU: 1 COMMAND: "cpuhp/1"
[ 630374181] [IN] PID: 12 TASK: ffff88018ee15d00 CPU: 0 COMMAND: "cpuhp/0"
[ 616646337] [UN] PID: 110 TASK: ffff8802301b1740 CPU: 2 COMMAND: "kaluad"
[ 616609183] [UN] PID: 109 TASK: ffff880230102e80 CPU: 3 COMMAND: "kmpath_rdacd"
[ 594524204] [UN] PID: 108 TASK: ffff880230101740 CPU: 1 COMMAND: "acpi_thermal_pm"
[ 594182764] [UN] PID: 107 TASK: ffff880230255d00 CPU: 0 COMMAND: "kthrotld"
[ 134270773] [UN] PID: 47 TASK: ffff88023fc4dd00 CPU: 1 COMMAND: "watchdogd"
[ 134081248] [UN] PID: 45 TASK: ffff88023fc4ae80 CPU: 3 COMMAND: "edac-poller"
[ 134081248] [UN] PID: 46 TASK: ffff88023fc4c5c0 CPU: 1 COMMAND: "devfreq_wq"
[ 133661695] [UN] PID: 44 TASK: ffff88023fc49740 CPU: 2 COMMAND: "md"
[ 20366125] [IN] PID: 35 TASK: ffff88018eedae80 CPU: 2 COMMAND: "khungtaskd"
[ 20366125] [IN] PID: 36 TASK: ffff88018eedc5c0 CPU: 1 COMMAND: "oom_reaper"
[ 20366125] [UN] PID: 37 TASK: ffff88018eeddd00 CPU: 1 COMMAND: "writeback"
[ 20366125] [IN] PID: 38 TASK: ffff88023fd78000 CPU: 2 COMMAND: "kcompactd0"
[ 20366125] [IN] PID: 39 TASK: ffff88023fd79740 CPU: 3 COMMAND: "ksmd"
[ 20366125] [UN] PID: 41 TASK: ffff88023fd7c5c0 CPU: 2 COMMAND: "crypto"
[ 20366125] [UN] PID: 42 TASK: ffff88023fd7dd00 CPU: 1 COMMAND: "kintegrityd"
[ 20366125] [UN] PID: 43 TASK: ffff88023fc48000 CPU: 3 COMMAND: "kblockd"
[ 18537224] [UN] PID: 33 TASK: ffff88018eed8000 CPU: 1 COMMAND: "netns"
[ 4808153] [UN] PID: 9 TASK: ffff88018ee0c5c0 CPU: 0 COMMAND: "rcu_bh"
[ 4785179] [UN] PID: 6 TASK: ffff88018ee08000 CPU: 0 COMMAND: "mm_percpu_wq"
[ 0] [RU] PID: 0 TASK: ffffffff81c10480 CPU: 0 COMMAND: "swapper/0"
[ 0] [RU] PID: 0 TASK: ffff88018ee11740 CPU: 1 COMMAND: "swapper/1"
[ 0] [RU] PID: 0 TASK: ffff88018ee12e80 CPU: 2 COMMAND: "swapper/2"
[ 0] [RU] PID: 0 TASK: ffff88018ee145c0 CPU: 3 COMMAND: "swapper/3"
crash> log
[ 68.005884] Normal free:42276kB min:42468kB low:53084kB high:63700kB active_anon:4878580kB inactive_anon:8348kB active_file:20kB inactive_file:48kB unevictable:0kB writepending:8kB present:5242880kB managed:5110028kB mlocked:0kB kernel_stack:18400kB pagetables:42248kB bounce:0kB free_pcp:1700kB local_pcp:732kB free_cma:0kB
[ 68.005884] 75
[ 68.005884] lowmem_reserve[]:
[ 68.005884] 0b
[ 68.005885] 0
[ 68.005885] f3
[ 68.005886] 0
[ 68.005886] 90
[ 68.005886] 0
[ 68.005887] 41
[ 68.005887] 0
[ 68.005887] 0
[ 68.005888] (U)
[ 68.005888] Node 0
[ 68.005889] 1*256kB
[ 68.005889] DMA:
[ 68.005889] (U)
[ 68.005890] 1*4kB
[ 68.005890] 0*512kB
[ 68.005891] (U)
[ 68.005891] 1*1024kB
[ 68.005891] 1*8kB
[ 68.005892] (U)
[ 68.005892] (U)
[ 68.005892] 1*2048kB
[ 68.005893] 1*16kB
[ 68.005893] (M)
[ 68.005893] (U)
[ 68.005894] 3*4096kB
[ 68.005894] 0*32kB
[ 68.005894] (M)
[ 68.005895] 2*64kB
[ 68.005895] = 15900kB
[ 68.005895] (U)
[ 68.005895] Node 0
[ 68.005896] 1*128kB
[ 68.005896] DMA32:
[ 68.005897] (U)
[ 68.005897] 4*4kB
[ 68.005897] 1*256kB
[ 68.005898] (UM)
[ 68.005898] (U)
[ 68.005898] 3*8kB
[ 68.005899] 0*512kB
[ 68.005899] (UM)
[ 68.005899] 1*1024kB
[ 68.005900] 3*16kB
[ 68.005900] (U)
[ 68.005900] (U)
[ 68.005901] 1*2048kB
[ 68.005901] 4*32kB
[ 68.005901] (M)
[ 68.005901] (UM)
[ 68.005902] 3*4096kB
[ 68.005902] 2*64kB
[ 68.005902] (M)
[ 68.005903] (U)
[ 68.005903] = 15900kB
[ 68.005903] 0*128kB
[ 68.005904] Node 0
[ 68.005904] 0*256kB
[ 68.005904] DMA32:
[ 68.005905] 5*512kB
[ 68.005905] 4*4kB
[ 68.005905] (M)
[ 68.005906] (UM)
[ 68.005906] 5*1024kB
[ 68.005906] 3*8kB
[ 68.005907] (UM)
[ 68.005907] (UM)
[ 68.005907] 2*2048kB
[ 68.005908] 3*16kB
[ 68.005908] (UM)
[ 68.005908] (U)
[ 68.005909] 8*4096kB
[ 68.005909] 4*32kB
[ 68.005909] (M)
[ 68.005910] (UM)
[ 68.005910] = 44888kB
[ 68.005910] 2*64kB
[ 68.005911] Node 0
[ 68.005911] (U)
[ 68.005911] Normal:
[ 68.005912] 0*128kB
[ 68.005912] 702*4kB
[ 68.005912] 0*256kB
[ 68.005913] (UME)
[ 68.005913] 5*512kB
[ 68.005913] 411*8kB
[ 68.005914] (M)
[ 68.005914] (UME)
[ 68.005914] 5*1024kB
[ 68.005915] 186*16kB
[ 68.005915] (UM)
[ 68.005915] (UE)
[ 68.005916] 2*2048kB
[ 68.005916] 58*32kB
[ 68.005916] (UM)
[ 68.005917] (UME)
[ 68.005917] 8*4096kB
[ 68.005917] 70*64kB
[ 68.005918] (M)
[ 68.005918] (UME)
[ 68.005918] = 44888kB
[ 68.005919] 65*128kB
[ 68.005919] Node 0
[ 68.005919] (UME)
[ 68.005919] Normal:
[ 68.005920] 22*256kB
[ 68.005920] 702*4kB
[ 68.005921] (UM)
[ 68.005921] (UME)
[ 68.005921] 10*512kB
[ 68.005922] 411*8kB
[ 68.005922] (M)
[ 68.005922] (UME)
[ 68.005923] 7*1024kB
[ 68.005923] 186*16kB
[ 68.005923] (UM)
[ 68.005924] (UE)
[ 68.005924] 0*2048kB
[ 68.005924] 58*32kB
[ 68.005925] 0*4096kB
[ 68.005925] (UME)
[ 68.005925] = 41648kB
[ 68.005926] 70*64kB
[ 68.005926] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 68.005927] (UME)
[ 68.005927] 2205 total pagecache pages
[ 68.005927] 65*128kB
[ 68.005928] 0 pages in swap cache
[ 68.005928] (UME)
[ 68.005928] Swap cache stats: add 0, delete 0, find 0/0
[ 68.005929] 22*256kB
[ 68.005929] Free swap = 0kB
[ 68.005929] (UM)
[ 68.005930] Total swap = 0kB
[ 68.005930] 10*512kB
[ 68.005930] 2097045 pages RAM
[ 68.005930] (M)
[ 68.005931] 0 pages HighMem/MovableOnly
[ 68.005931] 7*1024kB
[ 68.005931] 53742 pages reserved
[ 68.005932] (UM)
[ 68.005932] 0 pages cma reserved
[ 68.005932] 0*2048kB
[ 68.005933] 0 pages hwpoisoned
[ 68.005933] 0*4096kB = 41648kB
[ 68.005934] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 68.005934] 2205 total pagecache pages
[ 68.005935] 0 pages in swap cache
[ 68.005935] Swap cache stats: add 0, delete 0, find 0/0
[ 68.005935] Free swap = 0kB
[ 68.005936] Total swap = 0kB
[ 68.005936] 2097045 pages RAM
[ 68.005936] 0 pages HighMem/MovableOnly
[ 68.005936] 53742 pages reserved
[ 68.005937] 0 pages cma reserved
[ 68.005937] 0 pages hwpoisoned
[ 68.006581] Kernel panic - not syncing: softlockup: hung tasks
[ 68.006582] CPU: 0 PID: 1046 Comm: idle-priority Tainted: G L 4.14.0-rc6+ #308
[ 68.006582] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 68.006582] Call Trace:
[ 68.006583] <IRQ>
[ 68.006585] dump_stack+0x63/0x87
[ 68.006586] panic+0xeb/0x245
[ 68.006588] watchdog_timer_fn+0x212/0x220
[ 68.006589] ? watchdog+0x30/0x30
[ 68.006591] __hrtimer_run_queues+0xe5/0x230
[ 68.006593] hrtimer_interrupt+0xa8/0x1a0
[ 68.006595] smp_apic_timer_interrupt+0x5f/0x130
[ 68.006596] apic_timer_interrupt+0x9d/0xb0
[ 68.006596] </IRQ>
[ 68.006597] RIP: 0010:console_unlock+0x24e/0x4c0
[ 68.006598] RSP: 0018:ffffc900016b76d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 68.006598] RAX: 0000000000000001 RBX: 0000000000000025 RCX: ffff88022f302000
[ 68.006599] RDX: 0000000000000025 RSI: 0000000000000087 RDI: 0000000000000246
[ 68.006599] RBP: ffffc900016b7718 R08: 0000000001080020 R09: 0000000080000000
[ 68.006600] R10: 0000000000000e06 R11: 000000000000000c R12: 0000000000000400
[ 68.006600] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000025
[ 68.006602] vprintk_emit+0x2f5/0x3a0
[ 68.006603] vprintk_default+0x29/0x50
[ 68.006604] vprintk_func+0x27/0x60
[ 68.006605] printk+0x58/0x6f
[ 68.006606] show_mem+0x1e/0xf0
[ 68.006607] dump_header+0xc0/0x234
[ 68.006608] oom_kill_process+0x21c/0x430
[ 68.006609] out_of_memory+0x114/0x4a0
[ 68.006610] __alloc_pages_slowpath+0x83c/0xb88
[ 68.006612] __alloc_pages_nodemask+0x26a/0x290
[ 68.006613] alloc_pages_vma+0x7f/0x180
[ 68.006614] __handle_mm_fault+0xcb0/0x1290
[ 68.006616] handle_mm_fault+0xcc/0x1e0
[ 68.006617] __do_page_fault+0x24a/0x4d0
[ 68.006619] do_page_fault+0x38/0x130
[ 68.006620] do_async_page_fault+0x22/0xd0
[ 68.006621] async_page_fault+0x22/0x30
[ 68.006622] RIP: 0010:__clear_user+0x25/0x50
[ 68.006622] RSP: 0018:ffffc900016b7d80 EFLAGS: 00010202
[ 68.006623] RAX: 0000000000000000 RBX: ffffc900016b7e68 RCX: 0000000000000002
[ 68.006623] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00007f40739ac000
[ 68.006624] RBP: ffffc900016b7d80 R08: 0000000000100000 R09: 0000000000000000
[ 68.006624] R10: 0000000000000198 R11: 0000000000000345 R12: 0000000000001000
[ 68.006625] R13: ffffc900016b7e30 R14: 000000005c46a000 R15: 0000000000001000
[ 68.006626] clear_user+0x2b/0x40
[ 68.006628] iov_iter_zero+0x88/0x390
[ 68.006630] read_iter_zero+0x3d/0xa0
[ 68.006631] __vfs_read+0xec/0x160
[ 68.006632] vfs_read+0x8c/0x130
[ 68.006632] SyS_read+0x55/0xc0
[ 68.006634] do_syscall_64+0x67/0x1b0
[ 68.006635] entry_SYSCALL64_slow_path+0x25/0x25
[ 68.006636] RIP: 0033:0x7f429743a7e0
[ 68.006636] RSP: 002b:00007ffd32b02bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 68.006637] RAX: ffffffffffffffda RBX: 0000000080003000 RCX: 00007f429743a7e0
[ 68.006637] RDX: 0000000080003000 RSI: 00007f4017541010 RDI: 0000000000000003
[ 68.006637] RBP: 00007f4017541010 R08: 0000000000000000 R09: 0000000000021000
[ 68.006638] R10: 00007ffd32b02910 R11: 0000000000000246 R12: 00007f3e97544010
[ 68.006638] R13: 0000000000000005 R14: 0000000000000003 R15: 0000000000000000
[ 68.006749] Kernel Offset: disabled
[ 78.017432] ---[ end Kernel panic - not syncing: softlockup: hung tasks
----------
While the warn_alloc() messages are completely unreadable, what we should
note is that

(a) out_of_memory() => oom_kill_process() => dump_header() => show_mem() => printk()
    got stuck at console_unlock() despite this being a schedulable context.
----------
2180:	for (;;) {
2181:		struct printk_log *msg;
2182:		size_t ext_len = 0;
2183:		size_t len;
2184:
2185:		printk_safe_enter_irqsave(flags);
2186:		raw_spin_lock(&logbuf_lock);
(...snipped...)
2228:		console_idx = log_next(console_idx);
2229:		console_seq++;
2230:		raw_spin_unlock(&logbuf_lock);
2231:
2232:		stop_critical_timings();	/* don't trace print latency */
2233:		call_console_drivers(ext_text, ext_len, text, len);
2234:		start_critical_timings();
2235:		printk_safe_exit_irqrestore(flags); // console_unlock+0x24e/0x4c0 is here.
2236:
2237:		if (do_cond_resched)
2238:			cond_resched();
2239:	}
----------
(b) Last-run timestamps of all threads on CPU 0, including the "watchdog/0"
    watchdog thread, are no longer updated once the "idle-priority" thread
    starts printk() flooding inside console_unlock(). I don't know why they
    are no longer updated; is async_page_fault() somehow relevant? I also
    don't know why "ksoftirqd/0" is in [RU]; perhaps due to the use of
    netconsole for capturing kernel messages?

Anyway, depending on configuration/environment/stress, it is possible to
trigger an OOM lockup caused by the printk() versus oom_lock dependency.
Thus, I really do want to prevent other threads from appending to the
printk() buffer while some thread is printk()ing memory-related messages.
And mutex_trylock(&oom_printk_lock) can do that more reliably than checking
__mutex_owner(&oom_lock) == NULL.
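
To illustrate the difference, compare the two (a fragment of my own for
illustration; not code from this patch):
----------
/*
 * Racy: oom_lock can be acquired immediately after the check, so stall
 * output can still interleave with the OOM killer output.
 */
if (__mutex_owner(&oom_lock) == NULL)
	warn_alloc(gfp_mask, nodemask, "...");

/*
 * The trylock both checks and excludes in one atomic step: nobody can
 * start OOM killer output while the stall warning is being printed.
 */
if (mutex_trylock(&oom_printk_lock)) {
	warn_alloc(gfp_mask, nodemask, "...");
	mutex_unlock(&oom_printk_lock);
}
----------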
* Re: [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages.
From: Tetsuo Handa @ 2017-10-25 10:47 UTC
To: mhocko
Cc: akpm, linux-mm, linux-kernel, xiyou.wangcong, hannes, pmladek,
sergey.senozhatsky, yuwang.yuwang
Tetsuo Handa wrote:
> While the warn_alloc() messages are completely unreadable, what we should
> note is that
>
> (a) out_of_memory() => oom_kill_process() => dump_header() => show_mem() => printk()
>     got stuck at console_unlock() despite this being a schedulable context.
>
> ----------
> 2180:	for (;;) {
> 2181:		struct printk_log *msg;
> 2182:		size_t ext_len = 0;
> 2183:		size_t len;
> 2184:
> 2185:		printk_safe_enter_irqsave(flags);
> 2186:		raw_spin_lock(&logbuf_lock);
> (...snipped...)
> 2228:		console_idx = log_next(console_idx);
> 2229:		console_seq++;
> 2230:		raw_spin_unlock(&logbuf_lock);
> 2231:
> 2232:		stop_critical_timings();	/* don't trace print latency */
> 2233:		call_console_drivers(ext_text, ext_len, text, len);
> 2234:		start_critical_timings();
> 2235:		printk_safe_exit_irqrestore(flags); // console_unlock+0x24e/0x4c0 is here.
> 2236:
> 2237:		if (do_cond_resched)
> 2238:			cond_resched();
> 2239:	}
> ----------
It turned out that cond_resched() was not called because do_cond_resched ==
0, because preemptible() == 0, because CONFIG_PREEMPT_COUNT=n despite
CONFIG_PREEMPT_VOLUNTARY=y; CONFIG_PREEMPT_VOLUNTARY itself does not select
CONFIG_PREEMPT_COUNT. Surprising...
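
For reference, here is the chain as I read the v4.14 sources (abbreviated):
----------
/* include/linux/preempt.h: */
#ifdef CONFIG_PREEMPT_COUNT
#define preemptible()	(preempt_count() == 0 && !irqs_disabled())
#else
#define preemptible()	0	/* CONFIG_PREEMPT_VOLUNTARY=y alone lands here */
#endif

/* kernel/printk/printk.c, console_trylock(): */
	console_may_schedule = !oops_in_progress &&
			preemptible() &&
			!rcu_preempt_depth();

/* kernel/printk/printk.c, console_unlock(): */
	do_cond_resched = console_may_schedule;
----------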
Thread overview: 5+ messages
2017-10-19 10:51 [PATCH] mm,page_alloc: Serialize out_of_memory() and allocation stall messages Tetsuo Handa
2017-10-19 11:44 ` Michal Hocko
2017-10-20 10:20 ` Tetsuo Handa
2017-10-24 11:23 ` Tetsuo Handa
2017-10-25 10:47 ` Tetsuo Handa