* linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?
From: Tetsuo Handa @ 2015-11-22 12:13 UTC
To: akpm, oleg; +Cc: linux-mm
I was updating kmallocwd in preparation for testing "[RFC 0/3] OOM detection
rework v2" patchset. I noticed an unexpected result with linux.git as of
3ad5d7e06a96 .
The problem is that an OOM victim arrives at do_exit() with TIF_MEMDIE flag
set but without pending SIGKILL. Is this correct behavior?
----------
diff --git a/kernel/exit.c b/kernel/exit.c
index 07110c6..ea5bcd0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -656,6 +656,7 @@ void do_exit(long code)
 	int group_dead;
 	TASKS_RCU(int tasks_rcu_i);
 
+	BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current));
 	profile_task_exit(tsk);
 
 	WARN_ON(blk_needs_flush_plug(tsk));
----------
[ 103.796002] ------------[ cut here ]------------
[ 103.797700] kernel BUG at kernel/exit.c:659!
[ 103.799314] invalid opcode: 0000 [#1] SMP
[ 103.800932] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel glue_helper lrw gf128mul ablk_helper ppdev cryptd vmw_balloon serio_raw pcspkr parport_pc vmw_vmci parport shpchp i2c_piix4 sd_mod ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci ata_piix mptspi scsi_transport_spi mptscsih libahci libata mptbase e1000 i2c_core
[ 103.820275] CPU: 1 PID: 11036 Comm: oom-tester4 Not tainted 4.4.0-rc1+ #9
[ 103.822514] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 103.825459] task: ffff880078f0c200 ti: ffff880078db8000 task.ti: ffff880078db8000
[ 103.827850] RIP: 0010:[<ffffffff810726ef>] [<ffffffff810726ef>] do_exit+0xa3f/0xb40
[ 103.830535] RSP: 0018:ffff880078dbbcd0 EFLAGS: 00010246
[ 103.832606] RAX: 0000000000100084 RBX: 0000000000000002 RCX: 0000000000000000
[ 103.834935] RDX: 00000000418004fc RSI: 0000000000000001 RDI: 0000000000000002
[ 103.837314] RBP: ffff880078dbbd30 R08: 0000000000000000 R09: 0000000000000000
[ 103.839595] R10: 0000000000000001 R11: ffff880078f0c930 R12: 0000000000000002
[ 103.841894] R13: ffff880078f0c200 R14: ffff880078f0c200 R15: 0000000000000008
[ 103.844305] FS: 00007fbf5610a740(0000) GS:ffff88007fc40000(0000) knlGS:0000000000000000
[ 103.846845] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 103.849002] CR2: 000055a5c65d17d0 CR3: 0000000078dfd000 CR4: 00000000001406e0
[ 103.851433] Stack:
[ 103.852787] 0000000000000001 ffff880078f0c200 0000000000000046 ffff880078dbbe18
[ 103.855263] ffff880078dbbe38 ffff880078f0c200 000000005c6d8319 ffff880035dc2f40
[ 103.857717] 0000000000000002 ffff880078f0c200 ffff880078dbbe38 0000000000000008
[ 103.860138] Call Trace:
[ 103.861603] [<ffffffff81072877>] do_group_exit+0x47/0xc0
include/linux/sched.h:807
kernel/exit.c:862
[ 103.863513] [<ffffffff8107e0b2>] get_signal+0x222/0x7e0
kernel/signal.c:2307
[ 103.865395] [<ffffffff8100f362>] do_signal+0x32/0x670
arch/x86/kernel/signal.c:709
[ 103.867219] [<ffffffff8106a517>] ? syscall_slow_exit_work+0x4b/0x10d
arch/x86/entry/common.c:306
[ 103.869264] [<ffffffff8106a46a>] ? exit_to_usermode_loop+0x2e/0x90
arch/x86/include/asm/paravirt.h:816
arch/x86/entry/common.c:237
[ 103.871249] [<ffffffff8106a488>] exit_to_usermode_loop+0x4c/0x90
arch/x86/entry/common.c:249
[ 103.873324] [<ffffffff8100355b>] syscall_return_slowpath+0xbb/0x130
arch/x86/entry/common.c:282
arch/x86/entry/common.c:344
[ 103.875322] [<ffffffff816e85da>] int_ret_from_sys_call+0x25/0x9f
arch/x86/entry/entry_64.S:282
[ 103.877228] Code: ba 9f 81 31 c0 e8 ed 6c 0c 00 48 8b b8 90 00 00 00 e8 e6 6b 02 00 e9 31 fb ff ff 49 8b 46 08 48 8b 40 08 a8 04 0f 85 bd 00 00 00 <0f> 0b 4c 89 f7 e8 77 80 0a 00 e9 fb f6 ff ff 49 8b 96 c0 05 00
[ 103.884329] RIP [<ffffffff810726ef>] do_exit+0xa3f/0xb40
kernel/exit.c:659
[ 103.886121] RSP <ffff880078dbbcd0>
[ 103.887537] ---[ end trace a5e757a180b4cf32 ]---
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?
From: Michal Hocko @ 2015-11-23 8:30 UTC
To: Tetsuo Handa; +Cc: akpm, oleg, linux-mm
On Sun 22-11-15 21:13:22, Tetsuo Handa wrote:
> I was updating kmallocwd in preparation for testing "[RFC 0/3] OOM detection
> rework v2" patchset. I noticed an unexpected result with linux.git as of
> 3ad5d7e06a96 .
>
> The problem is that an OOM victim arrives at do_exit() with TIF_MEMDIE flag
> set but without pending SIGKILL. Is this correct behavior?
Have a look at out_of_memory where we do:
	/*
	 * If current has a pending SIGKILL or is exiting, then automatically
	 * select it. The goal is to allow it to allocate so that it may
	 * quickly exit and free its memory.
	 *
	 * But don't select if current has already released its mm and cleared
	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
	 */
	if (current->mm &&
	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
		mark_oom_victim(current);
		return true;
	}
So if current was already exiting, we do not kill it; we just give it
access to memory reserves to expedite the exit. We do the same thing for the
memcg case.
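The shortcut above can be modelled in userspace to make the point concrete. This is only a minimal sketch, not kernel code: struct model_task and model_oom_shortcut() are hypothetical names, and task_will_free_mem() is reduced here to a single "exiting" flag, which is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few pieces of task state involved. */
struct model_task {
	bool has_mm;          /* models current->mm != NULL */
	bool sigkill_pending; /* models fatal_signal_pending(current) */
	bool exiting;         /* models task_will_free_mem(current), simplified */
	bool memdie;          /* models TIF_MEMDIE */
};

/*
 * Models the quoted check: the caller is marked as an OOM victim when it
 * still has an mm and is either being killed or already exiting.
 * No signal is sent on this path.
 */
bool model_oom_shortcut(struct model_task *t)
{
	assert(t != NULL);

	if (t->has_mm && (t->sigkill_pending || t->exiting)) {
		t->memdie = true; /* stands in for mark_oom_victim(current) */
		return true;
	}
	return false;
}
```

With has_mm and exiting set and sigkill_pending clear, the model ends with memdie set while no SIGKILL is pending, which is the combination asked about.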
Why would that be an issue in the first place?
--
Michal Hocko
SUSE Labs
* Re: linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?
From: Tetsuo Handa @ 2015-11-23 11:06 UTC
To: mhocko; +Cc: akpm, oleg, linux-mm
Michal Hocko wrote:
> On Sun 22-11-15 21:13:22, Tetsuo Handa wrote:
> > I was updating kmallocwd in preparation for testing "[RFC 0/3] OOM detection
> > rework v2" patchset. I noticed an unexpected result with linux.git as of
> > 3ad5d7e06a96 .
> >
> > The problem is that an OOM victim arrives at do_exit() with TIF_MEMDIE flag
> > set but without pending SIGKILL. Is this correct behavior?
>
> Have a look at out_of_memory where we do:
> 	/*
> 	 * If current has a pending SIGKILL or is exiting, then automatically
> 	 * select it. The goal is to allow it to allocate so that it may
> 	 * quickly exit and free its memory.
> 	 *
> 	 * But don't select if current has already released its mm and cleared
> 	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
> 	 */
> 	if (current->mm &&
> 	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
> 		mark_oom_victim(current);
> 		return true;
> 	}
>
> So if the current was exiting already we are not killing it, we just give it
> access to memory reserves to expedite the exit. We do the same thing for the
> memcg case.
The result is the same even if I change the check as follows:
-	BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current));
+	BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current) && !task_will_free_mem(current));
I think that task_will_free_mem() is always false here, because this BUG_ON()
is located before the "exit_signals(tsk); /* sets PF_EXITING */" line.
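The ordering argument can be sketched with a small userspace model. The names below (struct model_task, model_task_will_free_mem(), model_do_exit()) are hypothetical, and the model assumes a task entering do_exit() for the first time, with task_will_free_mem() reduced to a test of an "exiting" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PF_EXITING bit of a task. */
struct model_task {
	bool exiting;
};

/* Simplified model: true only once the exiting flag has been set. */
bool model_task_will_free_mem(const struct model_task *t)
{
	return t->exiting;
}

/*
 * Models the order of events in do_exit(): a check placed at the top runs
 * first, and only afterwards does the exit_signals() step set the flag.
 * *seen_at_entry receives what the early check observes; the return value
 * is what the same check would observe after the flag has been set.
 */
bool model_do_exit(struct model_task *t, bool *seen_at_entry)
{
	assert(t != NULL && seen_at_entry != NULL);

	*seen_at_entry = model_task_will_free_mem(t); /* where the BUG_ON() sits */
	t->exiting = true;                            /* exit_signals(tsk) */
	return model_task_will_free_mem(t);
}
```

In this model the early check sees false and the later one sees true, so adding the extra condition at the top of do_exit() cannot change the outcome.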
>
> Why would that be an issue in the first place?
The real problem I care about is the TIF_MEMDIE livelock:
MemAlloc: oom-tester4(11040) uninterruptible dying victim
MemAlloc: oom-tester4(11045) gfp=0x242014a order=0 delay=10000 dying
I'm not talking about the TIF_MEMDIE livelock in this thread, though. I'm just
worried that the output below (which is caused by an OOM victim arriving at
do_exit() with the TIF_MEMDIE flag set but without a pending SIGKILL) is a
foretaste of an unnoticed problem.
MemAlloc: oom-tester4(11520) uninterruptible victim
* Re: linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?
From: Michal Hocko @ 2015-11-23 11:33 UTC
To: Tetsuo Handa; +Cc: akpm, oleg, linux-mm
On Mon 23-11-15 20:06:02, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sun 22-11-15 21:13:22, Tetsuo Handa wrote:
> > > I was updating kmallocwd in preparation for testing "[RFC 0/3] OOM detection
> > > rework v2" patchset. I noticed an unexpected result with linux.git as of
> > > 3ad5d7e06a96 .
> > >
> > > The problem is that an OOM victim arrives at do_exit() with TIF_MEMDIE flag
> > > set but without pending SIGKILL. Is this correct behavior?
> >
> > Have a look at out_of_memory where we do:
> > 	/*
> > 	 * If current has a pending SIGKILL or is exiting, then automatically
> > 	 * select it. The goal is to allow it to allocate so that it may
> > 	 * quickly exit and free its memory.
> > 	 *
> > 	 * But don't select if current has already released its mm and cleared
> > 	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
> > 	 */
> > 	if (current->mm &&
> > 	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
> > 		mark_oom_victim(current);
> > 		return true;
> > 	}
> >
> > So if the current was exiting already we are not killing it, we just give it
> > access to memory reserves to expedite the exit. We do the same thing for the
> > memcg case.
>
> The result is the same even if I do
>
> - BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current));
> + BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current) && !task_will_free_mem(current));
>
> . I think that task_will_free_mem() is always false because this BUG_ON()
> is located before "exit_signals(tsk); /* sets PF_EXITING */" line.
I haven't checked where exactly you added the BUG_ON; I was merely
commenting on the possibility that TIF_MEMDIE is set without sending
SIGKILL.
Now that I am looking at your BUG_ON more closely, I am wondering whether
it makes sense at all. The fatal signal has been dequeued in get_signal
before we call into do_group_exit, AFAICS.
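A small userspace model of that dequeue step may help. It is a sketch under stated assumptions: pending signals are a plain bitmask, and MODEL_SIGKILL, struct model_task, model_fatal_signal_pending() and model_dequeue_signal() are hypothetical names, not the kernel's signal code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SIGKILL 9 /* signal number used by the model */

/* Hypothetical stand-in: one bit per pending signal, bit (sig - 1). */
struct model_task {
	unsigned long pending;
};

/* Models fatal_signal_pending(): is SIGKILL still queued for the task? */
bool model_fatal_signal_pending(const struct model_task *t)
{
	return (t->pending >> (MODEL_SIGKILL - 1)) & 1UL;
}

/*
 * Models the dequeue in get_signal(): take the lowest pending signal off
 * the set and return its number, or 0 if nothing is pending.
 */
int model_dequeue_signal(struct model_task *t)
{
	assert(t != NULL);

	for (int sig = 1; sig <= 32; sig++) {
		unsigned long bit = 1UL << (sig - 1);

		if (t->pending & bit) {
			t->pending &= ~bit; /* no longer pending once delivered */
			return sig;
		}
	}
	return 0;
}
```

Once the model has dequeued SIGKILL, model_fatal_signal_pending() returns false, so a check made later on the exit path no longer sees the signal even though the task is dying because of it.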
[...]
--
Michal Hocko
SUSE Labs
* Re: linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?
From: Tetsuo Handa @ 2015-11-23 12:38 UTC
To: mhocko; +Cc: akpm, oleg, linux-mm
Michal Hocko wrote:
> I haven't checked where exactly you added the BUG_ON, I was merely
> commenting on the possibility that TIF_MEMDIE is set without sending
> SIGKILL.
>
> Now that I am looking at your BUG_ON more closely I am wondering whether
> it makes sense at all. The fatal signal has been dequeued in get_signal
> before we call into do_group_exit AFAICS.
Indeed, it makes no sense at all.
Making the change below produced the expected output:
MemAlloc: oom-tester4(11306) uninterruptible exiting victim
----------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 01127b8..8c8fb6d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3289,8 +3289,9 @@ static int kmallocwd(void *unused)
 	snprintf(buf, sizeof(buf),
 		 " gfp=0x%x order=%u delay=%lu", memalloc.gfp,
 		 memalloc.order, now - memalloc.start);
-	pr_warn("MemAlloc: %s(%u)%s%s%s%s\n", p->comm, p->pid, buf,
+	pr_warn("MemAlloc: %s(%u)%s%s%s%s%s\n", p->comm, p->pid, buf,
 		(type & 8) ? " uninterruptible" : "",
+		(p->flags & PF_EXITING) ? " exiting" : "",
 		(type & 2) ? " dying" : "",
 		(type & 1) ? " victim" : "");
 	touch_nmi_watchdog();
----------
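For reference, the way the patched format string assembles the report line can be reproduced in userspace. This is a minimal sketch: model_format_memalloc() is a hypothetical helper, the bit meanings (8, 2, 1) are taken from the diff above, and the PF_EXITING test is passed in as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h> /* strcmp(), handy when checking the result */

/*
 * Builds the report line the same way the patched pr_warn() call does
 * (without the trailing newline). "buf" is the optional
 * " gfp=... order=... delay=..." part and may be an empty string.
 * Returns the value of snprintf().
 */
int model_format_memalloc(char *out, size_t len, const char *comm,
			  unsigned int pid, const char *buf,
			  unsigned int type, bool exiting)
{
	assert(out != NULL && comm != NULL && buf != NULL);

	return snprintf(out, len, "MemAlloc: %s(%u)%s%s%s%s%s", comm, pid, buf,
			(type & 8) ? " uninterruptible" : "",
			exiting ? " exiting" : "",
			(type & 2) ? " dying" : "",
			(type & 1) ? " victim" : "");
}
```

Called with comm "oom-tester4", pid 11306, an empty buf, type 9 and exiting set, the helper yields the line shown above the diff.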
I'll make V3 of kmallocwd. Thank you.