* Re: Excessive xfs_inode allocations trigger OOM killer
From: Florian Weimer @ 2016-09-21 5:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs, linux-xfs, linux-mm, Michal Hocko
* Dave Chinner:
> [cc Michal, linux-mm@kvack.org]
>
> On Tue, Sep 20, 2016 at 10:56:31PM +0200, Florian Weimer wrote:
>> * Dave Chinner:
>>
>> >> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
>> >> 4121208 4121177 99% 0.88K 1030302 4 4121208K xfs_inode
>> >> 986286 985229 99% 0.19K 46966 21 187864K dentry
>> >> 723255 723076 99% 0.10K 18545 39 74180K buffer_head
>> >> 270263 269251 99% 0.56K 38609 7 154436K radix_tree_node
>> >> 140310 67409 48% 0.38K 14031 10 56124K mnt_cache
>> >
>> > That's not odd at all. It means your workload is visiting millions
>> > of inodes in your filesystem between serious memory pressure events.
>>
>> Okay.
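The slab figures quoted above can be cross-checked with a little arithmetic. The following is only an illustrative sketch: the row values are copied from the table, the names SLAB_ROWS and cache_gib are made up for this example, and it assumes that the "K" suffix in the slabtop output means KiB. It checks that OBJS equals SLABS times OBJ/SLAB for every cache and converts each cache size to GiB.

```python
# Rows copied from the slabtop output quoted above:
# (name, objs, slabs, objs_per_slab, cache_size_kib)
SLAB_ROWS = [
    ("xfs_inode", 4121208, 1030302, 4, 4121208),
    ("dentry", 986286, 46966, 21, 187864),
    ("buffer_head", 723255, 18545, 39, 74180),
    ("radix_tree_node", 270263, 38609, 7, 154436),
    ("mnt_cache", 140310, 14031, 10, 56124),
]


def cache_gib(cache_kib):
    """Convert a cache size in KiB to GiB."""
    return cache_kib / (1024 * 1024)


for name, objs, slabs, per_slab, cache_kib in SLAB_ROWS:
    # Each row should be internally consistent: OBJS = SLABS * OBJ/SLAB.
    assert objs == slabs * per_slab, name
    slab_kib = cache_kib // slabs  # memory backing one slab, in KiB
    print(f"{name:16s} {cache_gib(cache_kib):6.2f} GiB  ({slab_kib} KiB per slab)")
```

On these figures the xfs_inode cache alone accounts for roughly 3.9 GiB, far more than the other listed caches combined.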
>>
>> >> (I have attached the /proc/meminfo contents in case it offers further
>> >> clues.)
>> >>
>> >> Confronted with large memory allocations (from “make -j12” and
>> >> compiling GCC, so perhaps ~8 GiB of memory), the OOM killer kicks in
>> >> and kills some random process. I would have expected that some
>> >> xfs_inodes are freed instead.
>> >
>> > The oom killer is unreliable and often behaves very badly, and
>> > that's typically not an XFS problem.
>> >
>> > What is the full output of the oom killer invocations from dmesg?
>>
>> I've attached the dmesg output (two events).
>
> Copied from the traces you attached (I've left them intact below for
> reference):
>
>> [51669.515086] make invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
>> [51669.515092] CPU: 1 PID: 1202 Comm: make Tainted: G I 4.7.1fw #1
>> [51669.515093] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>> [51669.515095] 0000000000000000 ffffffff812a7d39 0000000000000000 0000000000000000
>> [51669.515098] ffffffff8114e4da ffff880018707d98 0000000000000000 000000000066ca81
>> [51669.515100] ffffffff8170e88d ffffffff810fe69e ffff88033fc38728 0000000200000006
>> [51669.515102] Call Trace:
>> [51669.515108] [<ffffffff812a7d39>] ? dump_stack+0x46/0x5d
>> [51669.515113] [<ffffffff8114e4da>] ? dump_header.isra.12+0x51/0x176
>> [51669.515116] [<ffffffff810fe69e>] ? oom_kill_process+0x32e/0x420
>> [51669.515119] [<ffffffff811003a0>] ? page_alloc_cpu_notify+0x40/0x40
>> [51669.515120] [<ffffffff810fdcdc>] ? find_lock_task_mm+0x2c/0x70
>> [51669.515122] [<ffffffff810fea6d>] ? out_of_memory+0x28d/0x2d0
>> [51669.515125] [<ffffffff81103137>] ? __alloc_pages_nodemask+0xb97/0xc90
>> [51669.515128] [<ffffffff81076d9c>] ? copy_process.part.54+0xec/0x17a0
>> [51669.515131] [<ffffffff81123318>] ? handle_mm_fault+0xaa8/0x1900
>> [51669.515133] [<ffffffff81078614>] ? _do_fork+0xd4/0x320
>> [51669.515137] [<ffffffff81084ecc>] ? __set_current_blocked+0x2c/0x40
>> [51669.515140] [<ffffffff810013ce>] ? do_syscall_64+0x3e/0x80
>> [51669.515144] [<ffffffff8151433c>] ? entry_SYSCALL64_slow_path+0x25/0x25
> .....
>> [51669.515194] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
>> [51669.515202] DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 183060kB
>> [51669.515209] Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB
> .....
>
> Alright, that's what I suspected. It is a high-order allocation for a
> new kernel stack, and memory is so fragmented that a contiguous
> allocation fails. Really, this is a memory reclaim issue, not an XFS
> issue. There is lots of reclaimable memory available, but memory
> reclaim is:
>
> a) not trying hard enough to reclaim reclaimable memory; and
> b) not waiting for memory compaction to rebuild contiguous
> memory regions for high-order allocations.
>
> Instead, it is declaring OOM and kicking the killer to free memory
> held by userspace.
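The fragmentation argument above can be read directly off the per-zone free lists in the trace. The following is a minimal sketch, not kernel code: the helper names (parse_free_list, total_kb, free_blocks_at_or_above) are hypothetical, the three lines are copied from the trace with the timestamps dropped, and 4 kB base pages are assumed, so an order=2 request needs one contiguous 16 kB block.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as on the x86-64 machine in the trace

# Free-list lines copied from the OOM report above (timestamps removed).
FREE_LIST_LINES = [
    "DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) "
    "1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB",
    "DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 183060kB",
    "Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB",
]


def parse_free_list(line):
    """Return (zone, {block_size_kb: free_count}) for one free-list line."""
    zone, rest = line.split(":", 1)
    # Matches "45619*4kB"; the trailing "= 183060kB" total has no "*".
    blocks = {int(kb): int(n) for n, kb in re.findall(r"(\d+)\*(\d+)kB", rest)}
    return zone.strip(), blocks


def total_kb(blocks):
    """Total free memory in the zone, in kB."""
    return sum(kb * n for kb, n in blocks.items())


def free_blocks_at_or_above(blocks, order):
    """Count free blocks large enough for an allocation of the given order."""
    min_kb = PAGE_KB << order
    return sum(n for kb, n in blocks.items() if kb >= min_kb)


ZONES = dict(parse_free_list(line) for line in FREE_LIST_LINES)

for zone, blocks in ZONES.items():
    print(f"{zone:7s} free={total_kb(blocks):7d} kB  "
          f"blocks usable for order-2: {free_blocks_at_or_above(blocks, 2)}")
```

DMA32 and Normal together have well over 300 MB free, yet neither holds a single free block of 16 kB or larger. Only the small DMA zone has such blocks, and that zone is normally kept in reserve rather than used for ordinary allocations.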
Thanks.
I have put the full kernel config here:
<http://static.enyo.de/fw/volatile/config-4.7.1fw>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Excessive xfs_inode allocations trigger OOM killer
From: Michal Hocko @ 2016-09-21 8:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: Florian Weimer, xfs, linux-xfs, linux-mm
[fixup linux-mm address]
On Wed 21-09-16 10:04:25, Michal Hocko wrote:
> On Wed 21-09-16 07:46:12, Dave Chinner wrote:
> > [cc Michal, linux-mm@kvack.org]
> >
> > On Tue, Sep 20, 2016 at 10:56:31PM +0200, Florian Weimer wrote:
> [...]
> > > [51669.515086] make invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> > > [51669.515092] CPU: 1 PID: 1202 Comm: make Tainted: G I 4.7.1fw #1
> > > [51669.515093] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
> > > [51669.515095] 0000000000000000 ffffffff812a7d39 0000000000000000 0000000000000000
> > > [51669.515098] ffffffff8114e4da ffff880018707d98 0000000000000000 000000000066ca81
> > > [51669.515100] ffffffff8170e88d ffffffff810fe69e ffff88033fc38728 0000000200000006
> > > [51669.515102] Call Trace:
> > > [51669.515108] [<ffffffff812a7d39>] ? dump_stack+0x46/0x5d
> > > [51669.515113] [<ffffffff8114e4da>] ? dump_header.isra.12+0x51/0x176
> > > [51669.515116] [<ffffffff810fe69e>] ? oom_kill_process+0x32e/0x420
> > > [51669.515119] [<ffffffff811003a0>] ? page_alloc_cpu_notify+0x40/0x40
> > > [51669.515120] [<ffffffff810fdcdc>] ? find_lock_task_mm+0x2c/0x70
> > > [51669.515122] [<ffffffff810fea6d>] ? out_of_memory+0x28d/0x2d0
> > > [51669.515125] [<ffffffff81103137>] ? __alloc_pages_nodemask+0xb97/0xc90
> > > [51669.515128] [<ffffffff81076d9c>] ? copy_process.part.54+0xec/0x17a0
> > > [51669.515131] [<ffffffff81123318>] ? handle_mm_fault+0xaa8/0x1900
> > > [51669.515133] [<ffffffff81078614>] ? _do_fork+0xd4/0x320
> > > [51669.515137] [<ffffffff81084ecc>] ? __set_current_blocked+0x2c/0x40
> > > [51669.515140] [<ffffffff810013ce>] ? do_syscall_64+0x3e/0x80
> > > [51669.515144] [<ffffffff8151433c>] ? entry_SYSCALL64_slow_path+0x25/0x25
> > .....
> > > [51669.515194] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
> > > [51669.515202] DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 183060kB
> > > [51669.515209] Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB
> > .....
> >
> > Alright, that's what I suspected. It is a high-order allocation for a
> > new kernel stack, and memory is so fragmented that a contiguous
> > allocation fails. Really, this is a memory reclaim issue, not an XFS
> > issue. There is lots of reclaimable memory available, but memory
> > reclaim is:
> >
> > a) not trying hard enough to reclaim reclaimable memory; and
> > b) not waiting for memory compaction to rebuild contiguous
> > memory regions for high-order allocations.
> >
> > Instead, it is declaring OOM and kicking the killer to free memory
> > held by userspace.
>
> Yes, this was the case with the 4.7 kernel. There is a workaround sitting
> in Linus' tree, 6b4e3181d7bd ("mm, oom: prevent premature OOM killer
> invocation for high order request"), which should get to stable
> eventually. A more appropriate fix is currently in linux-next.
>
> Testing the same workload with linux-next would be very helpful.
>
> Thanks!
>
> --
> Michal Hocko
> SUSE Labs
--
Michal Hocko
SUSE Labs
* Re: Excessive xfs_inode allocations trigger OOM killer
From: Florian Weimer @ 2016-09-26 17:33 UTC (permalink / raw)
To: Michal Hocko; +Cc: Dave Chinner, xfs, linux-xfs, linux-mm
* Michal Hocko:
> On Wed 21-09-16 07:46:12, Dave Chinner wrote:
>> [cc Michal, linux-mm@kvack.org]
>>
>> On Tue, Sep 20, 2016 at 10:56:31PM +0200, Florian Weimer wrote:
> [...]
>> > [51669.515086] make invoked oom-killer:
>> > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2,
>> > oom_score_adj=0
>> > [51669.515092] CPU: 1 PID: 1202 Comm: make Tainted: G I 4.7.1fw #1
>> > [51669.515093] Hardware name: System manufacturer System Product
>> > Name/P6X58D-E, BIOS 0701 05/10/2011
>> > [51669.515095] 0000000000000000 ffffffff812a7d39 0000000000000000
>> > 0000000000000000
>> > [51669.515098] ffffffff8114e4da ffff880018707d98 0000000000000000
>> > 000000000066ca81
>> > [51669.515100] ffffffff8170e88d ffffffff810fe69e ffff88033fc38728
>> > 0000000200000006
>> > [51669.515102] Call Trace:
>> > [51669.515108] [<ffffffff812a7d39>] ? dump_stack+0x46/0x5d
>> > [51669.515113] [<ffffffff8114e4da>] ? dump_header.isra.12+0x51/0x176
>> > [51669.515116] [<ffffffff810fe69e>] ? oom_kill_process+0x32e/0x420
>> > [51669.515119] [<ffffffff811003a0>] ? page_alloc_cpu_notify+0x40/0x40
>> > [51669.515120] [<ffffffff810fdcdc>] ? find_lock_task_mm+0x2c/0x70
>> > [51669.515122] [<ffffffff810fea6d>] ? out_of_memory+0x28d/0x2d0
>> > [51669.515125] [<ffffffff81103137>] ? __alloc_pages_nodemask+0xb97/0xc90
>> > [51669.515128] [<ffffffff81076d9c>] ? copy_process.part.54+0xec/0x17a0
>> > [51669.515131] [<ffffffff81123318>] ? handle_mm_fault+0xaa8/0x1900
>> > [51669.515133] [<ffffffff81078614>] ? _do_fork+0xd4/0x320
>> > [51669.515137] [<ffffffff81084ecc>] ? __set_current_blocked+0x2c/0x40
>> > [51669.515140] [<ffffffff810013ce>] ? do_syscall_64+0x3e/0x80
>> > [51669.515144] [<ffffffff8151433c>] ? entry_SYSCALL64_slow_path+0x25/0x25
>> .....
>> > [51669.515194] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB
>> > (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
>> > 3*4096kB (M) = 15900kB
>> > [51669.515202] DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB
>> > 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
>> > 183060kB
>> > [51669.515209] Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB
>> > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB
>> .....
>>
>> Alright, that's what I suspected. It is a high-order allocation for a
>> new kernel stack, and memory is so fragmented that a contiguous
>> allocation fails. Really, this is a memory reclaim issue, not an XFS
>> issue. There is lots of reclaimable memory available, but memory
>> reclaim is:
>>
>> a) not trying hard enough to reclaim reclaimable memory; and
>> b) not waiting for memory compaction to rebuild contiguous
>> memory regions for high-order allocations.
>>
>> Instead, it is declaring OOM and kicking the killer to free memory
>> held by userspace.
>
> Yes, this was the case with the 4.7 kernel. There is a workaround sitting
> in Linus' tree, 6b4e3181d7bd ("mm, oom: prevent premature OOM killer
> invocation for high order request"), which should get to stable
> eventually. A more appropriate fix is currently in linux-next.
>
> Testing the same workload with linux-next would be very helpful.
I'm not sure if I can reproduce this issue in a sufficiently reliable
way, but I can try. (I still have not found the process which causes
the xfs_inode allocations to go up.)
Is linux-next still the tree to test?
* Re: Excessive xfs_inode allocations trigger OOM killer
From: Michal Hocko @ 2016-09-26 20:02 UTC (permalink / raw)
To: Florian Weimer; +Cc: Dave Chinner, xfs, linux-xfs, linux-mm
On Mon 26-09-16 19:33:09, Florian Weimer wrote:
> * Michal Hocko:
>
> > On Wed 21-09-16 07:46:12, Dave Chinner wrote:
> >> [cc Michal, linux-mm@kvack.org]
> >>
> >> On Tue, Sep 20, 2016 at 10:56:31PM +0200, Florian Weimer wrote:
> > [...]
> >> > [51669.515086] make invoked oom-killer:
> >> > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2,
> >> > oom_score_adj=0
> >> > [51669.515092] CPU: 1 PID: 1202 Comm: make Tainted: G I 4.7.1fw #1
> >> > [51669.515093] Hardware name: System manufacturer System Product
> >> > Name/P6X58D-E, BIOS 0701 05/10/2011
> >> > [51669.515095] 0000000000000000 ffffffff812a7d39 0000000000000000
> >> > 0000000000000000
> >> > [51669.515098] ffffffff8114e4da ffff880018707d98 0000000000000000
> >> > 000000000066ca81
> >> > [51669.515100] ffffffff8170e88d ffffffff810fe69e ffff88033fc38728
> >> > 0000000200000006
> >> > [51669.515102] Call Trace:
> >> > [51669.515108] [<ffffffff812a7d39>] ? dump_stack+0x46/0x5d
> >> > [51669.515113] [<ffffffff8114e4da>] ? dump_header.isra.12+0x51/0x176
> >> > [51669.515116] [<ffffffff810fe69e>] ? oom_kill_process+0x32e/0x420
> >> > [51669.515119] [<ffffffff811003a0>] ? page_alloc_cpu_notify+0x40/0x40
> >> > [51669.515120] [<ffffffff810fdcdc>] ? find_lock_task_mm+0x2c/0x70
> >> > [51669.515122] [<ffffffff810fea6d>] ? out_of_memory+0x28d/0x2d0
> >> > [51669.515125] [<ffffffff81103137>] ? __alloc_pages_nodemask+0xb97/0xc90
> >> > [51669.515128] [<ffffffff81076d9c>] ? copy_process.part.54+0xec/0x17a0
> >> > [51669.515131] [<ffffffff81123318>] ? handle_mm_fault+0xaa8/0x1900
> >> > [51669.515133] [<ffffffff81078614>] ? _do_fork+0xd4/0x320
> >> > [51669.515137] [<ffffffff81084ecc>] ? __set_current_blocked+0x2c/0x40
> >> > [51669.515140] [<ffffffff810013ce>] ? do_syscall_64+0x3e/0x80
> >> > [51669.515144] [<ffffffff8151433c>] ? entry_SYSCALL64_slow_path+0x25/0x25
> >> .....
> >> > [51669.515194] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB
> >> > (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
> >> > 3*4096kB (M) = 15900kB
> >> > [51669.515202] DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB
> >> > 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> >> > 183060kB
> >> > [51669.515209] Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB
> >> > 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB
> >> .....
> >>
> >> Alright, that's what I suspected. It is a high-order allocation for a
> >> new kernel stack, and memory is so fragmented that a contiguous
> >> allocation fails. Really, this is a memory reclaim issue, not an XFS
> >> issue. There is lots of reclaimable memory available, but memory
> >> reclaim is:
> >>
> >> a) not trying hard enough to reclaim reclaimable memory; and
> >> b) not waiting for memory compaction to rebuild contiguous
> >> memory regions for high-order allocations.
> >>
> >> Instead, it is declaring OOM and kicking the killer to free memory
> >> held by userspace.
> >
> > Yes, this was the case with the 4.7 kernel. There is a workaround sitting
> > in Linus' tree, 6b4e3181d7bd ("mm, oom: prevent premature OOM killer
> > invocation for high order request"), which should get to stable
> > eventually. A more appropriate fix is currently in linux-next.
> >
> > Testing the same workload with linux-next would be very helpful.
>
> I'm not sure if I can reproduce this issue in a sufficiently reliable
> way, but I can try. (I still have not found the process which causes
> the xfs_inode allocations to go up.)
>
> Is linux-next still the tree to test?
Yes, it contains all the compaction-related fixes which we believe
address the recent higher-order OOMs.
--
Michal Hocko
SUSE Labs
* Re: Excessive xfs_inode allocations trigger OOM killer
From: Florian Weimer @ 2016-10-03 17:35 UTC (permalink / raw)
To: Michal Hocko; +Cc: Dave Chinner, xfs, linux-xfs, linux-mm
* Michal Hocko:
>> I'm not sure if I can reproduce this issue in a sufficiently reliable
>> way, but I can try. (I still have not found the process which causes
>> the xfs_inode allocations to go up.)
>>
>> Is linux-next still the tree to test?
>
> Yes, it contains all the compaction-related fixes which we believe
> address the recent higher-order OOMs.
I tried 4.7.5 instead. I could not reproduce the issue so far there.
Thanks to whoever fixed it. :)
* Re: Excessive xfs_inode allocations trigger OOM killer
From: Michal Hocko @ 2016-10-03 17:54 UTC (permalink / raw)
To: Florian Weimer; +Cc: Dave Chinner, xfs, linux-xfs, linux-mm
On Mon 03-10-16 19:35:18, Florian Weimer wrote:
> * Michal Hocko:
>
> >> I'm not sure if I can reproduce this issue in a sufficiently reliable
> >> way, but I can try. (I still have not found the process which causes
> >> the xfs_inode allocations to go up.)
> >>
> >> Is linux-next still the tree to test?
> >
> > Yes, it contains all the compaction-related fixes which we believe
> > address the recent higher-order OOMs.
>
> I tried 4.7.5 instead. I could not reproduce the issue so far there.
> Thanks to whoever fixed it. :)
The 4.7 stable tree contains a workaround rather than the full fix we
would like to have in 4.9. So if you can, testing the current
linux-next would be really appreciated.
--
Michal Hocko
SUSE Labs