* Re: [RFC] oom, memcg: handle sysctl oom_kill_allocating_task while memcg oom happening
  [not found] <5396ED66.7090401@1h.com>
From: Michal Hocko @ 2014-06-10 11:52 UTC
To: Marian Marinov
Cc: linux-kernel, Johannes Weiner, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
    Tejun Heo, linux-mm

[More people to CC]

On Tue 10-06-14 14:35:02, Marian Marinov wrote:
> Hello,

Hi,

> A while back, in 2012, there was a request for this functionality:
> "oom, memcg: handle sysctl oom_kill_allocating_task while memcg oom
> happening".
>
> This is the thread: https://lkml.org/lkml/2012/10/16/168
>
> We now run several machines with around 10k processes each, using
> containers.
>
> We regularly see OOM from within a container that causes performance
> degradation.

What kind of performance degradation, and which parts of the system are
affected? The memcg OOM killer currently happens outside of any locks,
so the only bottleneck I can see is the per-cgroup victim selection,
which iterates over all tasks in the group. Is this what is going on
here?

> We are running 3.12.20 with the following OOM configuration and memcg
> OOM enabled:
>
> vm.oom_dump_tasks = 0
> vm.oom_kill_allocating_task = 1
> vm.panic_on_oom = 0
>
> When OOM occurs we see very high loadavg numbers and the overall
> responsiveness of the machine degrades.

What is the system waiting for?

> During these OOM states the load of the machine gradually increases
> from 25 up to 120 over an interval of 10 minutes.
>
> Once we manually bring down the memory usage of a container (by
> killing some tasks) the load drops back to 25 within 5 to 7 minutes.

So the OOM killer is not able to find a victim to kill?

> I read the whole thread from 2012, but I do not see the expected
> behavior described by the people who commented on the issue.

Why do you think that killing the allocating task would be helpful in
your case?

> In this case, with real usage for this patch, would it be considered
> for inclusion?

I would still prefer to fix the real issue, which is not yet clear from
your description.
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ .
* Re: [RFC] oom, memcg: handle sysctl oom_kill_allocating_task while memcg oom happening
From: Marian Marinov @ 2014-06-10 14:42 UTC
To: Michal Hocko
Cc: linux-kernel, Johannes Weiner, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
    Tejun Heo, linux-mm

On 06/10/2014 02:52 PM, Michal Hocko wrote:
>> We regularly see OOM from within a container that causes performance
>> degradation.
>
> What kind of performance degradation, and which parts of the system
> are affected?

The responsiveness of SSH sessions and DB queries on the host machine
is significantly reduced. I'm still unsure what exactly to measure.

> The memcg OOM killer currently happens outside of any locks, so the
> only bottleneck I can see is the per-cgroup victim selection, which
> iterates over all tasks in the group. Is this what is going on here?

When the container has thousands of processes, it seems that there is a
problem. But I'm not sure, and I will be happy to put some diagnostic
code there.

>> When OOM occurs we see very high loadavg numbers and the overall
>> responsiveness of the machine degrades.
>
> What is the system waiting for?

I don't know, since I was not the one to actually handle the case.
However, my team is instructed to collect iostat and vmstat output from
the machines the next time this happens.

>> During these OOM states the load of the machine gradually increases
>> from 25 up to 120 over an interval of 10 minutes.
>>
>> Once we manually bring down the memory usage of a container (by
>> killing some tasks) the load drops back to 25 within 5 to 7 minutes.
>
> So the OOM killer is not able to find a victim to kill?

It was constantly killing tasks: 245 OOM invocations in less than 6
minutes for that particular cgroup, with a peak of 61 OOM invocations
in one minute.

In that particular case, the problem was a web server that was under
attack. New PHP processes were spawned very often, and instead of
killing each newly created process (which is the one allocating memory)
the kernel tries to find a more suitable task, which in this case was
not desired.

>> I read the whole thread from 2012, but I do not see the expected
>> behavior described by the people who commented on the issue.
>
> Why do you think that killing the allocating task would be helpful in
> your case?

As mentioned above, the usual case for hosting companies is that the
allocating task should not be allowed to run, so killing it is the
proper solution there.

Essentially we solved the issue by setting a process limit on that
particular cgroup, using the task-limit patches by Dwight Engen.

>> In this case, with real usage for this patch, would it be considered
>> for inclusion?
>
> I would still prefer to fix the real issue, which is not yet clear
> from your description.

I would love to have a better way to solve the issue.

Marian

--
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hackman@jabber.org
ICQ: 7556201
Mobile: +359 886 660 270
* Re: [RFC] oom, memcg: handle sysctl oom_kill_allocating_task while memcg oom happening
From: David Rientjes @ 2014-06-10 22:37 UTC
To: Marian Marinov
Cc: Michal Hocko, Johannes Weiner, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
    Tejun Heo, linux-kernel, linux-mm

On Tue, 10 Jun 2014, Marian Marinov wrote:

> It was constantly killing tasks: 245 OOM invocations in less than 6
> minutes for that particular cgroup, with a peak of 61 OOM invocations
> in one minute.
>
> In that particular case, the problem was a web server that was under
> attack. New PHP processes were spawned very often, and instead of
> killing each newly created process (which is the one allocating
> memory) the kernel tries to find a more suitable task, which in this
> case was not desired.

This is a forkbomb problem, then: processes are constantly reforked,
and the memcg goes out of memory again immediately after another
process has been killed for the same reason.

Enabling oom_kill_allocating_task (or identical behavior targeted at a
specific memcg or memcg hierarchy) would result in effectively random
kills of your processes: whichever process happens to be the unlucky
one allocating at the time would get killed, as long as it isn't OOM
disabled. The only benefit in this case would be that the OOM killer
wouldn't need to iterate processes, but there's nothing to suggest that
your actual problem -- the fact that you're under a forkbomb -- would
be fixed.

If, once the OOM killer has killed something, another process is
immediately forked, charges the memory that was just freed, and hits
the limit again, then that is outside the scope of the OOM killer.

> As mentioned above, the usual case for hosting companies is that the
> allocating task should not be allowed to run, so killing it is the
> proper solution there.

That's not what the OOM killer does: it finds the eligible process that
is using the largest amount of memory and kills it. That prevents a
memory leaker from taking down everything else attached to the memcg or
running on the system, and results in one process being killed instead
of many. Userspace can tune the selection of processes with
/proc/pid/oom_score_adj.