From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] memcg: killed threads should not invoke memcg OOM killer
Date: Sun, 6 Jan 2019 15:02:24 +0900 [thread overview]
Message-ID: <f6d97ad3-ab04-f5e2-4822-96eac6ab45da@i-love.sakura.ne.jp> (raw)
In-Reply-To: <1545819215-10892-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
Michal and Johannes, can we please stop this stupid behavior now?
Reproducer:
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>
#define NUMTHREADS 256
#define MMAPSIZE 4 * 10485760
#define STACKSIZE 4096
static int pipe_fd[2] = { EOF, EOF };
static int memory_eater(void *unused)
{
int fd = open("/dev/zero", O_RDONLY);
char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
read(pipe_fd[0], buf, 1);
read(fd, buf, MMAPSIZE);
pause();
return 0;
}
int main(int argc, char *argv[])
{
int i;
char *stack;
FILE *fp;
const unsigned long size = 1048576 * 200;
mkdir("/sys/fs/cgroup/memory/test1", 0755);
fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
fprintf(fp, "%lu\n", size);
fclose(fp);
fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
fprintf(fp, "%u\n", getpid());
fclose(fp);
if (setgid(-2) || setuid(-2) || pipe(pipe_fd))
return 1;
if (fork() == 0) {
stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
for (i = 0; i < NUMTHREADS; i++)
if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
CLONE_SIGHAND | CLONE_THREAD | CLONE_VM | CLONE_FS | CLONE_FILES, NULL) == -1)
break;
close(pipe_fd[1]);
pause();
}
close(pipe_fd[0]);
for (i = 0; i < NUMTHREADS / 2; i++)
if (fork() == 0) {
close(pipe_fd[1]);
pause();
}
sleep(1);
close(pipe_fd[1]);
pause();
return 0;
}
----------
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20190106.txt.xz :
----------
[ 79.104729] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 79.237203] memory: usage 204800kB, limit 204800kB, failcnt 2834
[ 79.242176] memory+swap: usage 204800kB, limit 9007199254740988kB, failcnt 0
[ 79.245175] kmem: usage 23420kB, limit 9007199254740988kB, failcnt 0
[ 79.247945] Memory cgroup stats for /test1: cache:177456KB rss:3420KB rss_huge:0KB shmem:177456KB mapped_file:177540KB dirty:0KB writeback:0KB swap:0KB inactive_anon:177676KB active_anon:3612KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 79.256726] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=8204,uid=-2
[ 79.262470] Memory cgroup out of memory: Kill process 8204 (a.out) score 822 or sacrifice child
[ 79.266901] Killed process 8204 (a.out) total-vm:10491132kB, anon-rss:92kB, file-rss:444kB, shmem-rss:167028kB
[ 79.272974] oom_reaper: reaped process 8447 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:167488kB
[ 79.277733] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 79.386222] memory: usage 204708kB, limit 204800kB, failcnt 2837
[ 79.412519] Killed process 8205 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:0kB, shmem-rss:0kB
(...snipped...)
[ 79.539042] memory: usage 204600kB, limit 204800kB, failcnt 2838
[ 79.564617] Killed process 8206 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:0kB, shmem-rss:0kB
(...snipped...)
[ 81.967741] Memory cgroup out of memory: Kill process 8203 (a.out) score 6 or sacrifice child
[ 81.971760] Killed process 8203 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:1156kB, shmem-rss:0kB
[ 81.977329] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 81.977529] memory: usage 187160kB, limit 204800kB, failcnt 2838
[ 81.977530] memory+swap: usage 187160kB, limit 9007199254740988kB, failcnt 0
[ 81.977531] kmem: usage 8264kB, limit 9007199254740988kB, failcnt 0
[ 81.977532] Memory cgroup stats for /test1: cache:178248KB rss:372KB rss_huge:0KB shmem:178248KB mapped_file:178332KB dirty:0KB writeback:0KB swap:0KB inactive_anon:178568KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 81.977545] Out of memory and no killable processes...
(...snipped...)
[ 87.914960] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 88.019110] memory: usage 183472kB, limit 204800kB, failcnt 2838
[ 88.021629] memory+swap: usage 183472kB, limit 9007199254740988kB, failcnt 0
[ 88.024513] kmem: usage 4448kB, limit 9007199254740988kB, failcnt 0
[ 88.027137] Memory cgroup stats for /test1: cache:178512KB rss:372KB rss_huge:0KB shmem:178512KB mapped_file:178464KB dirty:0KB writeback:0KB swap:0KB inactive_anon:178760KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 88.036008] Out of memory and no killable processes...
----------
>From 0fb58415770a83d6c40d471e1840f8bc4a35ca83 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 26 Dec 2018 19:13:35 +0900
Subject: [PATCH] memcg: killed threads should not invoke memcg OOM killer
If $N > $M, a single process with $N threads in a memcg group can easily
kill all $M processes in that memcg group, for mem_cgroup_out_of_memory()
does not check if current thread needs to invoke the memcg OOM killer.
T1@P1 |T2...$N@P1|P2...$M |OOM reaper
----------+----------+----------+----------
# all sleeping
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(oom_lock)
out_of_memory()
select_bad_process()
oom_kill_process(P1)
wake_oom_reaper()
oom_reap_task() # ignores P1
mutex_unlock(oom_lock)
out_of_memory()
select_bad_process(P2...M)
# all killed by T2...N@P1
wake_oom_reaper()
oom_reap_task() # ignores P2...M
mutex_unlock(oom_lock)
We don't need to invoke the memcg OOM killer if current thread was killed
when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) and
memory_max_write() can bail out upon SIGKILL, and try_charge() allows
already killed/exiting threads to make forward progress.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/memcontrol.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b860dd4f7..b0d3bf3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1389,8 +1389,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
};
bool ret;
- mutex_lock(&oom_lock);
- ret = out_of_memory(&oc);
+ if (mutex_lock_killable(&oom_lock))
+ return true;
+ /*
+ * A few threads which were not waiting at mutex_lock_killable() can
+ * fail to bail out. Therefore, check again after holding oom_lock.
+ */
+ ret = fatal_signal_pending(current) || out_of_memory(&oc);
mutex_unlock(&oom_lock);
return ret;
}
--
1.8.3.1
WARNING: multiple messages have this Message-ID
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, Kirill Tkhai <ktkhai@virtuozzo.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] memcg: killed threads should not invoke memcg OOM killer
Date: Sun, 6 Jan 2019 15:02:24 +0900 [thread overview]
Message-ID: <f6d97ad3-ab04-f5e2-4822-96eac6ab45da@i-love.sakura.ne.jp> (raw)
Message-ID: <20190106060224.yMdrgleHSw0__dqcFL4MyZySCdOe-nUwapvvIOLaA7E@z> (raw)
In-Reply-To: <1545819215-10892-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
Michal and Johannes, can we please stop this stupid behavior now?
Reproducer:
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>
#define NUMTHREADS 256
#define MMAPSIZE 4 * 10485760
#define STACKSIZE 4096
static int pipe_fd[2] = { EOF, EOF };
static int memory_eater(void *unused)
{
int fd = open("/dev/zero", O_RDONLY);
char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
read(pipe_fd[0], buf, 1);
read(fd, buf, MMAPSIZE);
pause();
return 0;
}
int main(int argc, char *argv[])
{
int i;
char *stack;
FILE *fp;
const unsigned long size = 1048576 * 200;
mkdir("/sys/fs/cgroup/memory/test1", 0755);
fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
fprintf(fp, "%lu\n", size);
fclose(fp);
fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
fprintf(fp, "%u\n", getpid());
fclose(fp);
if (setgid(-2) || setuid(-2) || pipe(pipe_fd))
return 1;
if (fork() == 0) {
stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
for (i = 0; i < NUMTHREADS; i++)
if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
CLONE_SIGHAND | CLONE_THREAD | CLONE_VM | CLONE_FS | CLONE_FILES, NULL) == -1)
break;
close(pipe_fd[1]);
pause();
}
close(pipe_fd[0]);
for (i = 0; i < NUMTHREADS / 2; i++)
if (fork() == 0) {
close(pipe_fd[1]);
pause();
}
sleep(1);
close(pipe_fd[1]);
pause();
return 0;
}
----------
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20190106.txt.xz :
----------
[ 79.104729] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 79.237203] memory: usage 204800kB, limit 204800kB, failcnt 2834
[ 79.242176] memory+swap: usage 204800kB, limit 9007199254740988kB, failcnt 0
[ 79.245175] kmem: usage 23420kB, limit 9007199254740988kB, failcnt 0
[ 79.247945] Memory cgroup stats for /test1: cache:177456KB rss:3420KB rss_huge:0KB shmem:177456KB mapped_file:177540KB dirty:0KB writeback:0KB swap:0KB inactive_anon:177676KB active_anon:3612KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 79.256726] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=8204,uid=-2
[ 79.262470] Memory cgroup out of memory: Kill process 8204 (a.out) score 822 or sacrifice child
[ 79.266901] Killed process 8204 (a.out) total-vm:10491132kB, anon-rss:92kB, file-rss:444kB, shmem-rss:167028kB
[ 79.272974] oom_reaper: reaped process 8447 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:167488kB
[ 79.277733] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 79.386222] memory: usage 204708kB, limit 204800kB, failcnt 2837
[ 79.412519] Killed process 8205 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:0kB, shmem-rss:0kB
(...snipped...)
[ 79.539042] memory: usage 204600kB, limit 204800kB, failcnt 2838
[ 79.564617] Killed process 8206 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:0kB, shmem-rss:0kB
(...snipped...)
[ 81.967741] Memory cgroup out of memory: Kill process 8203 (a.out) score 6 or sacrifice child
[ 81.971760] Killed process 8203 (a.out) total-vm:4348kB, anon-rss:92kB, file-rss:1156kB, shmem-rss:0kB
[ 81.977329] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 81.977529] memory: usage 187160kB, limit 204800kB, failcnt 2838
[ 81.977530] memory+swap: usage 187160kB, limit 9007199254740988kB, failcnt 0
[ 81.977531] kmem: usage 8264kB, limit 9007199254740988kB, failcnt 0
[ 81.977532] Memory cgroup stats for /test1: cache:178248KB rss:372KB rss_huge:0KB shmem:178248KB mapped_file:178332KB dirty:0KB writeback:0KB swap:0KB inactive_anon:178568KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 81.977545] Out of memory and no killable processes...
(...snipped...)
[ 87.914960] a.out invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
(...snipped...)
[ 88.019110] memory: usage 183472kB, limit 204800kB, failcnt 2838
[ 88.021629] memory+swap: usage 183472kB, limit 9007199254740988kB, failcnt 0
[ 88.024513] kmem: usage 4448kB, limit 9007199254740988kB, failcnt 0
[ 88.027137] Memory cgroup stats for /test1: cache:178512KB rss:372KB rss_huge:0KB shmem:178512KB mapped_file:178464KB dirty:0KB writeback:0KB swap:0KB inactive_anon:178760KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 88.036008] Out of memory and no killable processes...
----------
From 0fb58415770a83d6c40d471e1840f8bc4a35ca83 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 26 Dec 2018 19:13:35 +0900
Subject: [PATCH] memcg: killed threads should not invoke memcg OOM killer
If $N > $M, a single process with $N threads in a memcg group can easily
kill all $M processes in that memcg group, for mem_cgroup_out_of_memory()
does not check if current thread needs to invoke the memcg OOM killer.
T1@P1 |T2...$N@P1|P2...$M |OOM reaper
----------+----------+----------+----------
# all sleeping
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(oom_lock)
out_of_memory()
select_bad_process()
oom_kill_process(P1)
wake_oom_reaper()
oom_reap_task() # ignores P1
mutex_unlock(oom_lock)
out_of_memory()
select_bad_process(P2...M)
# all killed by T2...N@P1
wake_oom_reaper()
oom_reap_task() # ignores P2...M
mutex_unlock(oom_lock)
We don't need to invoke the memcg OOM killer if current thread was killed
when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) and
memory_max_write() can bail out upon SIGKILL, and try_charge() allows
already killed/exiting threads to make forward progress.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/memcontrol.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b860dd4f7..b0d3bf3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1389,8 +1389,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
};
bool ret;
- mutex_lock(&oom_lock);
- ret = out_of_memory(&oc);
+ if (mutex_lock_killable(&oom_lock))
+ return true;
+ /*
+ * A few threads which were not waiting at mutex_lock_killable() can
+ * fail to bail out. Therefore, check again after holding oom_lock.
+ */
+ ret = fatal_signal_pending(current) || out_of_memory(&oc);
mutex_unlock(&oom_lock);
return ret;
}
--
1.8.3.1
next prev parent reply other threads:[~2019-01-06 6:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-26 10:13 Tetsuo Handa
2018-12-28 10:22 ` Kirill Tkhai
2018-12-28 11:00 ` Tetsuo Handa
2018-12-28 11:28 ` Kirill Tkhai
2019-01-06 6:02 ` Tetsuo Handa [this message]
2019-01-06 6:02 ` Tetsuo Handa
2019-01-07 11:41 ` Michal Hocko
2019-01-07 13:07 ` Tetsuo Handa
2019-01-07 13:37 ` Michal Hocko
2019-01-07 14:20 ` Tetsuo Handa
2019-01-09 10:56 ` Tetsuo Handa
2019-01-09 10:56 ` Tetsuo Handa
2019-01-15 10:17 ` [PATCH v2] " Tetsuo Handa
2019-01-15 11:55 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f6d97ad3-ab04-f5e2-4822-96eac6ab45da@i-love.sakura.ne.jp \
--to=penguin-kernel@i-love.sakura.ne.jp \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox