From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com,
	vdavydov@parallels.com, akpm@linux-foundation.org
Subject: Re: [PATCH 4/6] mm, oom: skip vforked tasks from being selected
Date: Thu, 2 Jun 2016 19:45:00 +0900
Message-ID: <201606021945.AFH26572.OJMVLFOHFFtOSQ@I-love.SAKURA.ne.jp>
In-Reply-To: <20160601142502.GY26601@dhcp22.suse.cz>

Michal Hocko wrote:
> On Wed 01-06-16 23:12:20, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > vforked tasks are not really sitting on any memory. They are sharing
> > > the mm with parent until they exec into a new code. Until then it is
> > > just pinning the address space. OOM killer will kill the vforked task
> > > along with its parent but we still can end up selecting vforked task
> > > when the parent wouldn't be selected. E.g. init doing vfork to launch
> > > a task or vforked being a child of oom unkillable task with an updated
> > > oom_score_adj to be killable.
> > > 
> > > Make sure to not select vforked task as an oom victim by checking
> > > vfork_done in oom_badness.
> > 
> > While vfork()ed task cannot modify userspace memory, can't such task
> > allocate significant amount of kernel memory inside execve() operation
> > (as demonstrated by CVE-2010-4243 64bit_dos.c )?
> > 
> > It is possible that killing vfork()ed task releases a lot of memory,
> > isn't it?
> 
> I am not familiar with the above CVE but doesn't that allocated memory
> come after flush_old_exec (and so mm_release)?

That memory is allocated by copy_strings() in do_execveat_common(), which
runs before flush_old_exec() (and therefore before mm_release()).

The example below (based on https://grsecurity.net/~spender/exploits/64bit_dos.c )
can consume nearly 50% of 2GB RAM while execve() is in progress in a vfork()ed
child. That is, selecting the vfork()ed task as an OOM victim might release
nearly 50% of 2GB RAM.

----------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_ARGS 8000 /* Nearly 50% of 2GB RAM */
#define ARG_LEN (128 * 1024) /* Bytes per argument, including the NUL */

int main(void)
{
        /* Be sure to run "ulimit -s unlimited" before running this. */
        char **args;
        char *str;
        int i;

        /* One long argument string, shared by every args[] slot. */
        str = malloc(ARG_LEN);
        args = malloc(NUM_ARGS * sizeof(char *));
        if (!str || !args)
                return 1;
        memset(str, ' ', ARG_LEN - 1);
        str[ARG_LEN - 1] = '\0';
        for (i = 0; i < (NUM_ARGS - 1); i++)
                args[i] = str;
        args[i] = NULL;
        /* execve() copies every argument before the child drops the shared mm. */
        if (vfork() == 0) {
                execve("/bin/true", args, NULL);
                _exit(1);
        }
        return 0;
}
----------

# strace -f ./a.out
execve("./a.out", ["./a.out"], [/* 22 vars */]) = 0
brk(0)                                  = 0x2283000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2bdbc81000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=44165, ...}) = 0
mmap(NULL, 44165, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b2bdbc82000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2112384, ...}) = 0
mmap(NULL, 3936832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b2bdbe84000
mprotect(0x2b2bdc03b000, 2097152, PROT_NONE) = 0
mmap(0x2b2bdc23b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x2b2bdc23b000
mmap(0x2b2bdc241000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b2bdc241000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2bdbc8d000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2bdbc8e000
arch_prctl(ARCH_SET_FS, 0x2b2bdbc8db80) = 0
mprotect(0x2b2bdc23b000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x2b2bdbe81000, 4096, PROT_READ) = 0
munmap(0x2b2bdbc82000, 44165)           = 0
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2bdbc90000
brk(0)                                  = 0x2283000
brk(0x22b3000)                          = 0x22b3000
brk(0)                                  = 0x22b3000
vfork(Process 9787 attached
 <unfinished ...>
[pid  9787] execve("/bin/true", ["                                "..., (...snipped...), ...], [/* 0 vars */] <unfinished ...>
[pid  9786] <... vfork resumed> )       = 9787
[pid  9786] exit_group(0)               = ?
[pid  9786] +++ exited with 0 +++
<... execve resumed> )                  = 0
brk(0)                                  = 0x13e2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2e71a6f000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=44165, ...}) = 0
mmap(NULL, 44165, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b2e71a70000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2112384, ...}) = 0
mmap(NULL, 3936832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b2e71c6e000
mprotect(0x2b2e71e25000, 2097152, PROT_NONE) = 0
mmap(0x2b2e72025000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x2b2e72025000
mmap(0x2b2e7202b000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b2e7202b000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2e71a7b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2e71a7c000
arch_prctl(ARCH_SET_FS, 0x2b2e71a7bb80) = 0
mprotect(0x2b2e72025000, 16384, PROT_READ) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x2b2e71c6b000, 4096, PROT_READ) = 0
munmap(0x2b2e71a70000, 44165)           = 0
exit_group(0)                           = ?
+++ exited with 0 +++

