From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@suse.com
Cc: linux-mm@kvack.org
Subject: Re: [PATCH] mm/page_alloc: Wait for oom_lock before retrying.
Date: Thu, 8 Dec 2016 00:29:26 +0900
Message-ID: <201612080029.IBD55588.OSOFOtHVMLQFFJ@I-love.SAKURA.ne.jp>
In-Reply-To: <20161207081555.GB17136@dhcp22.suse.cz>
Michal Hocko wrote:
> On Tue 06-12-16 19:33:59, Tetsuo Handa wrote:
> > If the OOM killer is invoked when many threads are looping inside the
> > page allocator, it is possible that the OOM killer is preempted by other
> > threads.
>
> Hmm, the only way I can see this would happen is when the task which
> actually manages to take the lock is not invoking the OOM killer for
> whatever reason. Is this what happens in your case? Are you able to
> trigger this reliably?

Regarding http://I-love.SAKURA.ne.jp/tmp/serial-20161206.txt.xz , somebody
called oom_kill_process() and reached the

  pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",

line but did not reach the

  pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",

line within a tolerable delay.
It is trivial to get the page allocator spammed by uncontrolled
warn_alloc() calls, as in http://I-love.SAKURA.ne.jp/tmp/serial-20161207-2.txt.xz ,
and delayed by printk(), using the stressor shown below. It seems to me that
most of the CPU time is spent on pointless direct reclaim and printk().
----------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <poll.h>

int main(int argc, char *argv[])
{
	static char buffer[4096] = { };
	char *buf = NULL;
	unsigned long size;
	int i;

	for (i = 0; i < 1024; i++) {
		if (fork() == 0) {
			int fd = open("/proc/self/oom_score_adj", O_WRONLY);
			write(fd, "1000", 4);
			close(fd);
			sleep(1);
			snprintf(buffer, sizeof(buffer), "/tmp/file.%u", getpid());
			//snprintf(buffer, sizeof(buffer), "/tmp/file");
			fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
			while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer)) {
				poll(NULL, 0, 10);
				fsync(fd);
			}
			_exit(0);
		}
	}
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(2);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096) {
		buf[i] = 0;
		if (i >= 1800 * 1048576) /* This VM has 2048MB RAM */
			poll(NULL, 0, 10);
	}
	pause();
	return 0;
}
----------
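
(For reference: the program above forks 1024 children which set their
oom_score_adj to 1000 and keep appending to files under /tmp with periodic
fsync(), while the parent grows an overcommitted buffer via realloc() and
then dirties it page by page until the OOM killer fires. It is plain C with
no special dependencies; assuming it is saved as e.g. oom-stress.c, a bare
gcc invocation is enough to build it, and the 1800MB point after which the
dirtying loop throttles itself assumes a 2048MB VM, as noted in the source
comment.)
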
>
> > As a result, the OOM killer is unable to send SIGKILL to OOM
> > victims and/or wake up the OOM reaper by releasing oom_lock for minutes
> > because other threads consume a lot of CPU time for pointless direct
> > reclaim.
> >
> > ----------
> > [ 2802.635229] Killed process 7267 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > [ 2802.644296] oom_reaper: reaped process 7267 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 2802.650237] Out of memory: Kill process 7268 (a.out) score 999 or sacrifice child
> > [ 2803.653052] Killed process 7268 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > [ 2804.426183] oom_reaper: reaped process 7268 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 2804.432524] Out of memory: Kill process 7269 (a.out) score 999 or sacrifice child
> > [ 2805.349380] a.out: page allocation stalls for 10047ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
> > [ 2805.349383] CPU: 2 PID: 7243 Comm: a.out Not tainted 4.9.0-rc8 #62
> > (...snipped...)
> > [ 3540.977499] a.out 7269 22716.893359 5272 120
> > [ 3540.977499] 0.000000 1447.601063 0.000000
> > [ 3540.977499] 0 0
> > [ 3540.977500] /autogroup-155
> > ----------
> >
> > This patch adds extra sleeps which are effectively equivalent to
> >
> > if (mutex_lock_killable(&oom_lock) == 0)
> > mutex_unlock(&oom_lock);
> >
> > before retrying allocation at __alloc_pages_may_oom() so that the
> > OOM killer is not preempted by other threads waiting for the OOM
> > killer/reaper to reclaim memory. Since the OOM reaper grabs oom_lock
> > due to commit e2fe14564d3316d1 ("oom_reaper: close race with exiting
> > task"), waking up other threads before the OOM reaper is woken up by
> > directly waiting for oom_lock might not help so much.
>
> So, why don't you simply s@mutex_trylock@mutex_lock_killable@ then?
> The trylock is simply an optimistic heuristic to retry while the memory
> is being freed. Making this part sync might help for the case you are
> seeing.

May I? Something like below? With the patch below, the OOM killer can send
SIGKILL smoothly and printk() can report smoothly (the frequency of
"** XXX printk messages dropped **" messages is significantly reduced).

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c6d5f6..ee0105b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3075,7 +3075,7 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 	 * Acquire the oom lock. If that fails, somebody else is
 	 * making progress for us.
 	 */
-	if (!mutex_trylock(&oom_lock)) {
+	if (mutex_lock_killable(&oom_lock)) {
 		*did_some_progress = 1;
 		schedule_timeout_uninterruptible(1);
 		return NULL;
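
As a userspace illustration of why blocking on the lock behaves better than
trylock-and-retry (this is only a toy pthread model written for this mail,
not kernel code; in the kernel each failed trylock means another round of
pointless direct reclaim and warn_alloc() rather than the busy work
simulated here):
----------
/*
 * Toy model only: the "holder" thread stands in for the task invoking the
 * OOM killer, the trylock waiter stands in for __alloc_pages_may_oom() with
 * mutex_trylock(), and the blocking waiter stands in for
 * mutex_lock_killable(). Build with: gcc -O2 -pthread -o lockmodel lockmodel.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *unused)
{
	pthread_mutex_lock(&lock);
	usleep(100 * 1000);	/* "oom_kill_process() + printk()" */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *trylock_waiter(void *unused)
{
	volatile unsigned long work = 0;
	unsigned long retries = 0;
	unsigned long i;

	while (pthread_mutex_trylock(&lock)) {
		for (i = 0; i < 1000000; i++)	/* "pointless direct reclaim" */
			work++;
		retries++;
	}
	pthread_mutex_unlock(&lock);
	printf("trylock waiter: %lu rounds of fake reclaim\n", retries);
	return NULL;
}

static void *blocking_waiter(void *unused)
{
	pthread_mutex_lock(&lock);	/* just sleeps until the holder is done */
	pthread_mutex_unlock(&lock);
	printf("blocking waiter: slept without burning CPU\n");
	return NULL;
}

int main(void)
{
	pthread_t h, t, b;

	pthread_create(&h, NULL, holder, NULL);
	usleep(10 * 1000);	/* let the holder take the lock first */
	pthread_create(&t, NULL, trylock_waiter, NULL);
	pthread_create(&b, NULL, blocking_waiter, NULL);
	pthread_join(h, NULL);
	pthread_join(t, NULL);
	pthread_join(b, NULL);
	return 0;
}
----------
The trylock waiter keeps burning CPU in its retry loop for as long as the
holder holds the lock, while the blocking waiter simply sleeps; that is the
behavioural difference the one-liner above is aiming for.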