linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	rientjes@google.com, hannes@cmpxchg.org, guro@fb.com,
	tj@kernel.org, vdavydov.dev@gmail.com,
	torvalds@linux-foundation.org
Subject: Re: [PATCH v2] mm,page_alloc: wait for oom_lock than back off
Date: Mon, 26 Feb 2018 22:16:25 +0900	[thread overview]
Message-ID: <201802262216.ADH48949.FtQLFOHJOVSOMF@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20180226121933.GC16269@dhcp22.suse.cz>

Michal Hocko wrote:
> On Mon 26-02-18 19:58:19, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Sat 24-02-18 17:00:51, Tetsuo Handa wrote:
> > > > From d922dd170c2bed01a775e8cca0871098aecc253d Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Date: Sat, 24 Feb 2018 16:49:21 +0900
> > > > Subject: [PATCH v2] mm,page_alloc: wait for oom_lock than back off
> > > > 
> > > > This patch fixes a bug which is essentially the same as the bug fixed by
> > > > commit 400e22499dd92613 ("mm: don't warn about allocations which stall for
> > > > too long").
> > > > 
> > > > Currently __alloc_pages_may_oom() uses mutex_trylock(&oom_lock) based
> > > > on the assumption that the owner of oom_lock is making progress on our
> > > > behalf. But it is possible to trigger an OOM lockup when many threads
> > > > concurrently call __alloc_pages_slowpath(), because all CPU resources are
> > > > wasted on pointless direct reclaim efforts. That is,
> > > > schedule_timeout_uninterruptible(1) in __alloc_pages_may_oom() does not
> > > > always give enough CPU time to the owner of the oom_lock.
> > > > 
> > > > It is possible that the owner of oom_lock is preempted by other threads.
> > > > Preemption makes the OOM situation much worse. The page allocator is not
> > > > responsible for CPU consumed by anything other than memory allocation
> > > > requests; but wasting CPU on memory allocation requests without allowing
> > > > the owner of oom_lock to make forward progress is a page allocator bug.
> > > > 
> > > > Therefore, this patch changes to wait for oom_lock in order to guarantee
> > > > that threads waiting for the owner of oom_lock to make forward progress
> > > > do not consume CPU resources on pointless direct reclaim efforts.
> > > > 
> > > > We know that calling printk() in an OOM situation, where a lot of
> > > > threads are almost busy-looping, is a nightmare. As a side effect of this
> > > > patch, printk() with oom_lock held can start utilizing the CPU resources
> > > > saved by this patch (reducing preemption during printk() and letting
> > > > printk() complete faster).
> > > > 
> > > > By changing !mutex_trylock(&oom_lock) to mutex_lock_killable(&oom_lock),
> > > > it becomes possible for many threads to prevent the OOM reaper from
> > > > making forward progress. Thus, this patch removes mutex_lock(&oom_lock)
> > > > from the OOM reaper.
> > > > 
> > > > Also, since nobody relies on oom_lock serialization when setting
> > > > MMF_OOM_SKIP, and we do not retry the last-second allocation attempt
> > > > after confirming that there is no !MMF_OOM_SKIP OOM victim, continuing to
> > > > use ALLOC_WMARK_HIGH increases the possibility of needlessly selecting
> > > > more OOM victims. Thus, this patch changes to use ALLOC_WMARK_MIN.
> > > > 
> > > > Also, we do not want to sleep with oom_lock held, so that threads
> > > > waiting at mutex_lock_killable(&oom_lock) can try the last-second
> > > > allocation attempt (because the OOM reaper starts reclaiming memory
> > > > without waiting for oom_lock) and select the next OOM victim if
> > > > necessary. Thus, this patch moves the short sleep from inside the
> > > > oom_lock critical section to outside of it.
> > > 
> > > This patch does three different things mangled into one patch. All that
> > > with a patch description which talks a lot but doesn't really explain
> > > those changes.
> > > 
> > > Moreover, you are effectively tuning for an overloaded page allocator
> > > artificial test case and adding a central lock where many tasks would
> > > block. I have already tried to explain that this is not a universal
> > > win, and you should have a real-life example where this is really
> > > helpful.
> > > 
> > > While I do agree that removing the oom_lock from __oom_reap_task_mm is a
> > > sensible thing, changing the last allocation attempt to ALLOC_WMARK_MIN
> > > is not all that straightforward and would require a much more detailed
> > > explanation.
> > > 
> > > So the patch in its current form is not mergeable IMHO.
> > 
> > Your comment is impossible to satisfy.
> > Please show me your version, for you are keeping me deadlocked.
> > 
> > I'm angry about the MM people's attitude: they are not friendly to
> > users who are bothered by lockup / slowdown problems under memory pressure.
> > They just say "Your system is overloaded" and don't provide enough support
> > for checking whether those users are hitting a real bug rather than mere
> > overload.
> 
> You should listen much more and also try to understand concerns that we
> have. You are trying to touch a very subtle piece of code. That code
> is not perfect at all but it tends to work reasonably well under most
> workloads out there. Now you are trying to handle corner cases you are
> seeing, and that is a good thing in general. But the fix shouldn't
> introduce new risks, and adding a single synchronization point into the
> OOM path is simply not without its own risks.

Then, please show me a real-life example (comparing trylock() versus
lock_killable()) where lock_killable() hurts. If you prove it, I'm
willing to make this optional (i.e. "use trylock() at the risk of lockup,
or use lock_killable() at the risk of latency"). Without testing this
patch in the real world, we won't be able to prove it, though.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

  reply	other threads:[~2018-02-26 13:16 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-22 13:46 [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held Tetsuo Handa
2018-01-23  8:38 ` Michal Hocko
2018-01-23 12:07   ` Tetsuo Handa
2018-01-23 12:42     ` Michal Hocko
2018-01-24 13:28       ` Tetsuo Handa
2018-02-13 11:58         ` Tetsuo Handa
2018-02-20 13:32           ` [PATCH] mm,page_alloc: wait for oom_lock than back off Tetsuo Handa
2018-02-20 13:40             ` Matthew Wilcox
2018-02-20 14:12               ` Tetsuo Handa
2018-02-20 14:49             ` Michal Hocko
2018-02-21 14:27               ` Tetsuo Handa
2018-02-22 13:06                 ` Michal Hocko
2018-02-24  8:00                   ` [PATCH v2] " Tetsuo Handa
2018-02-26  9:27                     ` Michal Hocko
2018-02-26 10:58                       ` Tetsuo Handa
2018-02-26 12:19                         ` Michal Hocko
2018-02-26 13:16                           ` Tetsuo Handa [this message]
2018-03-02 11:10                             ` [PATCH v3] " Tetsuo Handa
2018-03-02 14:10                               ` Michal Hocko
2018-03-03  3:15                                 ` Tetsuo Handa
2018-03-21 10:39                                   ` Tetsuo Handa
2018-03-21 11:21                                     ` Michal Hocko
2018-03-21 11:35                                       ` Tetsuo Handa
2018-03-21 12:00                                         ` Michal Hocko
2018-03-21 12:20                                           ` Tetsuo Handa
2018-03-21 12:31                                             ` Michal Hocko
