From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, mgorman@suse.de, mhocko@suse.com, vbabka@suse.cz
Subject: Re: [PATCH 1/2] mm,page_alloc: Don't call __node_reclaim() with oom_lock held.
Date: Sat, 26 Aug 2017 10:28:24 +0900
Message-ID: <201708261028.HBH81733.HOtQJLMVFOFFOS@I-love.SAKURA.ne.jp>
In-Reply-To: <20170825134714.844d9fb169e5b1883c3dd6eb@linux-foundation.org>
Andrew Morton wrote:
> On Thu, 24 Aug 2017 21:18:25 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
> > We are doing last second memory allocation attempt before calling
> > out_of_memory(). But since slab shrinker functions might indirectly
> > wait for other thread's __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory
> > allocations via sleeping locks, calling slab shrinker functions from
> > node_reclaim() from get_page_from_freelist() with oom_lock held has
> > possibility of deadlock. Therefore, make sure that last second memory
> > allocation attempt does not call slab shrinker functions.
>
> I wonder if there's any way we could get lockdep to detect this sort
> of thing.

That is hopeless as far as the MM subsystem is concerned.

The root problem is that the MM subsystem assumes that somebody else will make
progress on its behalf. Direct reclaim does not check whether other threads are
making progress (e.g. too_many_isolated() looping forever waiting for kswapd)
and keeps consuming CPU resources (e.g. depriving a thread that is doing
schedule_timeout_killable() with oom_lock held of all CPU time by doing
pointless get_page_from_freelist() calls etc.).

Since the page allocator chooses to retry the attempt rather than wait for
locks, lockdep won't help. The dependency is spread across all threads by
timing and threshold checks, which prevent threads from calling operations
that lockdep would detect.
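
As a rough illustration of the retry pattern described above, here is a
userspace sketch (not kernel code). The names fake_oom_lock and
alloc_with_retry() are hypothetical, and a pthread mutex stands in for
oom_lock. The point is only that the caller never blocks on the lock, so a
lock dependency checker sees no "waits for" relation to record:

```c
#include <pthread.h>

/* Hypothetical stand-in for oom_lock; this is not the kernel's mutex. */
static pthread_mutex_t fake_oom_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Retry-style attempt: try the lock, and if it is busy, assume that
 * somebody else is making progress and simply try again. The caller
 * never sleeps on the lock itself.
 *
 * Returns the number of failed attempts before the lock was taken,
 * or -1 if max_retries attempts all found the lock busy.
 */
static int alloc_with_retry(int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		if (pthread_mutex_trylock(&fake_oom_lock) == 0) {
			/* Lock taken: the real work would happen here. */
			pthread_mutex_unlock(&fake_oom_lock);
			return i;
		}
		/* Lock busy: no wait is recorded, we just loop. */
	}
	return -1;
}
```

If the lock holder is itself starved of CPU time by such loops, every caller
keeps returning to the top of the loop, and nothing ever appears as a blocked
lock acquisition.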

I do wish we could get rid of __GFP_DIRECT_RECLAIM and offload memory reclaim
operations to some kswapd-like kernel threads. Then, we would be able to check
the progress of the relevant threads and invoke the OOM killer as needed
(rather than doing the __GFP_FS check in out_of_memory()), as well as
implement __GFP_KILLABLE.

>
> Has the deadlock been observed in testing? Do we think this fix
> should be backported into -stable?

I have never observed this deadlock, but it is hard for anybody to know
whether they have hit it. The only clue, available since 4.9 (though still
unreliable), is warn_alloc() complaining that memory allocation is stalling
for some reason. Users of 2.6.18/2.6.32/3.10 kernels have absolutely no way
to know (other than using SysRq-t etc., which generates too many messages and
asks for too much effort).

Judging from my experience at a support center, it is too difficult for users
to report memory allocation hangs. It requires users to stand by in front of
the console twenty-four seven so that we can get SysRq-t etc. whenever a memory
allocation related problem is suspected. We can't ask users for such effort.

The absence of reports does not mean that memory allocation hangs are not
occurring in real life. But nobody (other than me) is interested in adding an
asynchronous watchdog like kmallocwd. Thus, I'm spending much effort on finding
potential lockup bugs using stress tests, Michal does not care about bugs which
are found by stress tests, nobody else is responding, and users do not have a
reliable means to report lockup bugs caused by memory allocation (e.g.
kmallocwd).

Sigh.....