From: Michal Hocko <mhocko@suse.cz>
To: Greg Thelen <gthelen@google.com>
Cc: azurIt <azurit@pobox.sk>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups mailinglist <cgroups@vger.kernel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked
Date: Fri, 8 Feb 2013 17:29:18 +0100 [thread overview]
Message-ID: <20130208162918.GF7557@dhcp22.suse.cz> (raw)
In-Reply-To: <xr93ip63ig6j.fsf@gthelen.mtv.corp.google.com>
On Thu 07-02-13 20:27:00, Greg Thelen wrote:
> On Tue, Feb 05 2013, Michal Hocko wrote:
>
> > On Tue 05-02-13 10:09:57, Greg Thelen wrote:
> >> On Tue, Feb 05 2013, Michal Hocko wrote:
> >>
> >> > On Tue 05-02-13 08:48:23, Greg Thelen wrote:
> >> >> On Tue, Feb 05 2013, Michal Hocko wrote:
> >> >>
> >> >> > On Tue 05-02-13 15:49:47, azurIt wrote:
> >> >> > [...]
> >> >> >> Just to be sure - am i supposed to apply this two patches?
> >> >> >> http://watchdog.sk/lkml/patches/
> >> >> >
> >> >> > 5-memcg-fix-1.patch is not complete. It doesn't contain the folloup I
> >> >> > mentioned in a follow up email. Here is the full patch:
> >> >> > ---
> >> >> > From f2bf8437d5b9bb38a95a432bf39f32c584955171 Mon Sep 17 00:00:00 2001
> >> >> > From: Michal Hocko <mhocko@suse.cz>
> >> >> > Date: Mon, 26 Nov 2012 11:47:57 +0100
> >> >> > Subject: [PATCH] memcg: do not trigger OOM from add_to_page_cache_locked
> >> >> >
> >> >> > memcg oom killer might deadlock if the process which falls down to
> >> >> > mem_cgroup_handle_oom holds a lock which prevents other task to
> >> >> > terminate because it is blocked on the very same lock.
> >> >> > This can happen when a write system call needs to allocate a page but
> >> >> > the allocation hits the memcg hard limit and there is nothing to reclaim
> >> >> > (e.g. there is no swap or swap limit is hit as well and all cache pages
> >> >> > have been reclaimed already) and the process selected by memcg OOM
> >> >> > killer is blocked on i_mutex on the same inode (e.g. truncate it).
> >> >> >
> >> >> > Process A
> >> >> > [<ffffffff811109b8>] do_truncate+0x58/0xa0 # takes i_mutex
> >> >> > [<ffffffff81121c90>] do_last+0x250/0xa30
> >> >> > [<ffffffff81122547>] path_openat+0xd7/0x440
> >> >> > [<ffffffff811229c9>] do_filp_open+0x49/0xa0
> >> >> > [<ffffffff8110f7d6>] do_sys_open+0x106/0x240
> >> >> > [<ffffffff8110f950>] sys_open+0x20/0x30
> >> >> > [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d
> >> >> > [<ffffffffffffffff>] 0xffffffffffffffff
> >> >> >
> >> >> > Process B
> >> >> > [<ffffffff8110a9c1>] mem_cgroup_handle_oom+0x241/0x3b0
> >> >> > [<ffffffff8110b5ab>] T.1146+0x5ab/0x5c0
> >> >> > [<ffffffff8110c22e>] mem_cgroup_cache_charge+0xbe/0xe0
> >> >> > [<ffffffff810ca28c>] add_to_page_cache_locked+0x4c/0x140
> >> >> > [<ffffffff810ca3a2>] add_to_page_cache_lru+0x22/0x50
> >> >> > [<ffffffff810ca45b>] grab_cache_page_write_begin+0x8b/0xe0
> >> >> > [<ffffffff81193a18>] ext3_write_begin+0x88/0x270
> >> >> > [<ffffffff810c8fc6>] generic_file_buffered_write+0x116/0x290
> >> >> > [<ffffffff810cb3cc>] __generic_file_aio_write+0x27c/0x480
> >> >> > [<ffffffff810cb646>] generic_file_aio_write+0x76/0xf0 # takes ->i_mutex
> >> >> > [<ffffffff8111156a>] do_sync_write+0xea/0x130
> >> >> > [<ffffffff81112183>] vfs_write+0xf3/0x1f0
> >> >> > [<ffffffff81112381>] sys_write+0x51/0x90
> >> >> > [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d
> >> >> > [<ffffffffffffffff>] 0xffffffffffffffff
> >> >>
> >> >> It looks like grab_cache_page_write_begin() passes __GFP_FS into
> >> >> __page_cache_alloc() and mem_cgroup_cache_charge(). Which makes me
> >> >> think that this deadlock is also possible in the page allocator even
> >> >> before getting to add_to_page_cache_lru. no?
> >> >
> >> > I am not that familiar with VFS but i_mutex is a high level lock AFAIR
> >> > and it shouldn't be called from the pageout path so __page_cache_alloc
> >> > should be safe.
> >>
> >> I wasn't clear, sorry. My concern is not that pageout() grabs i_mutex.
> >> My concern is that __page_cache_alloc() will invoke the oom killer and
> >> select a victim which wants i_mutex. This victim will deadlock because
> >> the oom killer caller already holds i_mutex.
> >
> > That would be true for the memcg oom because that one is blocking but
> > the global oom just puts the allocator into sleep for a while and then
> > the allocator should back off eventually (unless this is NOFAIL
> > allocation). I would need to look closer whether this is really the case
> > - I haven't seen that allocator code path for a while...
>
> I think the page allocator can loop forever waiting for an oom victim to
> terminate even without NOFAIL. Especially if the oom victim wants a
> resource exclusively held by the allocating thread (e.g. i_mutex). It
> looks like the same deadlock you describe is also possible (though more
> rare) without memcg.
OK, I have checked the allocator slow path and you are right even
GFP_KERNEL will not fail. This can lead to similar deadlocks - e.g.
OOM killed task blocked on down_write(mmap_sem) while the page fault
handler holding mmap_sem for reading and allocating a new page without
any progress.
Luckily there are memory reserves where the allocator fall back
eventually so the allocation should be able to get some memory and
release the lock. There is still a theoretical chance this would block
though. This sounds like a corner case though so I wouldn't care about
it very much.
> If the looping thread is an eligible oom victim (i.e. not oom disabled,
> not an kernel thread, etc) then the page allocator can return NULL in so
> long as NOFAIL is not used. So any allocator which is able to call the
> oom killer and is not oom disabled (kernel thread, etc) is already
> exposed to the possibility of page allocator failure. So if the page
> allocator could detect the deadlock, then it could safely return NULL.
> Maybe after looping N times without forward progress the page allocator
> should consider failing unless NOFAIL is given.
page allocator is quite tricky to touch and the chances of this deadlock
are not that big.
> if memcg oom kill has been tried a reasonable number of times. Simply
> failing the memcg charge with ENOMEM seems easier to support than
> exceeding limit (Kame's loan patch).
We cannot do that in the page fault path because this would lead to a
global oom killer. We would need to either retry the page fault or send
KILL to the faulting process. But I do not like this much as this could
lead to DoS attacks.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-02-08 16:29 UTC|newest]
Thread overview: 171+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20121121200207.01068046@pobox.sk>
2012-11-22 0:26 ` memory-cgroup bug Kamezawa Hiroyuki
2012-11-22 9:36 ` azurIt
2012-11-22 21:45 ` Michal Hocko
2012-11-22 15:24 ` Michal Hocko
2012-11-22 18:05 ` azurIt
2012-11-22 21:42 ` Michal Hocko
2012-11-22 22:34 ` azurIt
2012-11-23 7:40 ` Michal Hocko
2012-11-23 9:21 ` azurIt
2012-11-23 9:28 ` Michal Hocko
2012-11-23 9:44 ` azurIt
2012-11-23 10:10 ` Michal Hocko
2012-11-23 9:34 ` Glauber Costa
2012-11-23 10:04 ` Michal Hocko
2012-11-23 14:59 ` azurIt
2012-11-25 10:17 ` Michal Hocko
2012-11-25 12:39 ` azurIt
2012-11-25 13:02 ` Michal Hocko
2012-11-25 13:27 ` azurIt
2012-11-25 13:44 ` Michal Hocko
2012-11-25 0:10 ` azurIt
2012-11-25 12:05 ` Michal Hocko
2012-11-25 12:36 ` azurIt
2012-11-25 13:55 ` Michal Hocko
2012-11-26 0:38 ` azurIt
2012-11-26 7:57 ` Michal Hocko
2012-11-26 13:18 ` [PATCH -mm] memcg: do not trigger OOM from add_to_page_cache_locked Michal Hocko
2012-11-26 13:21 ` [PATCH for 3.2.34] " Michal Hocko
2012-11-26 21:28 ` azurIt
2012-11-30 1:45 ` azurIt
2012-11-30 2:29 ` azurIt
2012-11-30 12:45 ` Michal Hocko
2012-11-30 12:53 ` azurIt
2012-11-30 13:44 ` azurIt
2012-11-30 14:44 ` Michal Hocko
2012-11-30 15:03 ` Michal Hocko
2012-11-30 15:37 ` Michal Hocko
2012-11-30 15:08 ` azurIt
2012-11-30 15:39 ` Michal Hocko
2012-11-30 15:59 ` azurIt
2012-11-30 16:19 ` Michal Hocko
2012-11-30 16:26 ` azurIt
2012-11-30 16:53 ` Michal Hocko
2012-11-30 20:43 ` azurIt
2012-12-03 15:16 ` Michal Hocko
2012-12-05 1:36 ` azurIt
2012-12-05 14:17 ` Michal Hocko
2012-12-06 0:29 ` azurIt
2012-12-06 9:54 ` Michal Hocko
2012-12-06 10:12 ` azurIt
2012-12-06 17:06 ` Michal Hocko
2012-12-10 1:20 ` azurIt
2012-12-10 9:43 ` Michal Hocko
2012-12-10 10:18 ` azurIt
2012-12-10 15:52 ` Michal Hocko
2012-12-10 17:18 ` azurIt
2012-12-17 1:34 ` azurIt
2012-12-17 16:32 ` Michal Hocko
2012-12-17 18:23 ` azurIt
2012-12-17 19:55 ` Michal Hocko
2012-12-18 14:22 ` azurIt
2012-12-18 15:20 ` Michal Hocko
2012-12-24 13:25 ` azurIt
2012-12-28 16:22 ` Michal Hocko
2012-12-30 1:09 ` azurIt
2012-12-30 11:08 ` Michal Hocko
2013-01-25 15:07 ` azurIt
2013-01-25 16:31 ` Michal Hocko
2013-02-05 13:49 ` Michal Hocko
2013-02-05 14:49 ` azurIt
2013-02-05 16:09 ` Michal Hocko
2013-02-05 16:46 ` azurIt
2013-02-05 16:48 ` Greg Thelen
2013-02-05 17:46 ` Michal Hocko
2013-02-05 18:09 ` Greg Thelen
2013-02-05 18:59 ` Michal Hocko
2013-02-08 4:27 ` Greg Thelen
2013-02-08 16:29 ` Michal Hocko [this message]
2013-02-08 16:40 ` Michal Hocko
2013-02-06 1:17 ` azurIt
2013-02-06 14:01 ` Michal Hocko
2013-02-06 14:22 ` Michal Hocko
2013-02-06 16:00 ` [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOM is set Michal Hocko
2013-02-08 5:03 ` azurIt
2013-02-08 9:44 ` Michal Hocko
2013-02-08 11:02 ` azurIt
2013-02-08 12:38 ` Michal Hocko
2013-02-08 13:56 ` azurIt
2013-02-08 14:47 ` Michal Hocko
2013-02-08 15:24 ` Michal Hocko
2013-02-08 15:58 ` azurIt
2013-02-08 17:10 ` Michal Hocko
2013-02-08 21:02 ` azurIt
2013-02-10 15:03 ` Michal Hocko
2013-02-10 16:46 ` azurIt
2013-02-11 11:22 ` Michal Hocko
2013-02-22 8:23 ` azurIt
2013-02-22 12:52 ` Michal Hocko
2013-02-22 12:54 ` azurIt
2013-02-22 13:00 ` Michal Hocko
2013-06-06 16:04 ` Michal Hocko
2013-06-06 16:16 ` azurIt
2013-06-07 13:11 ` [PATCH for 3.2] memcg: do not trap chargers with full callstack on OOM Michal Hocko
2013-06-17 10:21 ` azurIt
2013-06-19 13:26 ` Michal Hocko
2013-06-22 20:09 ` azurIt
2013-06-24 20:13 ` Johannes Weiner
2013-06-28 10:06 ` azurIt
2013-07-05 18:17 ` Johannes Weiner
2013-07-05 19:02 ` azurIt
2013-07-05 19:18 ` Johannes Weiner
2013-07-07 23:42 ` azurIt
2013-07-09 13:10 ` Michal Hocko
2013-07-09 13:19 ` azurIt
2013-07-09 13:54 ` Michal Hocko
2013-07-10 16:25 ` azurIt
2013-07-11 7:25 ` Michal Hocko
2013-07-13 23:26 ` azurIt
2013-07-13 23:51 ` azurIt
2013-07-15 15:41 ` Michal Hocko
2013-07-15 16:00 ` Michal Hocko
2013-07-16 15:35 ` Johannes Weiner
2013-07-16 16:09 ` Michal Hocko
2013-07-16 16:48 ` Johannes Weiner
2013-07-19 4:21 ` Johannes Weiner
2013-07-19 4:22 ` [patch 1/5] mm: invoke oom-killer from remaining unconverted page fault handlers Johannes Weiner
2013-07-19 4:24 ` [patch 2/5] mm: pass userspace fault flag to generic fault handler Johannes Weiner
2013-07-19 4:25 ` [patch 3/5] x86: finish fault error path with fatal signal Johannes Weiner
2013-07-24 20:32 ` Johannes Weiner
2013-07-25 20:29 ` KOSAKI Motohiro
2013-07-25 21:50 ` Johannes Weiner
2013-07-19 4:25 ` [patch 4/5] memcg: do not trap chargers with full callstack on OOM Johannes Weiner
2013-07-19 4:26 ` [patch 5/5] mm: memcontrol: sanity check memcg OOM context unwind Johannes Weiner
2013-07-19 8:23 ` [PATCH for 3.2] memcg: do not trap chargers with full callstack on OOM azurIt
2013-07-14 17:07 ` azurIt
2013-07-09 13:00 ` Michal Hocko
2013-07-09 13:08 ` Michal Hocko
2013-07-09 13:10 ` Michal Hocko
2013-06-24 16:48 ` azurIt
2013-02-22 12:00 ` [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOM is set azurIt
2013-02-07 11:01 ` [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked Kamezawa Hiroyuki
2013-02-07 12:31 ` Michal Hocko
2013-02-08 4:16 ` Kamezawa Hiroyuki
2013-02-08 1:40 ` Kamezawa Hiroyuki
2013-02-08 16:01 ` Michal Hocko
2013-02-05 16:31 ` Michal Hocko
2012-12-24 13:38 ` azurIt
2012-12-28 16:35 ` Michal Hocko
2012-11-26 17:46 ` [PATCH -mm] " Johannes Weiner
2012-11-26 18:04 ` Michal Hocko
2012-11-26 18:24 ` Johannes Weiner
2012-11-26 19:03 ` Michal Hocko
2012-11-26 19:29 ` Johannes Weiner
2012-11-26 20:08 ` Michal Hocko
2012-11-26 20:19 ` Johannes Weiner
2012-11-26 20:46 ` azurIt
2012-11-26 20:53 ` Johannes Weiner
2012-11-26 22:06 ` Michal Hocko
2012-11-27 0:05 ` Kamezawa Hiroyuki
2012-11-27 9:54 ` Michal Hocko
2012-11-27 19:48 ` Johannes Weiner
2012-11-27 20:54 ` [PATCH -v2 " Michal Hocko
2012-11-27 20:59 ` Michal Hocko
2012-11-28 15:26 ` Johannes Weiner
2012-11-28 16:04 ` Michal Hocko
2012-11-28 16:37 ` Johannes Weiner
2012-11-28 16:46 ` Michal Hocko
2012-11-28 16:48 ` Michal Hocko
2012-11-28 18:44 ` Johannes Weiner
2012-11-28 20:20 ` Hugh Dickins
2012-11-29 14:05 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130208162918.GF7557@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=azurit@pobox.sk \
--cc=cgroups@vger.kernel.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox