From: Shakeel Butt <shakeelb@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
bo.liu@linux.alibaba.com, Jan Kara <jack@suse.cz>,
david@fromorbit.com, tytso@mit.edu,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v2] mm, memcg: fix reclaim deadlock with writeback
Date: Wed, 12 Dec 2018 09:24:45 -0800 [thread overview]
Message-ID: <CALvZod4A237wvwsDjQXdwGBXEV9gLrjbARtP5-MaMAD_9OhQrg@mail.gmail.com> (raw)
In-Reply-To: <20181212155055.1269-1-mhocko@kernel.org>
On Wed, Dec 12, 2018 at 7:51 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
> ext4 writeback
> task1:
> [<ffffffff811aaa52>] wait_on_page_bit+0x82/0xa0
> [<ffffffff811c5777>] shrink_page_list+0x907/0x960
> [<ffffffff811c6027>] shrink_inactive_list+0x2c7/0x680
> [<ffffffff811c6ba4>] shrink_node_memcg+0x404/0x830
> [<ffffffff811c70a8>] shrink_node+0xd8/0x300
> [<ffffffff811c73dd>] do_try_to_free_pages+0x10d/0x330
> [<ffffffff811c7865>] try_to_free_mem_cgroup_pages+0xd5/0x1b0
> [<ffffffff8122df2d>] try_charge+0x14d/0x720
> [<ffffffff812320cc>] memcg_kmem_charge_memcg+0x3c/0xa0
> [<ffffffff812321ae>] memcg_kmem_charge+0x7e/0xd0
> [<ffffffff811b68a8>] __alloc_pages_nodemask+0x178/0x260
> [<ffffffff8120bff5>] alloc_pages_current+0x95/0x140
> [<ffffffff81074247>] pte_alloc_one+0x17/0x40
> [<ffffffff811e34de>] __pte_alloc+0x1e/0x110
> [<ffffffffa06739de>] alloc_set_pte+0x5fe/0xc20
> [<ffffffff811e5d93>] do_fault+0x103/0x970
> [<ffffffff811e6e5e>] handle_mm_fault+0x61e/0xd10
> [<ffffffff8106ea02>] __do_page_fault+0x252/0x4d0
> [<ffffffff8106ecb0>] do_page_fault+0x30/0x80
> [<ffffffff8171bce8>] page_fault+0x28/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> task2:
> [<ffffffff811aadc6>] __lock_page+0x86/0xa0
> [<ffffffffa02f1e47>] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
> [<ffffffffa08a2689>] ext4_writepages+0x479/0xd60
> [<ffffffff811bbede>] do_writepages+0x1e/0x30
> [<ffffffff812725e5>] __writeback_single_inode+0x45/0x320
> [<ffffffff81272de2>] writeback_sb_inodes+0x272/0x600
> [<ffffffff81273202>] __writeback_inodes_wb+0x92/0xc0
> [<ffffffff81273568>] wb_writeback+0x268/0x300
> [<ffffffff81273d24>] wb_workfn+0xb4/0x390
> [<ffffffff810a2f19>] process_one_work+0x189/0x420
> [<ffffffff810a31fe>] worker_thread+0x4e/0x4b0
> [<ffffffff810a9786>] kthread+0xe6/0x100
> [<ffffffff8171a9a1>] ret_from_fork+0x41/0x50
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> He adds
> : task1 is waiting for the PageWriteback bit of the page that task2 has
> : collected in mpd->io_submit->io_bio, and tasks2 is waiting for the LOCKED
> : bit the page which tasks1 has locked.
>
> More precisely task1 is handling a page fault and it has a page locked
> while it charges a new page table to a memcg. That in turn hits a memory
> limit reclaim and the memcg reclaim for legacy controller is waiting on
> the writeback but that is never going to finish because the writeback
> itself is waiting for the page locked in the #PF path. So this is
> essentially ABBA deadlock.
>
> Waiting for the writeback in legacy memcg controller is a workaround
> for pre-mature OOM killer invocations because there is no dirty IO
> throttling available for the controller. There is no easy way around
> that unfortunately. Therefore fix this specific issue by pre-allocating
> the page table outside of the page lock. We have that handy
> infrastructure for that already so simply reuse the fault-around pattern
> which already does this.
Michal, can you please add the following para in the commit message as
well which was in the first version. This fact should be documented at
least in the commit message.
>
> There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
> from under a fs page locked but they should be really rare. I am not
> aware of a better solution unfortunately.
>
thanks,
Shakeel
next prev parent reply other threads:[~2018-12-12 17:24 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-11 13:26 [PATCH] " Michal Hocko
2018-12-11 15:15 ` Kirill A. Shutemov
2018-12-11 16:21 ` Michal Hocko
2018-12-11 16:39 ` Jan Kara
2018-12-12 9:42 ` Kirill A. Shutemov
2018-12-12 9:48 ` Michal Hocko
2018-12-12 10:05 ` Jan Kara
2018-12-12 11:58 ` Kirill A. Shutemov
2018-12-12 12:13 ` Michal Hocko
2018-12-12 15:33 ` kbuild test robot
2018-12-12 15:57 ` Michal Hocko
2018-12-12 15:50 ` [PATCH v2] " Michal Hocko
2018-12-12 17:24 ` Shakeel Butt [this message]
2018-12-12 17:49 ` Liu Bo
2018-12-13 9:22 ` [PATCH v3] " Michal Hocko
2018-12-13 10:41 ` Kirill A. Shutemov
2018-12-13 12:39 ` Michal Hocko
2018-12-13 22:04 ` Johannes Weiner
2018-12-14 8:49 ` Michal Hocko
2018-12-13 22:43 ` Liu Bo
2018-12-14 17:13 ` [PATCH] " kbuild test robot
2018-12-14 17:31 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALvZod4A237wvwsDjQXdwGBXEV9gLrjbARtP5-MaMAD_9OhQrg@mail.gmail.com \
--to=shakeelb@google.com \
--cc=akpm@linux-foundation.org \
--cc=bo.liu@linux.alibaba.com \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=tytso@mit.edu \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox