Subject: Re: [PATCH] mm/memory-failure: make sure wait for page writeback in memory_failure
From: yangerkun
To: Jan Kara
Date: Tue, 11 May 2021 16:56:41 +0800
In-Reply-To: <20210511084600.GG24154@quack2.suse.cz>
References: <20210511070329.2002597-1-yangerkun@huawei.com> <20210511084600.GG24154@quack2.suse.cz>

On 2021/5/11 16:46, Jan Kara wrote:
> On Tue 11-05-21 15:03:29, yangerkun wrote:
>> Our syzkaller testing triggered the "BUG_ON(!list_empty(&inode->i_wb_list))" in
>> clear_inode:
>>
>> [ 292.016156] ------------[ cut here ]------------
>> [ 292.017144] kernel BUG at fs/inode.c:519!
>> [ 292.017860] Internal error: Oops - BUG: 0 [#1] SMP
>> [ 292.018741] Dumping ftrace buffer:
>> [ 292.019577] (ftrace buffer empty)
>> [ 292.020430] Modules linked in:
>> [ 292.021748] Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7)
>> [ 292.023719] CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95
>> [ 292.025206] Hardware name: linux,dummy-virt (DT)
>> [ 292.026176] pstate: 80000005 (Nzcv daif -PAN -UAO)
>> [ 292.027244] pc : clear_inode+0x280/0x2a8
>> [ 292.028045] lr : clear_inode+0x280/0x2a8
>> [ 292.028877] sp : ffff8003366c7950
>> [ 292.029582] x29: ffff8003366c7950 x28: 0000000000000000
>> [ 292.030570] x27: ffff80032b5f4708 x26: ffff80032b5f4678
>> [ 292.031863] x25: ffff80036ae6b300 x24: ffff8003689254d0
>> [ 292.032902] x23: ffff80036ae69d80 x22: 0000000000033cc8
>> [ 292.033928] x21: 0000000000000000 x20: ffff80032b5f47a0
>> [ 292.034941] x19: ffff80032b5f4678 x18: 0000000000000000
>> [ 292.035958] x17: 0000000000000000 x16: 0000000000000000
>> [ 292.037102] x15: 0000000000000000 x14: 0000000000000000
>> [ 292.038103] x13: 0000000000000004 x12: 0000000000000000
>> [ 292.039137] x11: 1ffff00066cd8f52 x10: 1ffff00066cd8ec8
>> [ 292.040216] x9 : dfff200000000000 x8 : ffff10006ac1e86a
>> [ 292.041432] x7 : dfff200000000000 x6 : ffff100066cd8f1e
>> [ 292.042516] x5 : dfff200000000000 x4 : ffff80032b5f47a0
>> [ 292.043525] x3 : ffff200008000000 x2 : ffff200009867000
>> [ 292.044560] x1 : ffff8003366bb000 x0 : 0000000000000000
>> [ 292.045569] Call trace:
>> [ 292.046083] clear_inode+0x280/0x2a8
>> [ 292.046828] ext4_clear_inode+0x38/0xe8
>> [ 292.047593] ext4_free_inode+0x130/0xc68
>> [ 292.048383] ext4_evict_inode+0xb20/0xcb8
>> [ 292.049162] evict+0x1a8/0x3c0
>> [ 292.049761] iput+0x344/0x460
>> [ 292.050350] do_unlinkat+0x260/0x410
>> [ 292.051042] __arm64_sys_unlinkat+0x6c/0xc0
>> [ 292.051846] el0_svc_common+0xdc/0x3b0
>> [ 292.052570] el0_svc_handler+0xf8/0x160
>> [ 292.053303] el0_svc+0x10/0x218
>> [ 292.053908] Code: 9413f4a9 d503201f f90017b6 97f4d5b1 (d4210000)
>> [ 292.055471] ---[ end trace 01b339dd07795f8d ]---
>> [ 292.056443] Kernel panic - not syncing: Fatal exception
>> [ 292.057488] SMP: stopping secondary CPUs
>> [ 292.058419] Dumping ftrace buffer:
>> [ 292.059078] (ftrace buffer empty)
>> [ 292.059756] Kernel Offset: disabled
>> [ 292.060443] CPU features: 0x10,a1006000
>> [ 292.061195] Memory Limit: none
>> [ 292.061794] Rebooting in 86400 seconds..
>>
>> The crash dump for this problem shows that someone called __munlock_pagevec,
>> which cleared the page's LRU flag.
>>
>>  #0 [ffff80035f02f4c0] __switch_to at ffff20000808d020
>>  #1 [ffff80035f02f4f0] __schedule at ffff20000985102c
>>  #2 [ffff80035f02f5e0] schedule at ffff200009851d1c
>>  #3 [ffff80035f02f600] io_schedule at ffff2000098525c0
>>  #4 [ffff80035f02f620] __lock_page at ffff20000842d2d4
>>  #5 [ffff80035f02f710] __munlock_pagevec at ffff2000084c4600
>>  #6 [ffff80035f02f870] munlock_vma_pages_range at ffff2000084c5928
>>  #7 [ffff80035f02fa60] do_munmap at ffff2000084cbdf4
>>  #8 [ffff80035f02faf0] mmap_region at ffff2000084ce20c
>>  #9 [ffff80035f02fb90] do_mmap at ffff2000084cf018
>>
>> So memory_failure() jumps to identify_page_state without calling
>> wait_on_page_writeback(). And after generic_truncate_error_page clears the
>                                       ^^^ this seems to be
> truncate_error_page() these days... Or did you mean
> generic_error_remove_page()?

Yes, you are right. I will change it in the next version.

>
>> mapping of this page, end_page_writeback won't call
>> sb_clear_inode_writeback to clear inode->i_wb_list. That triggers the
>> BUG_ON in clear_inode!
>
> We definitely need to wait for writeback of these pages and the change you
> suggest makes sense to me. I'm just not sure this is the only problem: these
> "pages in the process of being munlocked()" might also confuse the state
> machinery in memory_failure() in some other way.
> Also I'm not sure if we are really allowed to call wait_on_page_writeback()
> on just any page that hits memory_failure() - there can be slab pages, anon
> pages, completely unknown pages given out by the page allocator to device
> drivers, etc. That needs someone more familiar with these MM details than me.

I agree. The two problems you raise seem hard to me... Maybe someone more
familiar with MM can help us.

>
> Honza
>
>>
>> Fix it by moving wait_on_page_writeback() before the PageLRU check.
>>
>> Signed-off-by: yangerkun
>> ---
>>  mm/memory-failure.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index bd3945446d47..9870a22800d9 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1527,15 +1527,15 @@ int memory_failure(unsigned long pfn, int flags)
>>  		return 0;
>>  	}
>>  
>> -	if (!PageTransTail(p) && !PageLRU(p))
>> -		goto identify_page_state;
>> -
>>  	/*
>>  	 * It's very difficult to mess with pages currently under IO
>>  	 * and in many cases impossible, so we just avoid it here.
>>  	 */
>>  	wait_on_page_writeback(p);
>>  
>> +	if (!PageTransTail(p) && !PageLRU(p))
>> +		goto identify_page_state;
>> +
>>  	/*
>>  	 * Now take care of user space mappings.
>>  	 * Abort on fail: __delete_from_page_cache() assumes unmapped page.
>> -- 
>> 2.25.4
>>