From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DF9AC43461 for ; Wed, 16 Sep 2020 02:16:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0A1222087D for ; Wed, 16 Sep 2020 02:16:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0A1222087D Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4A0D76B0003; Tue, 15 Sep 2020 22:16:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 452886B0037; Tue, 15 Sep 2020 22:16:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 33EDE6B0055; Tue, 15 Sep 2020 22:16:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0082.hostedemail.com [216.40.44.82]) by kanga.kvack.org (Postfix) with ESMTP id 1B9C76B0003 for ; Tue, 15 Sep 2020 22:16:56 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D8FBB33CD for ; Wed, 16 Sep 2020 02:16:55 +0000 (UTC) X-FDA: 77267311590.26.plane21_3102acb27116 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id A5E111804B669 for ; Wed, 16 Sep 2020 02:16:55 +0000 (UTC) X-HE-Tag: plane21_3102acb27116 X-Filterd-Recvd-Size: 8511 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Wed, 16 Sep 2020 02:16:53 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.76,430,1592841600"; d="scan'208";a="99286190" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 16 Sep 2020 10:16:51 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 5F18648990F1; Wed, 16 Sep 2020 10:16:48 +0800 (CST) Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Wed, 16 Sep 2020 10:16:46 +0800 Subject: Re: [RFC PATCH 1/4] fs: introduce ->storage_lost() for memory-failure To: "Darrick J. Wong" CC: , , , , , , , , , , References: <20200915101311.144269-1-ruansy.fnst@cn.fujitsu.com> <20200915101311.144269-2-ruansy.fnst@cn.fujitsu.com> <20200915161606.GD7955@magnolia> From: Ruan Shiyang Message-ID: Date: Wed, 16 Sep 2020 10:15:36 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: <20200915161606.GD7955@magnolia> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US X-Originating-IP: [10.167.225.141] X-ClientProxiedBy: G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) To G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) X-yoursite-MailScanner-ID: 5F18648990F1.A96BC X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Rspamd-Queue-Id: A5E111804B669 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2020/9/16 =E4=B8=8A=E5=8D=8812:16, Darrick J. Wong wrote: > On Tue, Sep 15, 2020 at 06:13:08PM +0800, Shiyang Ruan wrote: >> This function is used to handle errors which may cause data lost in >> filesystem. Such as memory-failure in fsdax mode. >> >> In XFS, it requires "rmapbt" feature in order to query for files or >> metadata which associated to the error block. Then we could call fs >> recover functions to try to repair the damaged data.(did not implement= ed >> in this patch) >> >> After that, the memory-failure also needs to kill processes who are >> using those files. The struct mf_recover_controller is created to sto= re >> necessary parameters. >> >> Signed-off-by: Shiyang Ruan >> --- >> fs/xfs/xfs_super.c | 80 ++++++++++++++++++++++++++++++++++++++++++++= ++ >> include/linux/fs.h | 1 + >> include/linux/mm.h | 6 ++++ >> 3 files changed, 87 insertions(+) >> >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index 71ac6c1cdc36..118d9c5d9e1e 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -35,6 +35,10 @@ >> #include "xfs_refcount_item.h" >> #include "xfs_bmap_item.h" >> #include "xfs_reflink.h" >> +#include "xfs_alloc.h" >> +#include "xfs_rmap.h" >> +#include "xfs_rmap_btree.h" >> +#include "xfs_bit.h" >> =20 >> #include >> #include >> @@ -1104,6 +1108,81 @@ xfs_fs_free_cached_objects( >> return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan); >> } >> =20 >> +static int >> +xfs_storage_lost_helper( >> + struct xfs_btree_cur *cur, >> + struct xfs_rmap_irec *rec, >> + void *priv) >> +{ >> + struct xfs_inode *ip; >> + struct mf_recover_controller *mfrc =3D priv; >> + int rc =3D 0; >> + >> + if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner)) { >> + // TODO check and try to fix metadata >> + } else { >> + /* >> + * Get files that incore, filter out others that are not in use. >> + */ >> + xfs_iget(cur->bc_mp, cur->bc_tp, rec->rm_owner, XFS_IGET_INCORE, >> + 0, &ip); >=20 > Missing return value check here. Yes, I ignored it. My fault. >=20 >> + if (!ip) >> + return 0; >> + if (!VFS_I(ip)->i_mapping) >> + goto out; >> + >> + rc =3D mfrc->recover_fn(mfrc->pfn, mfrc->flags, >> + VFS_I(ip)->i_mapping, rec->rm_offset); >> + >> + // TODO try to fix data >> +out: >> + xfs_irele(ip); >> + } >> + >> + return rc; >> +} >> + >> +static int >> +xfs_fs_storage_lost( >> + struct super_block *sb, >> + loff_t offset, >=20 > offset into which device? XFS supports three... >=20 > I'm also a little surprised you don't pass in a length. >=20 > I guess that means this function will get called repeatedly for every > byte in the poisoned range? The memory-failure will triggered on one PFN each time, so its length=20 could be one PAGE_SIZE. But I think you are right, it's better to tell=20 filesystem how much range is effected and need to be fixed. I didn't=20 know but I think there may be some other cases besides memory-failure.=20 So the length is necessary. >=20 >> + void *priv) >> +{ >> + struct xfs_mount *mp =3D XFS_M(sb); >> + struct xfs_trans *tp =3D NULL; >> + struct xfs_btree_cur *cur =3D NULL; >> + struct xfs_rmap_irec rmap_low, rmap_high; >> + struct xfs_buf *agf_bp =3D NULL; >> + xfs_fsblock_t fsbno =3D XFS_B_TO_FSB(mp, offset); >> + xfs_agnumber_t agno =3D XFS_FSB_TO_AGNO(mp, fsbno); >> + xfs_agblock_t agbno =3D XFS_FSB_TO_AGBNO(mp, fsbno); >> + int error =3D 0; >> + >> + error =3D xfs_trans_alloc_empty(mp, &tp); >> + if (error) >> + return error; >> + >> + error =3D xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp); >> + if (error) >> + return error; >> + >> + cur =3D xfs_rmapbt_init_cursor(mp, tp, agf_bp, agno); >=20 > ...and this is definitely the wrong call sequence if the malfunctioning > device is the realtime device. If a dax rt device dies, you'll be > shooting down files on the data device, which will cause all sorts of > problems. I didn't notice that. Let me think about it. >=20 > Question: Should all this poison recovery stuff go into a new file? > xfs_poison.c? There's already a lot of code in xfs_super.c. Yes, it's a bit too much. I'll move them into a new file. -- Thanks, Ruan Shiyang. >=20 > --D >=20 >> + >> + /* Construct a range for rmap query */ >> + memset(&rmap_low, 0, sizeof(rmap_low)); >> + memset(&rmap_high, 0xFF, sizeof(rmap_high)); >> + rmap_low.rm_startblock =3D rmap_high.rm_startblock =3D agbno; >> + >> + error =3D xfs_rmap_query_range(cur, &rmap_low, &rmap_high, >> + xfs_storage_lost_helper, priv); >> + if (error =3D=3D -ECANCELED) >> + error =3D 0; >> + >> + xfs_btree_del_cursor(cur, error); >> + xfs_trans_brelse(tp, agf_bp); >> + return error; >> +} >> + >> static const struct super_operations xfs_super_operations =3D { >> .alloc_inode =3D xfs_fs_alloc_inode, >> .destroy_inode =3D xfs_fs_destroy_inode, >> @@ -1117,6 +1196,7 @@ static const struct super_operations xfs_super_o= perations =3D { >> .show_options =3D xfs_fs_show_options, >> .nr_cached_objects =3D xfs_fs_nr_cached_objects, >> .free_cached_objects =3D xfs_fs_free_cached_objects, >> + .storage_lost =3D xfs_fs_storage_lost, >> }; >> =20 >> static int >> diff --git a/include/linux/fs.h b/include/linux/fs.h >> index e019ea2f1347..bd90485cfdc9 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -1951,6 +1951,7 @@ struct super_operations { >> struct shrink_control *); >> long (*free_cached_objects)(struct super_block *, >> struct shrink_control *); >> + int (*storage_lost)(struct super_block *sb, loff_t offset, void *pri= v); >> }; >> =20 >> /* >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index 1983e08f5906..3f0c36e1bf3d 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -3002,6 +3002,12 @@ extern void shake_page(struct page *p, int acce= ss); >> extern atomic_long_t num_poisoned_pages __read_mostly; >> extern int soft_offline_page(unsigned long pfn, int flags); >> =20 >> +struct mf_recover_controller { >> + int (*recover_fn)(unsigned long pfn, int flags, >> + struct address_space *mapping, pgoff_t index); >> + unsigned long pfn; >> + int flags; >> +}; >> =20 >> /* >> * Error handlers for various types of pages. >> --=20 >> 2.28.0 >> >> >> >=20 >=20