Subject: Re: [RFC PATCH 2/4] pagemap: introduce ->memory_failure()
To: "Darrick J. Wong"
References: <20200915101311.144269-1-ruansy.fnst@cn.fujitsu.com>
 <20200915101311.144269-3-ruansy.fnst@cn.fujitsu.com>
 <20200915163104.GG7964@magnolia>
In-Reply-To: <20200915163104.GG7964@magnolia>
From: Ruan Shiyang
Message-ID: <7afb6987-17f7-feb7-1ca9-05ff84185086@cn.fujitsu.com>
Date: Wed, 16 Sep 2020 10:15:52 +0800

On 2020/9/16 12:31 AM, Darrick J. Wong wrote:
> On Tue, Sep 15, 2020 at 06:13:09PM +0800, Shiyang Ruan wrote:
>> When memory-failure occurs, we call this function which is implemented
>> by each devices. For fsdax, pmem device implements it. Pmem device
>> will find out the block device where the error page located in, gets the
>> filesystem on this block device, and finally call ->storage_lost() to
>> handle the error in filesystem layer.
>>
>> Normally, a pmem device may contain one or more partitions, each
>> partition contains a block device, each block device contains a
>> filesystem. So we are able to find out the filesystem by one offset on
>> this pmem device. However, in other cases, such as mapped device, I
>> didn't find a way to obtain the filesystem laying on it. It is a
>> problem need to be fixed.
>>
>> Signed-off-by: Shiyang Ruan
>> ---
>>  block/genhd.c            | 12 ++++++++++++
>>  drivers/nvdimm/pmem.c    | 31 +++++++++++++++++++++++++++++++
>>  include/linux/genhd.h    |  2 ++
>>  include/linux/memremap.h |  3 +++
>>  4 files changed, 48 insertions(+)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 99c64641c314..e7442b60683e 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -1063,6 +1063,18 @@ struct block_device *bdget_disk(struct gendisk *disk, int partno)
>>  }
>>  EXPORT_SYMBOL(bdget_disk);
>>
>> +struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector)
>> +{
>> +	struct block_device *bdev = NULL;
>> +	struct hd_struct *part = disk_map_sector_rcu(disk, sector);
>> +
>> +	if (part)
>> +		bdev = bdget(part_devt(part));
>> +
>> +	return bdev;
>> +}
>> +EXPORT_SYMBOL(bdget_disk_sector);
>> +
>>  /*
>>   * print a full list of all partitions - intended for places where the root
>>   * filesystem can't be mounted and thus to give the victim some idea of what
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index fab29b514372..3ed96486c883 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -364,9 +364,40 @@ static void pmem_release_disk(void *__pmem)
>>  	put_disk(pmem->disk);
>>  }
>>
>> +static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
>> +		struct mf_recover_controller *mfrc)
>> +{
>> +	struct pmem_device *pdev;
>> +	struct block_device *bdev;
>> +	sector_t disk_sector;
>> +	loff_t bdev_offset;
>> +
>> +	pdev = container_of(pgmap, struct pmem_device, pgmap);
>> +	if (!pdev->disk)
>> +		return -ENXIO;
>> +
>> +	disk_sector = (PFN_PHYS(mfrc->pfn) - pdev->phys_addr) >> SECTOR_SHIFT;
>
> Ah, I see, looking at the current x86 MCE code, the MCE handler gets a
> physical address, which is then rounded down to a PFN, which is then
> blown back up into a byte address(?) and then rounded down to sectors.
> That is then blown back up into a byte address and passed on to XFS,
> which rounds it down to fs blocksize.
>
> /me wishes that wasn't so convoluted, but reforming the whole mm poison
> system to have smaller blast radii isn't the purpose of this patch. :)
>
>> +	bdev = bdget_disk_sector(pdev->disk, disk_sector);
>> +	if (!bdev)
>> +		return -ENXIO;
>> +
>> +	// TODO what if block device contains a mapped device
>
> Find its dev_pagemap_ops and invoke its memory_failure function? ;)

Thanks for pointing that out. I'll look into handling the mapped-device
case this way.

>
>> +	if (!bdev->bd_super)
>> +		goto out;
>> +
>> +	bdev_offset = ((disk_sector - get_start_sect(bdev)) << SECTOR_SHIFT) -
>> +			pdev->data_offset;
>> +	bdev->bd_super->s_op->storage_lost(bdev->bd_super, bdev_offset, mfrc);
>
> ->storage_lost is required for all filesystems?

I think it is required only for filesystems that support fsdax, since the
owner tracking is moved there. Either way, the caller should check that
->storage_lost is non-NULL before invoking it.

--
Thanks,
Ruan Shiyang.
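
To make the non-NULL check concrete, here is a minimal userspace sketch
of the guard, not the actual kernel change. The names sb_model,
sb_ops_model, demo_storage_lost and notify_storage_lost are hypothetical
stand-ins, and the ->storage_lost() prototype and the -EOPNOTSUPP return
value are assumptions, since neither is settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sb_model;

/* Hypothetical, trimmed-down stand-in for struct super_operations. */
struct sb_ops_model {
	/* Assumed prototype; the real one is not shown in this thread. */
	int (*storage_lost)(struct sb_model *sb, long long offset, void *mfrc);
};

/* Hypothetical stand-in for struct super_block. */
struct sb_model {
	const struct sb_ops_model *s_op;
};

/* Example handler that a filesystem with fsdax support might register. */
static int demo_storage_lost(struct sb_model *sb, long long offset, void *mfrc)
{
	assert(sb != NULL);
	(void)offset;
	(void)mfrc;
	return 0;
}

/*
 * Call ->storage_lost() only when the filesystem provides it.
 * Returning -EOPNOTSUPP for a missing hook is an assumption.
 */
static int notify_storage_lost(struct sb_model *sb, long long offset, void *mfrc)
{
	if (!sb || !sb->s_op || !sb->s_op->storage_lost)
		return -EOPNOTSUPP;

	return sb->s_op->storage_lost(sb, offset, mfrc);
}
```

With a guard like this, filesystems without fsdax support can leave the
hook unset and the pmem driver does not dereference a NULL pointer.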
>
> --D
>
>> +
>> +out:
>> +	bdput(bdev);
>> +	return 0;
>> +}
>> +
>>  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
>>  	.kill			= pmem_pagemap_kill,
>>  	.cleanup		= pmem_pagemap_cleanup,
>> +	.memory_failure		= pmem_pagemap_memory_failure,
>>  };
>>
>>  static int pmem_attach_disk(struct device *dev,
>> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
>> index 4ab853461dff..16e9e13e0841 100644
>> --- a/include/linux/genhd.h
>> +++ b/include/linux/genhd.h
>> @@ -303,6 +303,8 @@ static inline void add_disk_no_queue_reg(struct gendisk *disk)
>>  extern void del_gendisk(struct gendisk *gp);
>>  extern struct gendisk *get_gendisk(dev_t dev, int *partno);
>>  extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
>> +extern struct block_device *bdget_disk_sector(struct gendisk *disk,
>> +			sector_t sector);
>>
>>  extern void set_device_ro(struct block_device *bdev, int flag);
>>  extern void set_disk_ro(struct gendisk *disk, int flag);
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index 5f5b2df06e61..efebefa70d00 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -6,6 +6,7 @@
>>
>>  struct resource;
>>  struct device;
>> +struct mf_recover_controller;
>>
>>  /**
>>   * struct vmem_altmap - pre-allocated storage for vmemmap_populate
>> @@ -87,6 +88,8 @@ struct dev_pagemap_ops {
>>  	 * the page back to a CPU accessible page.
>>  	 */
>>  	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
>> +	int (*memory_failure)(struct dev_pagemap *pgmap,
>> +			struct mf_recover_controller *mfrc);
>>  };
>>
>>  #define PGMAP_ALTMAP_VALID	(1 << 0)
>> --
>> 2.28.0
>>
>>
>>
>
>
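
For reference, the address arithmetic in the quoted pmem hunk (PFN to
byte address, to disk sector, to byte offset inside the partition) can
be modelled in userspace roughly as below. This is only a sketch that
mirrors the two expressions from the patch; struct pmem_model and the
helper names are hypothetical, and PAGE_SHIFT = 12 (4 KiB pages) is an
assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SECTOR_SHIFT	9	/* 512-byte sectors */

/* Hypothetical stand-in for the pmem_device fields the patch reads. */
struct pmem_model {
	uint64_t phys_addr;	/* physical base address of the pmem region */
	uint64_t data_offset;	/* bytes reserved before the data area */
};

/* Userspace equivalent of PFN_PHYS(): page frame number to byte address. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Mirrors: (PFN_PHYS(mfrc->pfn) - pdev->phys_addr) >> SECTOR_SHIFT */
static uint64_t pfn_to_disk_sector(const struct pmem_model *p, uint64_t pfn)
{
	/* Kernel code would return an error here instead of asserting. */
	assert(pfn_to_phys(pfn) >= p->phys_addr);
	return (pfn_to_phys(pfn) - p->phys_addr) >> SECTOR_SHIFT;
}

/*
 * Mirrors: ((disk_sector - get_start_sect(bdev)) << SECTOR_SHIFT)
 *          - pdev->data_offset
 * part_start_sect stands in for get_start_sect(bdev).
 */
static int64_t disk_sector_to_bdev_offset(const struct pmem_model *p,
					  uint64_t disk_sector,
					  uint64_t part_start_sect)
{
	uint64_t bytes = (disk_sector - part_start_sect) << SECTOR_SHIFT;

	return (int64_t)(bytes - p->data_offset);
}
```

For example, with phys_addr = 0x100000000, data_offset = 2 MiB and a
partition starting at sector 2048, PFN 0x100500 maps to disk sector
10240 and to byte offset 2097152 within the partition.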