Date: Tue, 28 Apr 2020 21:24:41 +1000
From: Dave Chinner
To: Matthew Wilcox
Cc: Ruan Shiyang, "linux-kernel@vger.kernel.org",
    "linux-xfs@vger.kernel.org", "linux-nvdimm@lists.01.org",
    "linux-mm@kvack.org", "linux-fsdevel@vger.kernel.org",
    "darrick.wong@oracle.com", "dan.j.williams@intel.com", "hch@lst.de",
    "rgoldwyn@suse.de", "Qi, Fuli", "Gotou, Yasunori"
Subject: Re: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink
Message-ID: <20200428112441.GH2040@dread.disaster.area>
References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com>
    <20200427122836.GD29705@bombadil.infradead.org>
    <20200428064318.GG2040@dread.disaster.area>
    <259fe633-e1ff-b279-cd8c-1a81eaa40941@cn.fujitsu.com>
    <20200428111636.GK29705@bombadil.infradead.org>
In-Reply-To: <20200428111636.GK29705@bombadil.infradead.org>

On Tue, Apr 28, 2020 at 04:16:36AM -0700, Matthew Wilcox wrote:
> On Tue, Apr 28, 2020 at 05:32:41PM +0800, Ruan Shiyang wrote:
> > On 2020/4/28
> > 2:43 PM, Dave Chinner wrote:
> > > On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:
> > > > On 2020/4/27 20:28:36, "Matthew Wilcox" wrote:
> > > > > On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
> > > > > > This patchset is an attempt to resolve the shared 'page cache' problem for
> > > > > > fsdax.
> > > > > >
> > > > > > In order to track multiple mappings and indexes on one page, I
> > > > > > introduced a dax-rmap rb-tree to manage the relationship. A dax entry
> > > > > > will be associated more than once if it is shared. The second time we
> > > > > > associate this entry, we create this rb-tree and store its root in
> > > > > > page->private (not used in fsdax). Insert (->mapping, ->index) in
> > > > > > dax_associate_entry() and delete it in dax_disassociate_entry().
> > > > >
> > > > > Do we really want to track all of this on a per-page basis? I would
> > > > > have thought a per-extent basis was more useful. Essentially, create
> > > > > a new address_space for each shared extent. Per page just seems like
> > > > > a huge overhead.
> > > > >
> > > > Per-extent tracking sounds like a nice idea to me. I haven't thought
> > > > of it yet...
> > > >
> > > > But the extent info is maintained by the filesystem. I think we need a way
> > > > to obtain this info from the FS when associating a page. It may be a bit
> > > > complicated. Let me think about it...
> > >
> > > That's why I want the -user of this association- to do a filesystem
> > > callout instead of keeping its own naive tracking infrastructure.
> > > The filesystem can do an efficient, on-demand reverse mapping lookup
> > > from its own extent tracking infrastructure, and there's zero
> > > runtime overhead when there are no errors present.
> > >
> > > At the moment, this "dax association" is used to "report" a storage
> > > media error directly to userspace.
> > > I say "report" because what it
> > > does is kill userspace processes dead. The storage media error
> > > actually needs to be reported to the owner of the storage media,
> > > which in the case of FS-DAX is the filesystem.
> >
> > Understood.
> >
> > BTW, this is the usage in memory-failure, so what about rmap? I have not
> > found how to use this tracking in rmap. Do you have any ideas?
> >
> > >
> > > That way the filesystem can then look up all the owners of that bad
> > > media range (i.e. the filesystem block it corresponds to) and take
> > > appropriate action. e.g.
> >
> > I tried writing a function to look up all the owners' info of one block in
> > xfs for memory-failure use. It was dropped from this patchset because I found
> > out that this lookup function needs 'rmapbt' to be enabled at mkfs time. But
> > by default, rmapbt is disabled. I am not sure if it matters...
>
> I'm pretty sure you can't have shared extents on an XFS filesystem if you
> _don't_ have the rmapbt feature enabled. I mean, that's why it exists.

You're confusing reflink with rmap. :)

rmapbt does all the reverse mapping tracking; reflink just does the
shared data extent tracking.

But given that anyone who wants to use DAX with reflink is going to
have to mkfs their filesystem anyway (to turn on reflink), requiring
that rmapbt is also turned on is not a big deal. Especially as we
can check it at mount time in the kernel...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
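
The mount-time check suggested in the reply above could look roughly
like the sketch below. It is plain userspace C, not kernel code, and
the feature bits, struct mount_opts and check_dax_reflink() are all
hypothetical names made up for illustration; they are not XFS's
actual definitions or its real mount path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits; NOT the real XFS superblock flag values. */
#define FEAT_REFLINK (1u << 0)	/* shared data extent tracking */
#define FEAT_RMAPBT  (1u << 1)	/* reverse mapping btree */

/* Hypothetical mount options, reduced to the one flag we care about. */
struct mount_opts {
	bool dax;
};

/*
 * Decide at mount time whether DAX may be used on this filesystem.
 * Assumed policy: a reflink filesystem mounted with DAX must also have
 * rmapbt, so that the owners of a bad media range can be looked up on
 * demand. Returns 0 to allow the mount, -1 to reject it.
 */
static int check_dax_reflink(uint32_t features, const struct mount_opts *opts)
{
	assert(opts != NULL);

	/* Without DAX there is nothing to enforce. */
	if (!opts->dax)
		return 0;

	/* reflink without rmapbt: no way to find all owners of a block. */
	if ((features & FEAT_REFLINK) && !(features & FEAT_RMAPBT)) {
		fprintf(stderr,
			"DAX with reflink requires rmapbt; rejecting mount\n");
		return -1;
	}

	return 0;
}
```

With this policy, a DAX mount of a filesystem that has reflink but
not rmapbt is refused, while non-DAX mounts and DAX mounts of
non-reflink filesystems are unaffected.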