From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66CA9C54EE9 for ; Mon, 19 Sep 2022 16:12:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A1AC4940007; Mon, 19 Sep 2022 12:12:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9CA846B0072; Mon, 19 Sep 2022 12:12:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8444A940007; Mon, 19 Sep 2022 12:12:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 73AB46B0071 for ; Mon, 19 Sep 2022 12:12:20 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 4459112020B for ; Mon, 19 Sep 2022 16:12:20 +0000 (UTC) X-FDA: 79929327240.27.EACC341 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf13.hostedemail.com (Postfix) with ESMTP id DD60520042 for ; Mon, 19 Sep 2022 16:12:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663603939; x=1695139939; h=date:from:to:cc:subject:message-id:references: in-reply-to:mime-version; bh=VN0E0H5V3LgM+cMDZKfSfUqI54GPl39S5DCoHTJfYUA=; b=WeDgphJrT0/GD4ZFVnIr3g2iaH9Fw1GgZqoiKP6YTFOyryc97PE+lRau SQvyzvZioMD3dVXOoESQgY7fQjN8KrG96/+TSNu0q+LlxjJ5sZ6P1mJpb Rmc+dLdNJjNW6+my5mgAZ59P5T+NDw1DHCtX6z3TnlBqv7BgKotP/edzT icZKiLgvUh71aGZ8LvzGfVLytxdFXDSEW2W1mmvRXAar8BWpE3VxNdbUt NrBsFUHVQTUqLGTnHw86i5s9s+vudInn/woeBrPXJHD1E+UvK8PsaJHzp KZO4/u9QF043Eql2IhaNKHAg71S3s7FmdnNPGADb+sd9xL/L5z0rYz49o g==; X-IronPort-AV: E=McAfee;i="6500,9779,10475"; a="298170907" X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="298170907" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2022 09:11:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="620908317" Received: from fmsmsx602.amr.corp.intel.com ([10.18.126.82]) by fmsmga007.fm.intel.com with ESMTP; 19 Sep 2022 09:11:54 -0700 Received: from fmsmsx611.amr.corp.intel.com (10.18.126.91) by fmsmsx602.amr.corp.intel.com (10.18.126.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 19 Sep 2022 09:11:54 -0700 Received: from fmsmsx603.amr.corp.intel.com (10.18.126.83) by fmsmsx611.amr.corp.intel.com (10.18.126.91) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 19 Sep 2022 09:11:53 -0700 Received: from fmsedg601.ED.cps.intel.com (10.1.192.135) by fmsmsx603.amr.corp.intel.com (10.18.126.83) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31 via Frontend Transport; Mon, 19 Sep 2022 09:11:53 -0700 Received: from NAM04-DM6-obe.outbound.protection.outlook.com (104.47.73.43) by edgegateway.intel.com (192.55.55.70) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2375.31; Mon, 19 Sep 2022 09:11:53 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BrnQReM7ch1Qp2y8X/GpK3zZ910uFasvTHS7xUyOv1nslWaIluB7T+QPOxFlFZ7pL/59QiVKzc4QG0Xnz7aHSOjD4O6DZCHrLce3Jut8SV2d2sSTHe2MZ/T6ZvWri4KxOZcqiDB+K4KHHPPREUYHxuCyCQIfQiVOtNM4r/nIWHBRQv0JRBOSG2Za0/3fvVmWbZrBUdIRu17bA3cDREkRWJnoNJvD3tsFfH65DwX1SLPMKg9ni+d+ZoI8iG6hQSEFlBhPtJqd9L7FjFeDBMlNoZPaj/SBABoXBisOiX2Pgh0fhJnyeQboPs5zVl7u/LBZa6Opt81nD4nk7IjD26sElQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=DPcGIQmAaOH1CraeAEBL8KL2/8nG8ph3hbnGDKlHHMU=; b=E/tdjWZ8m8xaB6tmwCoJo/TR4EpjX/iKzAmOVeauxlw3ALstS1m3azzuV1pNQMtuZBQEftb2Q9XcFfOIfAeKcesdmSqrKN3edXFGMcQQgahkwpaAemttWbfkJ1AhcCPkP05vthvCOfFFGNgbnEejev5jrmeReFmd13Vt5aMQyx8jU6Y8e0bMwZUQ703odEoqczsi3j9BMU+GbuZ4cT/ezGub4SwGurkkraTqkOeTiBuKul4v+IkOTDwx92gZWkRuykxdqlVwgZgIfm0xFkoZQnHNFU9DNX4nUy4WE7CYYioBj/M8BYbwT3F3XwOjWHL7g/ENprc2TZOD6g6YfAEwOQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Received: from MWHPR1101MB2126.namprd11.prod.outlook.com (2603:10b6:301:50::20) by BN9PR11MB5465.namprd11.prod.outlook.com (2603:10b6:408:11e::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5632.21; Mon, 19 Sep 2022 16:11:51 +0000 Received: from MWHPR1101MB2126.namprd11.prod.outlook.com ([fe80::9847:345e:4c5b:ca12]) by MWHPR1101MB2126.namprd11.prod.outlook.com ([fe80::9847:345e:4c5b:ca12%6]) with mapi id 15.20.5632.021; Mon, 19 Sep 2022 16:11:51 +0000 Date: Mon, 19 Sep 2022 09:11:48 -0700 From: Dan Williams To: Dave Chinner , Dan Williams CC: , Matthew Wilcox , "Jan Kara" , "Darrick J. Wong" , Jason Gunthorpe , Christoph Hellwig , John Hubbard , , , , , Subject: Re: [PATCH v2 05/18] xfs: Add xfs_break_layouts() to the inode eviction path Message-ID: <632894c4738d8_2a6ded294a@dwillia2-xfh.jf.intel.com.notmuch> References: <166329930818.2786261.6086109734008025807.stgit@dwillia2-xfh.jf.intel.com> <166329933874.2786261.18236541386474985669.stgit@dwillia2-xfh.jf.intel.com> <20220918225731.GG3600936@dread.disaster.area> Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20220918225731.GG3600936@dread.disaster.area> X-ClientProxiedBy: SJ0PR03CA0027.namprd03.prod.outlook.com (2603:10b6:a03:33a::32) To MWHPR1101MB2126.namprd11.prod.outlook.com (2603:10b6:301:50::20) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: MWHPR1101MB2126:EE_|BN9PR11MB5465:EE_ X-MS-Office365-Filtering-Correlation-Id: 915436a7-b58a-4fc5-ab16-08da9a59a989 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: m7J85kc01GbZYesk64ywtUxCaKywDjU9MMqop+mHGSDiRx+kglENiE3/kM63H4wzC/+vZ7wRZU4qjFOuHkTAthgcAgWIuz+EEmXZXfm/7SqiT7T8zqkvJdnmTq4lMVv6GBVkitPnGDoKvWlIPEs/g/DWju6mH8c2dCd8Y6ZDL+X6O1GVwHaTrDj8WkY2cLwApz19I5cokXrdLhBRqSTzYcHAuL2rHpk/gG9yoGbT4ta6aVLY5uO+vcm310j4/rHuTwGlD5sboVIvL9mrHji2KGJ2ybdkahCcX1YEWfkAjRD3YcRqG505RDBozD+5WD3Su2NM1R0sbMEoc9q7gzAopfEIdUqlMXaChsQk/fiEo74yj0mhsmza/bf+5Rd+NR4TUylR96hlRyzvddlID7IOrR2Ca6GjnQW0M5r/K7/lRsDXaZIY/x9d51n49hPE3VcH+ffsxWS2qVzGniqegXM+VW7MF7r8Q+RuKj/KnP4PiLW32XK0YWtAAGn8czeLyLtKgU5pcnwA5qHzvgGP4QtWttXFobE0Lgxfpov1B1DAazRsyrURGsAao9O4tuCTBnCgI0z+abmd5EwminM0yFAQHveJwdhcwNZpmiQZHtyXLVS7OUajgB3sxxqHQ8IYS3qu0Iq2YOupLKyRJc4sRIjlvPc0tp+fAS8rg+rm4yK4rVxAKu4+2SqAEdwOsobvVIOgGavHsb6r92tJRiTNZffwDQ== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:MWHPR1101MB2126.namprd11.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(376002)(346002)(396003)(366004)(136003)(39860400002)(451199015)(6506007)(4326008)(2906002)(86362001)(38100700002)(8936002)(6512007)(9686003)(6486002)(26005)(66476007)(8676002)(66946007)(41300700001)(6666004)(478600001)(82960400001)(110136005)(83380400001)(66556008)(186003)(5660300002)(54906003)(316002)(7416002);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?WHErHPzxFLpKovVGulo33I3Uj1UqsTYPR9mts+7cAEVm2AJ7Bk5GGt44MsNu?= =?us-ascii?Q?CXz3gyx1OQX+CECh48YqX3xlDAiCKKBxxsDuEp6xKqPvS9k0ZRiK5xjVQbxJ?= =?us-ascii?Q?XkX8O8hIV+2eEUym5wYM/7cEntbicv2GA2QivBVE6RCnFdbl3dw69t+vOhjM?= =?us-ascii?Q?BsD/8hT3/4zmzCWopq0qnRPNJerSrFj6rhYP2tuGJUkf0i6n6SPPljg/9uXj?= =?us-ascii?Q?ueTUOOQRGNpef4gup+nJJsSz4IFXAn36dv95sXH1PSkceLR8tvTSPb9zldtQ?= =?us-ascii?Q?vqF4bXq1CguRsgcYL7cnn0J5DG1Y0wr9m+o4W3JsE/+sxz2TkEOzAaPJ3acP?= =?us-ascii?Q?s2gKd26wy2g50tj7bXx+t6wq2WHg95c0JvFtWocpKyKdXrbXpct9cazkwOO6?= =?us-ascii?Q?c03TCS1GwjeH1Q8hivW2TAZk0vLqcegtN58WkeOfg4Xz9l21hiA/D2kAdzls?= =?us-ascii?Q?71wEpQOgIqxdlt4+TN0zR5UQzAc8JhtSDxWYduZXkcrJcliRpWuQjBeoC/gB?= =?us-ascii?Q?r5gD+Seiy7MnK1P+jlcouZ6ACTiMEUAOuCLYJdaAHj11cplxlkYOtLC9ayoi?= =?us-ascii?Q?pwYu9x7wPD73+7ZjCtxqOoRc2g1KYsD+qxpBVC4WVWeEOX1EIvsNObHPMhcI?= =?us-ascii?Q?2QtukTP7HiE0aw56X1ytS2qPW/eGirSGTnUZ/V5hXmT2pFHbC05MJ49KdEuM?= =?us-ascii?Q?IU5wD/dR+LwC1/72W90QSXj6mM14XBY/0N0qsWOrcltnz2e2jz2w4UfXzBG5?= =?us-ascii?Q?uxBEX2Ioem0dGeS+MBu5S3/SmV+jIOpy1QfD9Jr3j8z7IvuAWKz+s4G0PBtm?= =?us-ascii?Q?ysYl8DM9U35y9DEauq+Durn8q2M7QjwkmccFmeEgUyZPfCi1/ayLkb7/Dotq?= =?us-ascii?Q?5u3P+vQmNRtDytmrQUz4AOKnKOeTKdQAlfFipUCNr3PeslVed81CgEespWxY?= =?us-ascii?Q?xQKsERR1kuK0qjK9ScvIC392dmDVS07mKRL5D3GtvrYUDWMXHdMXu1/3I1nW?= =?us-ascii?Q?R4b9VOvZbQ5kF+f65GzmSN0jZTHKPNbP3IIle6KRkkouTbmJvQsfYE1IGHch?= =?us-ascii?Q?V4svDoJhykAWbIYc07oCQ4rJfk7pqJWKgIcabjKkp6VOUN1G3ZOf29B0SbGW?= =?us-ascii?Q?elFms3R5vA4E2PgfnApD3phrlCIVqrhZXGh5AaE9UobLFl40mOA2g8n/TheM?= =?us-ascii?Q?CYd6OIJgMnOum/PD9bxUJSN5oKZBklX09qbgagttG6k2uey1y0saSPTIrt0b?= =?us-ascii?Q?fwlfpm8DEj0TWmuRORBwRxc6U3mrb+mAXgPRmv2xhPwJeF/W1ZK8ZNYv1yXQ?= =?us-ascii?Q?JVMLoeVFNKq+it5SHN6HAgbZTMjbGap9AcGEkzIRaZq6jsXAOmxyuXZE2jZw?= =?us-ascii?Q?ItZ1MNB0p3eTymeCOOgJ7e6KIQbquQwDfort66/mtZdZPouV8HF2XRnWMs8c?= =?us-ascii?Q?pi9fFmHJj50e0QqwA5Rit5tGTxPfyZjldkveJZCPcA54hQ4cCb4CV6wnwcrQ?= =?us-ascii?Q?/rD/hI9OvmxuCvOT6qsu9e3dNcIRix27sP1qXcoa3yyUU8iVfgxurBnV4Hrl?= =?us-ascii?Q?Zf5Xy31dT6hMSEa1ahl9tBtNz66R9Pyaw//7cf2GLJglhWvaidgUX9Sy0Wy5?= =?us-ascii?Q?ZA=3D=3D?= X-MS-Exchange-CrossTenant-Network-Message-Id: 915436a7-b58a-4fc5-ab16-08da9a59a989 X-MS-Exchange-CrossTenant-AuthSource: MWHPR1101MB2126.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Sep 2022 16:11:51.5828 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: JezHI4rgMyDoSGRnr8WK/A1KuAknPtjwfKN4hK+hX+bYSMY4gIqD9izjko0TTliZYOmyLPpAhdRF63uGaqO/tL3QM6MSVr1RLcPY8CE7CUM= X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN9PR11MB5465 X-OriginatorOrg: intel.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663603939; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=DPcGIQmAaOH1CraeAEBL8KL2/8nG8ph3hbnGDKlHHMU=; b=LlkksWdU1VVl7YnZ+JdcR6NQogfQMNjeTgQGFZSaQMNy3/gVbKIhd2q5W93WRKv0w3fnMH ON7J9kPRf7v7gyVSQ0t+rJ8pX0g2BQc0q335151b2PeBqxj+POfTLNoGaLOESSH7jkUFDD jvfgipnfF9JB66XfDTxPKtapgh7e7oo= ARC-Authentication-Results: i=2; imf13.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=WeDgphJr; dmarc=pass (policy=none) header.from=intel.com; arc=reject ("signature check failed: fail, {[1] = sig:microsoft.com:reject}"); spf=pass (imf13.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1663603939; a=rsa-sha256; cv=fail; b=YidoP+ifDKkZulezIRjUzwywJ74NtQH0eXHwSWOvwXsM51bQo3AJ0cWEqm437uCyGy9h+M sLxy7UnjA1MYNZI9NQuq+PkMQn8GB0JYN9uCr25SeXEUXX9ogpnZUfQo0mUwBPZfgJi0iL V/YDeF6xl9ljihYmXegIgD2ymSpiARE= Authentication-Results: imf13.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=WeDgphJr; dmarc=pass (policy=none) header.from=intel.com; arc=reject ("signature check failed: fail, {[1] = sig:microsoft.com:reject}"); spf=pass (imf13.hostedemail.com: domain of dan.j.williams@intel.com designates 192.55.52.120 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com X-Rspam-User: X-Stat-Signature: 66czn7wmt6d7gsh4dngxnp9eh8kc4znz X-Rspamd-Queue-Id: DD60520042 X-Rspamd-Server: rspam12 X-HE-Tag: 1663603938-309248 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Dave Chinner wrote: > On Thu, Sep 15, 2022 at 08:35:38PM -0700, Dan Williams wrote: > > In preparation for moving DAX pages to be 0-based rather than 1-based > > for the idle refcount, the fsdax core wants to have all mappings in a > > "zapped" state before truncate. For typical pages this happens naturally > > via unmap_mapping_range(), for DAX pages some help is needed to record > > this state in the 'struct address_space' of the inode(s) where the page > > is mapped. > > > > That "zapped" state is recorded in DAX entries as a side effect of > > xfs_break_layouts(). Arrange for it to be called before all truncation > > events which already happens for truncate() and PUNCH_HOLE, but not > > truncate_inode_pages_final(). Arrange for xfs_break_layouts() before > > truncate_inode_pages_final(). > > Ugh. That's nasty and awful. > > > > > Cc: Matthew Wilcox > > Cc: Jan Kara > > Cc: "Darrick J. Wong" > > Cc: Jason Gunthorpe > > Cc: Christoph Hellwig > > Cc: John Hubbard > > Signed-off-by: Dan Williams > > --- > > fs/xfs/xfs_file.c | 13 +++++++++---- > > fs/xfs/xfs_inode.c | 3 ++- > > fs/xfs/xfs_inode.h | 6 ++++-- > > fs/xfs/xfs_super.c | 22 ++++++++++++++++++++++ > > 4 files changed, 37 insertions(+), 7 deletions(-) > > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 556e28d06788..d3ff692d5546 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -816,7 +816,8 @@ xfs_wait_dax_page( > > int > > xfs_break_dax_layouts( > > struct inode *inode, > > - bool *retry) > > + bool *retry, > > + int state) > > { > > struct page *page; > > > > @@ -827,8 +828,8 @@ xfs_break_dax_layouts( > > return 0; > > > > *retry = true; > > - return ___wait_var_event(page, dax_page_idle(page), TASK_INTERRUPTIBLE, > > - 0, 0, xfs_wait_dax_page(inode)); > > + return ___wait_var_event(page, dax_page_idle(page), state, 0, 0, > > + xfs_wait_dax_page(inode)); > > } > > > > int > > @@ -839,14 +840,18 @@ xfs_break_layouts( > > { > > bool retry; > > int error; > > + int state = TASK_INTERRUPTIBLE; > > > > ASSERT(xfs_isilocked(XFS_I(inode), XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)); > > > > do { > > retry = false; > > switch (reason) { > > + case BREAK_UNMAP_FINAL: > > + state = TASK_UNINTERRUPTIBLE; > > + fallthrough; > > case BREAK_UNMAP: > > - error = xfs_break_dax_layouts(inode, &retry); > > + error = xfs_break_dax_layouts(inode, &retry, state); > > if (error || retry) > > break; > > fallthrough; > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > index 28493c8e9bb2..72ce1cb72736 100644 > > --- a/fs/xfs/xfs_inode.c > > +++ b/fs/xfs/xfs_inode.c > > @@ -3452,6 +3452,7 @@ xfs_mmaplock_two_inodes_and_break_dax_layout( > > struct xfs_inode *ip1, > > struct xfs_inode *ip2) > > { > > + int state = TASK_INTERRUPTIBLE; > > int error; > > bool retry; > > struct page *page; > > @@ -3463,7 +3464,7 @@ xfs_mmaplock_two_inodes_and_break_dax_layout( > > retry = false; > > /* Lock the first inode */ > > xfs_ilock(ip1, XFS_MMAPLOCK_EXCL); > > - error = xfs_break_dax_layouts(VFS_I(ip1), &retry); > > + error = xfs_break_dax_layouts(VFS_I(ip1), &retry, state); > > if (error || retry) { > > xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL); > > if (error == 0 && retry) > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h > > index fa780f08dc89..e4994eb6e521 100644 > > --- a/fs/xfs/xfs_inode.h > > +++ b/fs/xfs/xfs_inode.h > > @@ -454,11 +454,13 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) > > * layout-holder has a consistent view of the file's extent map. While > > * BREAK_WRITE breaks can be satisfied by recalling FL_LAYOUT leases, > > * BREAK_UNMAP breaks additionally require waiting for busy dax-pages to > > - * go idle. > > + * go idle. BREAK_UNMAP_FINAL is an uninterruptible version of > > + * BREAK_UNMAP. > > */ > > enum layout_break_reason { > > BREAK_WRITE, > > BREAK_UNMAP, > > + BREAK_UNMAP_FINAL, > > }; > > > > /* > > @@ -531,7 +533,7 @@ xfs_itruncate_extents( > > } > > > > /* from xfs_file.c */ > > -int xfs_break_dax_layouts(struct inode *inode, bool *retry); > > +int xfs_break_dax_layouts(struct inode *inode, bool *retry, int state); > > int xfs_break_layouts(struct inode *inode, uint *iolock, > > enum layout_break_reason reason); > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 9ac59814bbb6..ebb4a6eba3fc 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -725,6 +725,27 @@ xfs_fs_drop_inode( > > return generic_drop_inode(inode); > > } > > > > +STATIC void > > +xfs_fs_evict_inode( > > + struct inode *inode) > > +{ > > + struct xfs_inode *ip = XFS_I(inode); > > + uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; > > + long error; > > + > > + xfs_ilock(ip, iolock); > > I'm guessing you never ran this through lockdep. I always run with lockdep enabled in my development kernels, but maybe my testing was insufficient? Somewhat moot with your concerns below... > The general rule is that XFS should not take inode locks directly in > the inode eviction path because lockdep tends to throw all manner of > memory reclaim related false positives when we do this. We most > definitely don't want to be doing this for anything other than > regular files that are DAX enabled, yes? Guilty. I sought to satisfy the locking expectations of the break_layouts internals rather than drop the unnecessary locking. > > We also don't want to arbitrarily block memory reclaim for long > periods of time waiting on an inode lock. People seem to get very > upset when we introduce unbound latencies into the memory reclaim > path... > > Indeed, what are you actually trying to serialise against here? > Nothing should have a reference to the inode, nor should anything be > able to find and take a new reference to the inode while it is being > evicted.... Ok. > > > + error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP_FINAL); > > + > > + /* The final layout break is uninterruptible */ > > + ASSERT_ALWAYS(!error); > > We don't do error handling with BUG(). If xfs_break_layouts() truly > can't fail (what happens if the fs is shut down and some internal > call path now detects that and returns -EFSCORRUPTED?), theni > WARN_ON_ONCE() and continuing to tear down the inode so the system > is not immediately compromised is the appropriate action here. > > > + > > + truncate_inode_pages_final(&inode->i_data); > > + clear_inode(inode); > > + > > + xfs_iunlock(ip, iolock); > > +} > > That all said, this really looks like a bit of a band-aid. It definitely is since DAX is in this transitory state between doing some activities page-less and others with page metadata. If DAX was fully committed to behaving like a typical page then unmap_mapping_range() would have already satisfied this reference counting situation. > I can't work out why would we we ever have an actual layout lease > here that needs breaking given they are file based and active files > hold a reference to the inode. If we ever break that, then I suspect > this change will cause major problems for anyone using pNFS with XFS > as xfs_break_layouts() can end up waiting for NFS delegation > revocation. This is something we should never be doing in inode > eviction/memory reclaim. > > Hence I have to ask why this lease break is being done > unconditionally for all inodes, instead of only calling > xfs_break_dax_layouts() directly on DAX enabled regular files? I > also wonder what exciting new system deadlocks this will create > because BREAK_UNMAP_FINAL can essentially block forever waiting on > dax mappings going away. If that DAX mapping reclaim requires memory > allocations..... There should be no memory allocations in the DAX mapping reclaim path. Also, the page pins it waits for are precluded from being GUP_LONGTERM. > > /me looks deeper into the dax_layout_busy_page() stuff and realises > that both ext4 and XFS implementations of ext4_break_layouts() and > xfs_break_dax_layouts() are actually identical. > > That is, filemap_invalidate_unlock() and xfs_iunlock(ip, > XFS_MMAPLOCK_EXCL) operate on exactly the same > inode->i_mapping->invalidate_lock. Hence the implementations in ext4 > and XFS are both functionally identical. I assume you mean for the purposes of this "final" break since xfs_file_allocate() holds XFS_IOLOCK_EXCL over xfs_break_layouts(). > Further, when the inode is > in the eviction path there is no reason for needing to take that > mapping->invalidation_lock to invalidate remaining stale DAX > mappings before truncate blasts them away. > > IOWs, I don't see why fixing this problem needs to add new code to > XFS or ext4 at all. The DAX mapping invalidation and waiting can be > done enitrely within truncate_inode_pages_final() (conditional on > IS_DAX()) after mapping_set_exiting() has been set with generic code > and it should not require locking at all. I also think that > ext4_break_layouts() and xfs_break_dax_layouts() should be merged > into a generic dax infrastructure function so the filesystems don't > need to care about the internal details of DAX mappings at all... Yes, I think I can make that happen. Thanks Dave.