Date: Wed, 27 Sep 2023 01:17:35 -0700
From: Dan Williams
To: Shiyang Ruan
Subject: RE: [PATCH v14] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
Message-ID: <6513e51eecf18_91c1e294f5@dwillia2-xfh.jf.intel.com.notmuch>
References: <20230629081651.253626-3-ruansy.fnst@fujitsu.com> <20230828065744.1446462-1-ruansy.fnst@fujitsu.com>
In-Reply-To: <20230828065744.1446462-1-ruansy.fnst@fujitsu.com>

Shiyang Ruan wrote:
> ====
> Changes since v13:
>  1. don't return error if _freeze(FREEZE_HOLDER_KERNEL) got other error
> ====
>
> Now, if we suddenly remove a PMEM device(by calling unbind) which
> contains FSDAX while programs are still accessing data in this device,
> e.g.:
> ```
>  $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
>  # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
>  echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> ```
> it could come into an unacceptable state:
>   1. device has gone but mount point still exists, and umount will fail
>      with "target is busy"
>   2. programs will hang and cannot be killed
>   3. may crash with NULL pointer dereference

Thanks, this addresses my main concern. It makes clear that this new
capability is needed, because otherwise DAX regresses the survivability
of the kernel when a device is removed from underneath a mounted
filesystem, compared to removing a non-DAX-capable block device.

>
> To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
> are going to remove the whole device, and make sure all related processes
> could be notified so that they could end up gracefully.
>
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> on it to unmap all files in use, and notify processes who are using
> those files.
>
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>    -> devres_release_all()
>     -> kill_dax()
>      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>       -> xfs_dax_notify_failure()
>       `-> freeze_super()             // freeze (kernel call)
>       `-> do xfs rmap
>       ` -> mf_dax_kill_procs()
>       `  -> collect_procs_fsdax()    // all associated processes
>       `  -> unmap_and_kill()
>       ` -> invalidate_inode_pages2_range() // drop file's cache
>       `-> thaw_super()               // thaw (both kernel & user call)
>
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove event.
> Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> new dax mapping from being created.  Do not shutdown filesystem directly
> if configuration is not supported, or if failure range includes metadata
> area.  Make sure all files and processes(not only the current progress)
> are handled correctly.  Also drop the cache of associated files before
> pmem is removed.
>
> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/

I only have some questions and comment suggestions below, but otherwise
consider this:

Acked-by: Dan Williams

>
> Signed-off-by: Shiyang Ruan
> ---
>  drivers/dax/super.c         |  3 +-
>  fs/xfs/xfs_notify_failure.c | 99 ++++++++++++++++++++++++++++++++++---
>  include/linux/mm.h          |  1 +
>  mm/memory-failure.c         | 17 +++++--
>  4 files changed, 109 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 0da9232ea175..f4b635526345 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -326,7 +326,8 @@ void kill_dax(struct dax_device *dax_dev)
>  		return;
>
>  	if (dax_dev->holder_data != NULL)
> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> +				MF_MEM_PRE_REMOVE);
>
>  	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>  	synchronize_srcu(&dax_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 4a9bbd3fe120..79586abc75bf 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>
>  #include
>  #include
> +#include
>
>  struct xfs_failure_info {
>  	xfs_agblock_t		startblock;
> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>  	struct xfs_mount		*mp = cur->bc_mp;
>  	struct xfs_inode		*ip;
>  	struct xfs_failure_info		*notify = data;
> +	struct address_space		*mapping;
> +	pgoff_t				pgoff;
> +	unsigned long			pgcnt;
>  	int				error = 0;
>
>  	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>  	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> +		/* Continue the query because this isn't a failure. */
> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>  		notify->want_shutdown = true;
>  		return 0;
>  	}
> @@ -92,14 +99,60 @@ xfs_dax_failure_fn(
>  		return 0;
>  	}
>
> -	error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
> -				  xfs_failure_pgoff(mp, rec, notify),
> -				  xfs_failure_pgcnt(mp, rec, notify),
> -				  notify->mf_flags);
> +	mapping = VFS_I(ip)->i_mapping;
> +	pgoff = xfs_failure_pgoff(mp, rec, notify);
> +	pgcnt = xfs_failure_pgcnt(mp, rec, notify);
> +
> +	/* Continue the rmap query if the inode isn't a dax file. */
> +	if (dax_mapping(mapping))
> +		error = mf_dax_kill_procs(mapping, pgoff, pgcnt,
> +					  notify->mf_flags);
> +
> +	/* Invalidate the cache in dax pages. */
> +	if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> +		invalidate_inode_pages2_range(mapping, pgoff,
> +					      pgoff + pgcnt - 1);
> +
>  	xfs_irele(ip);
>  	return error;
>  }
>
> +static int
> +xfs_dax_notify_failure_freeze(
> +	struct xfs_mount	*mp)
> +{
> +	struct super_block	*sb = mp->m_super;
> +	int			error;
> +
> +	error = freeze_super(sb, FREEZE_HOLDER_KERNEL);
> +	if (error)
> +		xfs_emerg(mp, "already frozen by kernel, err=%d", error);
> +
> +	return error;
> +}
> +
> +static void
> +xfs_dax_notify_failure_thaw(
> +	struct xfs_mount	*mp,
> +	bool			kernel_frozen)
> +{
> +	struct super_block	*sb = mp->m_super;
> +	int			error;
> +
> +	if (kernel_frozen) {
> +		error = thaw_super(sb, FREEZE_HOLDER_KERNEL);
> +		if (error)
> +			xfs_emerg(mp, "still frozen after notify failure, err=%d",
> +				error);
> +	}
> +
> +	/*
> +	 * Also thaw userspace call anyway because the device is about to be
> +	 * removed immediately.
> +	 */
> +	thaw_super(sb, FREEZE_HOLDER_USERSPACE);

Why is this thaw not paired with a freeze in
xfs_dax_notify_failure_freeze()?

> +}
> +
>  static int
>  xfs_dax_notify_ddev_failure(
>  	struct xfs_mount	*mp,
> @@ -112,15 +165,29 @@ xfs_dax_notify_ddev_failure(
>  	struct xfs_btree_cur	*cur = NULL;
>  	struct xfs_buf		*agf_bp = NULL;
>  	int			error = 0;
> +	bool			kernel_frozen = false;
>  	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, daddr);
>  	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, fsbno);
>  	xfs_fsblock_t		end_fsbno = XFS_DADDR_TO_FSB(mp,
>  							     daddr + bblen - 1);
>  	xfs_agnumber_t		end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);
>
> +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> +		xfs_info(mp, "Device is about to be removed!");
> +		/*
> +		 * Freeze fs to prevent new mappings from being created.
> +		 * - Keep going on if others already hold the kernel forzen.
> +		 * - Keep going on if other errors too because this device is
> +		 *   starting to fail.
> +		 * - If kernel frozen state is hold successfully here, thaw it
> +		 *   here as well at the end.
> +		 */
> +		kernel_frozen = xfs_dax_notify_failure_freeze(mp) == 0;
> +	}
> +
>  	error = xfs_trans_alloc_empty(mp, &tp);
>  	if (error)
> -		return error;
> +		goto out;
>
>  	for (; agno <= end_agno; agno++) {
>  		struct xfs_rmap_irec	ri_low = { };
> @@ -165,11 +232,23 @@ xfs_dax_notify_ddev_failure(
>  	}
>
>  	xfs_trans_cancel(tp);
> +
> +	/*
> +	 * Determine how to shutdown the filesystem according to the
> +	 * error code and flags.
> +	 */

This comment is not adding any value. It would be better if it clarified
why want_shutdown will be false in the pre-remove case.

>  	if (error || notify.want_shutdown) {
>  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>  		if (!error)
>  			error = -EFSCORRUPTED;
> -	}
> +	} else if (mf_flags & MF_MEM_PRE_REMOVE)
> +		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
> +
> +out:
> +	/* Thaw the fs if it is frozen before. */
> +	if (mf_flags & MF_MEM_PRE_REMOVE)
> +		xfs_dax_notify_failure_thaw(mp, kernel_frozen);
> +
>  	return error;
>  }
>
> @@ -197,6 +276,8 @@ xfs_dax_notify_failure(
>
>  	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>  	    mp->m_logdev_targp != mp->m_ddev_targp) {

Maybe a comment:

/*
 * In the pre-remove case the failure notification is attempting to
 * trigger a force unmount. The expectation is that the device is still
 * present, but its removal is in progress and cannot be cancelled, so
 * proceed with accessing the log device.
 */

> +		if (mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>  		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>  		return -EFSCORRUPTED;
> @@ -210,6 +291,12 @@ xfs_dax_notify_failure(
>  	ddev_start = mp->m_ddev_targp->bt_dax_part_off;
>  	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
>
> +	/* Notify failure on the whole device. */
> +	if (offset == 0 && len == U64_MAX) {
> +		offset = ddev_start;
> +		len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
> +	}
> +
>  	/* Ignore the range out of filesystem area */
>  	if (offset + len - 1 < ddev_start)
>  		return -ENXIO;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2dd73e4f3d8e..a10c75bebd6d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3665,6 +3665,7 @@ enum mf_flags {
>  	MF_UNPOISON = 1 << 4,
>  	MF_SW_SIMULATED = 1 << 5,
>  	MF_NO_RETRY = 1 << 6,
> +	MF_MEM_PRE_REMOVE = 1 << 7,
>  };
>  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  		      unsigned long count, int mf_flags);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e245191e6b04..e71616ccc643 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -683,7 +683,7 @@ static void add_to_kill_fsdax(struct task_struct *tsk, struct page *p,
>   */
>  static void collect_procs_fsdax(struct page *page,
>  		struct address_space *mapping, pgoff_t pgoff,
> -		struct list_head *to_kill)
> +		struct list_head *to_kill, bool pre_remove)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> @@ -691,8 +691,15 @@ static void collect_procs_fsdax(struct page *page,
>  	i_mmap_lock_read(mapping);
>  	read_lock(&tasklist_lock);
>  	for_each_process(tsk) {
> -		struct task_struct *t = task_early_kill(tsk, true);
> +		struct task_struct *t = tsk;
>
> +		/*
> +		 * Search for all tasks while MF_MEM_PRE_REMOVE is set, because
> +		 * the current may not be the one accessing the fsdax page.
> +		 * Otherwise, search for the current task.
> +		 */
> +		if (!pre_remove)
> +			t = task_early_kill(tsk, true);
>  		if (!t)
>  			continue;
>  		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
> @@ -1788,6 +1795,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  	dax_entry_t cookie;
>  	struct page *page;
>  	size_t end = index + count;
> +	bool pre_remove = mf_flags & MF_MEM_PRE_REMOVE;
>
>  	mf_flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
>
> @@ -1799,9 +1807,10 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  		if (!page)
>  			goto unlock;
>
> -		SetPageHWPoison(page);
> +		if (!pre_remove)
> +			SetPageHWPoison(page);

This probably wants a comment like:

/*
 * The pre_remove case is revoking access; the memory is still good and
 * could theoretically be put back into service.
 */

>
> -		collect_procs_fsdax(page, mapping, index, &to_kill);
> +		collect_procs_fsdax(page, mapping, index, &to_kill, pre_remove);
>  		unmap_and_kill(&to_kill, page_to_pfn(page), mapping,
>  				index, mf_flags);
>  unlock:
> --
> 2.41.0
>
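
For anyone following the thread without a kernel tree at hand, here is a
minimal user-space sketch of two behaviors from the quoted hunks: the
(0, U64_MAX) "whole device" range being rewritten to the data device's
real extent, and pages not being marked poisoned in the pre-remove case.
This is not kernel code and not part of the patch. Every sketch_ and
SKETCH_ name is hypothetical, and the device start and size are plain
parameters standing in for bt_dax_part_off and bdev_nr_bytes(). Only the
bit value 1 << 7 is taken from the include/linux/mm.h hunk.

```c
#include <assert.h>	/* only for callers that want to check the sketch */
#include <stdbool.h>
#include <stdint.h>

/* Bit value copied from the quoted enum mf_flags hunk. */
#define SKETCH_MF_MEM_PRE_REMOVE	(1 << 7)

/* Hypothetical stand-in for a byte range on the data device. */
struct sketch_range {
	uint64_t	offset;
	uint64_t	len;
};

/*
 * Models the "whole device" check in the xfs_dax_notify_failure() hunk:
 * a notification of (0, UINT64_MAX) is rewritten to the device's real
 * start and size; any other range is passed through unchanged.
 */
static struct sketch_range
sketch_normalize_range(uint64_t offset, uint64_t len,
		       uint64_t ddev_start, uint64_t ddev_bytes)
{
	struct sketch_range	r = { offset, len };

	if (offset == 0 && len == UINT64_MAX) {
		r.offset = ddev_start;
		r.len = ddev_bytes;
	}
	return r;
}

/*
 * Models the mf_dax_kill_procs() hunk: a page is only marked poisoned
 * when the pre-remove flag is absent.
 */
static bool
sketch_should_poison(int mf_flags)
{
	return !(mf_flags & SKETCH_MF_MEM_PRE_REMOVE);
}
```

With a device that starts at byte 4096 and is 1 MiB long, calling
sketch_normalize_range(0, UINT64_MAX, 4096, 1 << 20) yields that extent,
while a narrower range such as (8192, 512) is returned as given.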