Date: Tue, 12 Mar 2024 23:32:38 -0700
From: Dan Williams
To: Alistair Popple
Subject: Re: ZONE_DEVICE refcounting
Message-ID: <65f148866bc56_a9b42947@dwillia2-mobl3.amr.corp.intel.com.notmuch>
References: <87ttlhmj9p.fsf@nvdebian.thelocal>
In-Reply-To: <87ttlhmj9p.fsf@nvdebian.thelocal>
Alistair Popple wrote:
> Hi,
>
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
>
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
>
> So instead I'm thinking of doing something along the lines of the following:
>
> 1. Refcount FS DAX pages normally. Ie. map them with vm_insert_page() and
> increment the refcount inline with mapcount and decrement it when pages are
> unmapped.

It has been a while, but the sticking point last time was how to plumb
the "allocation" mechanism that elevated the page from 0 to 1. However,
that seems solvable.

> 2. As per normal pages the pages are considered free when the refcount drops
> to zero.

That is the dream, yes.

> 3. Because these are treated as normal pages for refcounting we no longer map
> them as pte_devmap() (possibly freeing up a PTE bit).

Yeah, pte_devmap() dies once mapcount behaves normally.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.

Here potentially be dragons. There are pud_devmap() checks in places
where mm code needs to be careful not to treat a dax page as a typical
transhuge page that can be split.

> 5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
> the page->pgmap field with page->compound_head, but this isn't a problem
> because the LSB of page->pgmap is free and we can still get pgmap from
> compound_head(page)->pgmap.

Sounds plausible.

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
> from the pgmap->ops->page_free() callback.

Yes, necessary for DAX-GUP interactions.

> 7. We could probably get rid of the pgmap refcounting because we can just scan
> pages and look for any pages with non-zero references and wait for them to be
> freed whilst ensuring no new mappings can be created (some drivers do a
> similar thing for private pages today). This might be a follow-up change.

This sounds reasonable.

> I have made good progress implementing the above, and am reasonably confident I
> can make it work (I have some tests that exercise these code paths working).

Wow, that's great! Really appreciate it, and will be paying you back with
review cycles.
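
For anyone following along, here is a minimal userspace sketch of the
point 5 idea: a tail page stores its head's address with the low bit
set in the same word a head page uses for pgmap, so the pgmap lookup
goes through the head. Everything named *_sketch is a hypothetical
stand-in, not the kernel's struct page or struct dev_pagemap, and it
assumes pointers are at least 2-byte aligned so bit 0 is free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's struct dev_pagemap / struct page. */
struct pgmap_sketch {
	int id;
};

struct page_sketch {
	union {
		struct pgmap_sketch *pgmap;	/* head (or order-0) page */
		uintptr_t compound_head;	/* tail page: head address | 1 */
	};
};

/* Mark @tail as a tail page of @head by storing the tagged head address. */
static void sketch_set_compound_head(struct page_sketch *tail,
				     struct page_sketch *head)
{
	tail->compound_head = (uintptr_t)head | 1u;
}

/* Bit 0 set means "tail page": strip the tag to find the head page. */
static struct page_sketch *sketch_compound_head(struct page_sketch *page)
{
	uintptr_t word = page->compound_head;

	if (word & 1u)
		return (struct page_sketch *)(word - 1u);
	return page;
}

/* Always read pgmap through the head, so tail pages resolve correctly. */
static struct pgmap_sketch *sketch_page_pgmap(struct page_sketch *page)
{
	return sketch_compound_head(page)->pgmap;
}

/* Returns 0 if the head and the tail page resolve to the same pgmap. */
static int sketch_demo(void)
{
	struct pgmap_sketch pgmap = { .id = 7 };
	struct page_sketch pages[2];

	pages[0].pgmap = &pgmap;
	sketch_set_compound_head(&pages[1], &pages[0]);

	if (sketch_page_pgmap(&pages[0]) != &pgmap)
		return 1;
	if (sketch_page_pgmap(&pages[1]) != &pgmap)
		return 2;
	return 0;
}
```

The sketch only shows why the shared field is not a conflict; it says
nothing about how the real series lays out or initializes tail pages.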
> However my knowledge of the filesystem layer is a bit thin, so before going too
> much further down this path I was hoping to get some feedback on the overall
> direction to see if there are any corner cases or other potential problems I
> have missed that may prevent the above being practical.

If you want to send me draft patches for that, on- or off-list, feel free.

> If not I will clean my series up and post it as an RFC. Thanks.

Thanks, Alistair!
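
As a rough illustration of points 2 and 6 taken together, the sketch
below models a page that is free once its refcount drops to zero, at
which point a page_free() callback reached through pgmap->ops notifies
the owner. It is a single-threaded userspace model with plain ints
instead of atomics, and the fsdax_*_sketch names are hypothetical; only
the pgmap->ops->page_free() shape is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

struct fsdax_page_sketch;

/* Hypothetical mirror of the pgmap->ops->page_free() shape from the thread. */
struct fsdax_ops_sketch {
	void (*page_free)(struct fsdax_page_sketch *page);
};

struct fsdax_pgmap_sketch {
	const struct fsdax_ops_sketch *ops;
};

struct fsdax_page_sketch {
	int refcount;			/* 0-based: zero means free */
	struct fsdax_pgmap_sketch *pgmap;
};

/* Counts how many times the "filesystem" was told a page became free. */
static int fsdax_freed_count_sketch;

static void fsdax_fs_page_free_sketch(struct fsdax_page_sketch *page)
{
	(void)page;
	fsdax_freed_count_sketch++;
}

static void fsdax_get_page_sketch(struct fsdax_page_sketch *page)
{
	page->refcount++;
}

/* Drop one reference; notify the owner when the last one goes away. */
static void fsdax_put_page_sketch(struct fsdax_page_sketch *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0 && page->pgmap->ops->page_free)
		page->pgmap->ops->page_free(page);
}

/* Returns 0 if the callback fires exactly once, on the final put. */
static int sketch_refcount_demo(void)
{
	static const struct fsdax_ops_sketch ops = {
		.page_free = fsdax_fs_page_free_sketch,
	};
	struct fsdax_pgmap_sketch pgmap = { .ops = &ops };
	struct fsdax_page_sketch page = { .refcount = 0, .pgmap = &pgmap };

	fsdax_freed_count_sketch = 0;
	fsdax_get_page_sketch(&page);	/* "allocation": 0 -> 1 */
	fsdax_get_page_sketch(&page);	/* a second user, e.g. a pin */
	fsdax_put_page_sketch(&page);	/* still referenced, no callback */
	if (fsdax_freed_count_sketch != 0)
		return 1;
	fsdax_put_page_sketch(&page);	/* last reference: callback fires */
	return fsdax_freed_count_sketch == 1 ? 0 : 2;
}
```

Under the current 1-based scheme the idle test would be "refcount == 1"
instead of "refcount == 0", which is the off-by-one the proposal is
trying to remove.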