From: Alistair Popple
To: David Hildenbrand
Cc: linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org, willy@infradead.org, ziy@nvidia.com, jhubbard@nvidia.com, jgg@nvidia.com, balbirs@nvidia.com
Subject: Re: [LSF/MM/BPF TOPIC] The future of ZONE_DEVICE pages
Date: Wed, 5 Feb 2025 21:12:33 +1100

On Fri, Jan 31, 2025 at 09:47:39AM +0100, David Hildenbrand wrote:
> On 31.01.25 03:59, Alistair Popple wrote:
> > I have a few topics that I would like to discuss around ZONE_DEVICE pages
> > and their current and future usage in the kernel.
> > Generally these pages are
> > used to represent various forms of device memory (PCIe BAR space, coherent
> > accelerator memory, persistent memory, unaddressable device memory). All
> > of these require special treatment by the core MM so many features must be
> > implemented specifically for ZONE_DEVICE pages.
> >
> > I would like to get feedback on several ideas I've had for a while:
> >
> > Large page migration for ZONE_DEVICE pages
> > ==========================================
> >
> > Currently large ZONE_DEVICE pages only exist for persistent memory use cases
> > (DAX, FS DAX). This involves a special reference counting scheme which I hope to
> > have fixed[1] by the time of the LSF/MM/BPF. Fixing this allows for other higher
> > order ZONE_DEVICE folios.
> >
> > Specifically I would like to introduce the possibility of migrating large CPU
> > folios to unaddressable (DEVICE_PRIVATE) or coherent (DEVICE_COHERENT) memory.
> > The current interfaces (migrate_vma) don't allow that as they require all folios
> > to be split.
> >
>
> Hi,
>
> > Some of the issues are:
> >
> > 1. What should the interface look like?
> >
> > These are non-lru pages, so likely there is overlap with "non-lru page migration
> > in a memdesc world"[2]
>
> Yes, although these (what we called "non-lru migration" before ZONE_DEVICE
> popped up) are currently all order-0. Likely this will change at some point,
> but not sure if there is currently a real demand for it.
>
> Agreed that there is quite some overlap. E.g., no page->lru field, and the
> problem about splitting large allocations etc.
>
> For example, balloon-inflated pages are currently all order-0. If we'd want
> to support something larger but still allow for reliable balloon compaction
> under memory fragmentation, we'd want an option to split-before-migration
> (similar as you describe below).
>
> Alternatively, we can just split right at the start: if the balloon
> allocated a 2MiB compound page, it can just split it to 512 order-0 pages
> and allow for migration of the individual pieces. Both approaches have their
> pros and cons.
>
> Anyway: "non-lru migration" is not quite expressive. It's likely going to
> be:
>
> (1) LRU folio migration
> (2) non-LRU folio migration (->ZONE_DEVICE)
> (3) non-folio migration (balloon, zsmalloc, ...)
>
> (1) and (2) have things in common (e.g., rmap, folio handling) and (2) and
> (3) have things in common (e.g., no ->lru field).
>
> Would there be something ZONE_DEVICE based that we want to migrate and that
> will not be a folio (iow, not mapped into user page tables etc)?

I'm not aware of any such use-cases. Your case (2) above is what I was thinking
about.

> >
> > 2. How do we allow merging/splitting of pages during migration?
> >
> > This is necessary because when migrating back from device memory there may not
> > be enough large CPU pages available.
> >
> > 3. Any other issues?
> >
> > [1] - https://lore.kernel.org/linux-mm/cover.11189864684e31260d1408779fac9db80122047b.1736488799.git-series.apopple@nvidia.com/
> > [2] - https://lore.kernel.org/linux-mm/2612ac8a-d0a9-452b-a53d-75ffc6166224@redhat.com/
> >
> > File-backed DEVICE_PRIVATE/COHERENT pages
> > =========================================
> >
> > Currently DEVICE_PRIVATE and DEVICE_COHERENT pages are only supported for
> > private anonymous memory. This prevents devices from having local access to
> > shared or file-backed mappings instead relying on remote DMA access which limits
> > performance.
> >
> > I have been prototyping allowing ZONE_DEVICE pages in the page cache with
> > a callback when the CPU requires access.
>
> Hmm, things like read/write/writeback get more tricky. How would you
> writeback content from a ZONE_DEVICE folio? Likely that's not possible.

The general gist is somewhat analogous to what happens when the CPU faults on a
DEVICE_PRIVATE page, except that it obviously wouldn't be a fault. Instead,
whenever something looked up the page-cache entry and found a DEVICE_PRIVATE
page, we would have a driver callback, somewhat similar to migrate_to_ram(),
that would copy the data back to normal system memory. IOW the CPU would always
own the page and could always get it back.

It has been a while since I last looked at this problem (FS DAX refcount
cleanups took way longer than expected!), but I recall having this at least
somewhat working. I will see if I can get it cleaned up and posted as an RFC
soon.

> So I'm not sure if we want to go down that path; it will be great to learn
> about your approach and your findings.
>
> [...]
>
>
> There is a lot of interesting stuff in there; I assume too much for a single
> session :)

And probably way more than I can get done in a year :-)

>
> --
> Cheers,
>
> David / dhildenb
>
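
As a rough illustration of the lookup-time callback described above, here is a
small userspace model in C. It is only a sketch under stated assumptions, not
the prototype and not kernel code: struct cache_entry, the copy_to_ram() hook,
toy_copy_to_ram() and cache_lookup() are all hypothetical names invented for
this example, and a malloc'd buffer stands in for device-private memory. The
only idea taken from the discussion is that a lookup which finds a
DEVICE_PRIVATE entry first asks the driver to copy the data back to system
memory, so the CPU side can always get the page back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical model, not kernel code: where a cached page's data lives. */
enum backing { IN_SYSTEM_RAM, IN_DEVICE_PRIVATE };

struct cache_entry {
	enum backing where;
	unsigned char *data;	/* host copy, or a stand-in for device memory */
	/* Hypothetical driver hook, loosely modelled on migrate_to_ram(). */
	int (*copy_to_ram)(struct cache_entry *e);
};

/* Toy "driver": copy the device copy into freshly allocated system memory. */
static int toy_copy_to_ram(struct cache_entry *e)
{
	unsigned char *host = malloc(TOY_PAGE_SIZE);

	if (!host)
		return -1;
	memcpy(host, e->data, TOY_PAGE_SIZE);
	free(e->data);		/* release the pretend device page */
	e->data = host;
	e->where = IN_SYSTEM_RAM;
	return 0;
}

/*
 * Lookup path: callers never see device-private data directly. If the
 * entry currently lives on the device, the driver hook brings it back
 * before the pointer is handed out. Returns NULL if the copy fails.
 */
static unsigned char *cache_lookup(struct cache_entry *e)
{
	assert(e != NULL);
	if (e->where == IN_DEVICE_PRIVATE && e->copy_to_ram(e) != 0)
		return NULL;
	return e->data;
}
```

A caller would fill in a struct cache_entry with where = IN_DEVICE_PRIVATE and
copy_to_ram = toy_copy_to_ram, then call cache_lookup(); a second lookup on the
same entry takes the fast path because the entry is already in system RAM. The
model deliberately ignores locking, reference counts and writeback, which are
the hard parts raised in the thread.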